GDI Property Reference

This topic describes the Graph Data Interface (GDI) properties that are available to use in queries. The first section describes the options that are available regardless of data source type, and the second section describes the source-specific options.

Universal Properties
Source-Specific Properties

Universal Properties

The table below lists the properties that are valid in queries against all data source types.

Option	Data Type	Description
batching	boolean or int	This property can be used to disable batching or to change the default batch size. By default, the batch size is 5000 (`s:batching 5000`). To disable batching, include `s:batching false` in the query. Users typically do not change the batch size, but controlling it can be useful when performing updates. To configure the size, include `s:batching int` in the query. For example, `s:batching 3000`.
concurrency	int or RDF list	This property can be included to configure the maximum level of concurrency for the query. The value can be an integer, such as `s:concurrency 8`. If the value is an integer, it configures a maximum limit on the number of slices that can execute the query. For finer-grained control over the number of nodes and slices to use, concurrency can also be included as an object with `limit`, `nodes`, and/or `executorsPerNode` properties. For example, the following object configures a concurrency model that allows a maximum of 24 executors distributed across 4 nodes with 8 executors per node: `s:concurrency [ s:limit 24 ; s:nodes 4 ; s:executorsPerNode 8 ; ] ;`
count	variable	If you want to turn the query into a COUNT query, you can include this property with a `?variable` to perform a count. For example, `s:count ?count`.
errors	boolean	Controls whether the GDI stops processing the query when an error (such as a query or file error) is encountered or ignores the error and continues. The default value is `true` (`s:errors true`), which stops processing when an error is encountered. To ignore errors, include `s:errors false`.
formats	RDF list	To give users control over the data types that are used when coercing strings to other types, this property can be included in GDI queries to define the desired types. In addition, it can be used to describe the formats of date and time values in the source to ensure that they are recognized and parsed to the appropriate date, time, and/or dateTime values. For details about the `formats` property, see Data Type Formatting Options.
key	string	This property can be used to define the primary key column for the source file or table. This column is leveraged in a resource template for the instances that are created from the source. For example, `s:key ("EMPLOYEE_ID")`. For more information about `key`, see Data Linking Options.
limit	int	You can include this property to limit the number of results that are returned. `s:limit` maps to the SPARQL LIMIT clause.
locale	string	This property can be used to specify the locale to use when parsing locale-dependent data such as numbers, dates, and times.
model	string	This property defines the class (or table) name for the type of data that is generated from the specified data source. For example, `s:model "employees"`. Model is optional when querying a single source. If your query targets multiple sources, however, and you want to define resource templates (primary keys) and object properties (foreign keys), you must specify the model value for each source.
normalize	boolean and/or RDF list	To give users control over the labels and URIs that are generated, the GDI offers several options for normalizing the model and/or the fields that are created from the specified data source(s). For details about the `normalize` property, see Normalization Options.
offset	int	This property can be used to offset the data that is returned by a number of rows.
paging	RDF list	This property can be used to configure paging so that the GDI can access large amounts of data across a number of smaller requests. For details about the `paging` property, see Pagination Options.
password	string	This property specifies the password for the given `username`.
rate	int or string	This property can be included to control the frequency with which requests are sent to the source. The limit applies to the number of requests a single slice can make. If you specify an integer, the value is treated as the maximum number of requests to issue per minute. If you specify a string, you have more flexibility in configuring the rate. The sample values below show the types of values that are supported: `s:rate "90/minute" ;` `s:rate "90 per minute" ;` `s:rate "200000 every week" ;` `s:rate "10000 every 6 hours" ;` To enforce the rate limit, the GDI introduces a sleep between requests that is equal to the rate delay. The more slices that execute the query, the longer the rate delay needs to be to enforce the limit in aggregate. Given `s:rate "90/minute"`, the GDI would optimize the concurrency and use only 1 slice for execution with a rate delay of 666ms between requests (60,000ms / 90 requests). Given `s:rate "240/minute"`, the GDI would use 3 executors with a rate delay of 750ms between requests (each executor issues 80 requests per minute, and 60,000ms / 80 = 750ms).
reference	RDF list	This property can be used to specify a foreign key column. The reference property is an RDF list that includes the `model` property to list the target table and a `using` property that defines the foreign key column. For more information about `reference`, see Data Linking Options.
sampling	int	This property can be used to configure the number of records in the source to examine for data type inferencing.
selector	string or RDF list	This property can be used as a binding component to identify the path to the source objects. For example, `s:selector "Sales.SalesOrderHeader"` targets the SalesOrderHeader table in the Sales schema. For more information about binding components and the selector property, see Using Binding Trees and Selector Paths.
timeout	int	This property can be used to specify the timeout (in milliseconds) to use for requests against the source. For example, `s:timeout 5000` configures a 5-second timeout.
url	string	This property specifies the URL for the data source, such as the database URL, Elasticsearch URL, or HTTP endpoint URL. For file-based sources, the `url` property specifies the file system location of the source file or directory of files. When specifying a directory (such as `s:url "/opt/shared-files/loads/"`), the GDI loads all files that are in a format it recognizes. To specify a directory but limit the number or type of files that are read, you can include the `pattern` and/or `maxDepth` properties. For security, it is a best practice to reference connection information (such as the `url`, `username`, and `password`) from a Query Context so that the sensitive details are abstracted from any requests. In addition, using a Query Context makes connection details reusable across queries. See Using Query Contexts in Queries for more information. For example, the triple patterns below reference keys from a Query Context: `?data a s:DbSource ; s:url "{{@db.eca4bfa83481f3638b93ab5fdf93ff9a.url}}" ; s:username "{{@db.eca4bfa83481f3638b93ab5fdf93ff9a.user}}" ; s:password "{{@db.eca4bfa83481f3638b93ab5fdf93ff9a.password}}" ;`
username	string	If authentication is required to access the source, include this property to specify the user name.
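
As an illustration of how several universal properties can be combined, the query below is a minimal sketch rather than a definitive template. It assumes the `s:` prefix resolves to the DataToolkit ontology namespace shown in the `PREFIX` line. The JDBC URL, credentials, table name, and column names are hypothetical placeholders. The property values themselves are taken from the examples in the table above.

```sparql
# Assumption: this is the namespace bound to the s: prefix in GDI queries.
PREFIX s: <http://cambridgesemantics.com/ontologies/DataToolkit#>

SELECT ?EMPLOYEE_ID ?LAST_NAME
WHERE {
  SERVICE <http://cambridgesemantics.com/services/DataToolkit> {
    ?data a s:DbSource ;
      # Hypothetical connection details. In practice, reference these
      # values from a Query Context instead of writing them inline.
      s:url "jdbc:postgresql://db.example.com:5432/hr" ;
      s:username "example_user" ;
      s:password "example_password" ;
      s:table "employees" ;              # hypothetical table
      s:model "employees" ;              # class name for the generated data
      s:key ("EMPLOYEE_ID") ;            # primary key column
      s:timeout 5000 ;                   # 5-second request timeout
      s:batching 3000 ;                  # override the default batch size of 5000
      # At most 24 executors across 4 nodes with 8 executors per node.
      s:concurrency [ s:limit 24 ; s:nodes 4 ; s:executorsPerNode 8 ] ;
      s:limit 100 ;                      # return at most 100 results
      # Hypothetical source columns to bind.
      ?EMPLOYEE_ID () ;
      ?LAST_NAME () .
  }
}
```

Each property is optional here. Removing `s:batching`, `s:concurrency`, or `s:limit` leaves the defaults described in the table in effect.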

Source-Specific Properties

The options described in this section are grouped by data source type. For more information about writing queries against different sources, see GDI Usage by Data Source Type.

DbSource Properties
FileSource Properties
HttpSource Properties
ElasticSource Properties

DbSource Properties

The table below lists the properties that are available for queries against database data sources. For more information about database sources, see Querying a Database Source.

Option	Data Type	Description
database	string	This property can be used to specify the database to target in the source if the database is not listed in the `s:url` or `s:selector` strings.
driver	string	This property can be included to specify the JDBC driver to use.
orderBy	string, variable, list	You can include this property to order the result set by a field name, a bound variable, or a list of names or bound variables.
maxConnections	int	This property can be used to set a limit on the maximum number of active connections to the source. For example, `s:maxConnections 16` sets the limit to 16 connections. The default value is `10`.
partitionBy	string, variable, list	The GDI attempts to partition queries automatically across the available cores (slices) in AnzoGraph. To determine how to partition the query, the GDI uses metadata from the source database. It looks for any column in an index, preferring the primary key column if it is interpolable. However, it only considers the first column in any index on the table. After determining the partition column, the GDI does a MIN/MAX on the column as well as a basic sizing query. To specify which column or columns the GDI should partition on, you can include the `partitionBy` property in the query. The property supports a list of source field names, bound variables, or the object `s:auto`, which forces the GDI to partition the data when the source does not define partitioning metadata.
property	RDF list	This property can be included to list any JDBC driver-specific connection properties. To incorporate `property`, use the following syntax: `s:property [ s:name "custom_driver_property_name" ; s:value "custom_value" ]`
query	string	If you want to access the source data by running an SQL query, you can include this property to specify the query string to run. The language does not have to be SQL if the source supports another language. However, some GDI features that dynamically alter the query may not work with a non-SQL language. `{{?variable}}` substitutions are supported within `s:query` strings. If you include `s:query` without also specifying `s:table` or `s:partitionBy`, the GDI may not partition the query, and execution may be slower than if the partition column were specified. When using `s:query`, it is a good practice to specify the table in `s:table` and the column to partition on in `s:partitionBy`, especially when querying large tables.
schema	string	This property can be included to specify the target schema to query. If you include `s:schema "schema_name"` without specifying `s:table` (described below) or `s:query`, all tables in the schema are queried.
table	string	This property can be included to specify the target table or tables for the query.
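
The guidance for `s:query` above (name the table and the partition column alongside the query string) could look like the following sketch. The `s:` prefix namespace is assumed. The JDBC URL, credentials, table, SQL text, and column names are all hypothetical.

```sparql
# Assumption: this is the namespace bound to the s: prefix in GDI queries.
PREFIX s: <http://cambridgesemantics.com/ontologies/DataToolkit#>

SELECT ?ORDER_ID ?TOTAL_DUE
WHERE {
  SERVICE <http://cambridgesemantics.com/services/DataToolkit> {
    ?data a s:DbSource ;
      s:url "jdbc:postgresql://db.example.com:5432/sales" ;   # hypothetical
      s:username "example_user" ;                             # hypothetical
      s:password "example_password" ;                         # hypothetical
      # Hypothetical SQL query to run against the source.
      s:query "SELECT ORDER_ID, TOTAL_DUE FROM orders" ;
      # Name the table and partition column so the GDI can partition the query.
      s:table "orders" ;
      s:partitionBy "ORDER_ID" ;
      # Raise the connection limit from the default of 10.
      s:maxConnections 16 ;
      ?ORDER_ID () ;
      ?TOTAL_DUE () .
  }
}
```

If the source does not define partitioning metadata, `s:partitionBy s:auto` is the alternative described in the table.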

FileSource Properties

The table below lists the properties that are available for queries against file-based data sources. For more information about file sources, see Querying File Sources.

Option	Data Type	Description
format	RDF list	You can include the `format` property to add parameters that describe the source files. See File Source Format Options for details about the supported parameters.
maxDepth	int	This property can be used to limit the directory traversal depth. By default, when `s:url` specifies a directory (and a `s:pattern` that limits that traversal depth is not specified), all subdirectories are processed. To process only the files in the top level directory, set maxDepth to 0 (`s:maxDepth 0`). To process the files in the top level directory plus the first-level subdirectories, set maxDepth to 1 (`s:maxDepth 1`), and so on.
mimetype	string	This property can be included to specify the MIME type of the data. If you are querying TSV files that do not have a .tsv file extension, include the `mimetype` property with a value of `text/tsv` (`s:mimetype "text/tsv"`).
options	RDF list	If additional connection information needs to be provided to access the file storage system, include the `options` property to list any storage-specific connection parameters. See File Storage Connection Options for information about the supported properties for each storage type.
pattern	string	This property can be used to specify a wildcard pattern for matching file names. For example, `s:pattern "common_prefix*.csv"`. You can include one `s:pattern` property per FileSource. The GDI supports Unix file globbing syntax outside of parentheses. Within parentheses, full Java regular expression language is supported. For example, including `s:pattern "data/**/customer_*.csv"` tells the GDI to load all files that match the pattern "customer_*.csv" from any number of subdirectories under the `data` directory. Similarly, `s:pattern "(\d+)/transaction_*.csv"` tells the GDI to load all files that match the pattern "transaction_*.csv" in all subdirectories whose names consist of digits.
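
To show how `s:url`, `s:pattern`, and `s:maxDepth` can work together, the sketch below reads CSV files from a directory. The `s:` prefix namespace is assumed, and the file name pattern and column names are hypothetical. The directory path is the one used in the `url` description.

```sparql
# Assumption: this is the namespace bound to the s: prefix in GDI queries.
PREFIX s: <http://cambridgesemantics.com/ontologies/DataToolkit#>

SELECT ?customer_id ?name
WHERE {
  SERVICE <http://cambridgesemantics.com/services/DataToolkit> {
    ?data a s:FileSource ;
      s:url "/opt/shared-files/loads/" ;
      # Glob syntax: only files whose names start with "customer_" and end in ".csv".
      s:pattern "customer_*.csv" ;
      # Read the top-level directory plus first-level subdirectories only.
      s:maxDepth 1 ;
      # Hypothetical columns in the source files.
      ?customer_id () ;
      ?name () .
  }
}
```

Setting `s:maxDepth 0` instead would restrict the load to files in the top-level directory.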

HttpSource Properties

The table below lists the properties that are available for queries against HTTP data sources. For more information about HTTP sources, see Querying an HTTP Source.

Option	Data Type	Description
authorization	RDF list	This property specifies the type of authorization to use and the values for authentication. The options are BearerToken, AWSSignature, or BasicAuth. `s:authorization [ a s:BearerToken | s:AWSSignature | s:BasicAuth ] ;`
AWSSignature	RDF list	For authorization to AWS service endpoints, specify this property and include the appropriate authentication properties from the following list: `accessKey` (the AWS access key), `region` (the AWS region), `secretKey` (the AWS secret key), `serviceName` (the AWS service name), and `sessionToken` (the AWS session token). `s:authorization [ a s:AWSSignature ; s:accessKey "string" ; s:region "string" ; s:secretKey "string" ; s:serviceName "string" ; s:sessionToken "string" ; ] ;`
BasicAuth	RDF list	Specify this property when basic authentication is used, and include the `username` and `password` properties. `s:authorization [ a s:BasicAuth ; s:username "string" ; s:password "string" ; ] ;`
BearerToken	string	Specify this property when a bearer token is used for authentication, and include the `token` property. `s:authorization [ a s:BearerToken ; s:token "string" ] ;`
content	string or RDF list	This property can be included to send content to the source in the body of the request. For example, `content` can be a SPARQL query, JSON arrays, or a list of key-value pairs. Content can also be configured with an inline object (blank node) that gets translated to JSON. For more information, see Mapping the Content Property to JSON.
contentType	string	Include this property to specify the content type of the body of the request. For example, `s:contentType "application/sparql-query"` or `s:contentType "application/json"`.
encoding	string	When targeting a file, you can include this property to specify the character encoding used by the file. The default value is `s:encoding "utf8"`.
form	RDF list	To send data to the HTTP endpoint, you can use this property to post the data. Form is a list of name-value pairs. When including `s:form`, you must also include `s:contentType "multipart/form-data"`. The GDI sends the form object as an `application/x-www-form-urlencoded` string that contains the specified parameters. The query snippet below shows an example that incorporates `s:form`. `SERVICE <http://cambridgesemantics.com/services/DataToolkit> { VALUES (?reviews_text) { ("Horrible, terrible, will never use again.") ("Wonderful, magnificent, will recommend to everyone!") ("Simply OK. I might purchase this product again.") } ?data a s:HttpSource ; s:url "https://api.meaningcloud.com/sentiment-2.1" ; s:contentType "multipart/form-data" ; s:form [ s:name "key" ; s:value "9eab751142..." ] , [ s:name "lang" ; s:value "auto" ] , [ s:name "txt" ; s:value ?reviews_text ] ; ?confidence () ; ?score_tag () ; ?subjectivity () ; ?irony () ; ?agreement () . }`
format	RDF list	If the data is file-based, you can include the `format` property to add parameters that describe the source. See File Source Format Options for details about the supported parameters.
header	RDF list	You can use this property to specify name-value pairs to include as headers in the request. For example: `s:header [ s:name "Accept" ; s:value "application/json" ]`. If you are creating a view, you can include variables in the `s:header` list. When another query is run against a view with variables, that query can map the variables through the view by including predicates in the CONSTRUCT clause.
method	string	You can include this property to specify the HTTP method. For example, `s:method "GET"` or `s:method "POST"`.
mimetype	string	You can include this property to specify the MIME type of the source. For example, `s:mimetype "text/html"`.
orderBy	string, variable, list	You can include this property to order the result set by a field name, a bound variable, or a list of names or bound variables.
parameter	RDF list	You can include this property to list any URL parameters as name-value pairs. For example, the `s:parameter` property below adds the `format` parameter to return results in CSV format and the `named-graph-uri` parameter to target a specific layer in a graphmart: `s:parameter [ s:name "format" ; s:value "csv" ] , [ s:name "named-graph-uri" ; s:value "http://cambridgesemantics.com/Layer/d541..." ] ;` If you are creating a view, you can include variables in the `s:parameter` list. When another query is run against a view with variables, that query can map the variables through the view by including predicates in the CONSTRUCT clause.
partitionBy	string, variable, list	The GDI attempts to partition queries automatically across the available cores (slices) in AnzoGraph. To determine how to partition the query, the GDI uses metadata from the source. It looks for any column in an index, preferring the primary key column if it is interpolable. However, it only considers the first column in any index on the table. After determining the partition column, the GDI does a MIN/MAX on the column as well as a basic sizing query. To specify which column or columns the GDI should partition on, you can include the `partitionBy` property in the query. The property supports a list of source field names, bound variables, or the object `s:auto`, which forces the GDI to partition the data when the source does not define partitioning metadata.
proxy	string or RDF list	Include this property to specify proxy information if a proxy is used. The value can be a string, such as `s:proxy "host_url:port_number"`, or an RDF list that includes `host` and `port` properties, such as `s:proxy [ s:host "host_url" ; s:port port_number ]`.
trust	string	Include this property to set the level of trust for the source's SSL certificate. The value can be either `"system"` or `"all"`.
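
The sketch below combines several of the HTTP properties above with the universal `rate` and `timeout` properties in a single GET request. The `s:` prefix namespace is assumed. The endpoint URL, token, `status` parameter, and the bound field names are hypothetical and do not refer to a real service.

```sparql
# Assumption: this is the namespace bound to the s: prefix in GDI queries.
PREFIX s: <http://cambridgesemantics.com/ontologies/DataToolkit#>

SELECT ?id ?total
WHERE {
  SERVICE <http://cambridgesemantics.com/services/DataToolkit> {
    ?data a s:HttpSource ;
      s:url "https://api.example.com/v1/orders" ;      # hypothetical endpoint
      s:method "GET" ;
      # Hypothetical bearer token.
      s:authorization [ a s:BearerToken ; s:token "example-token" ] ;
      # Request JSON from the endpoint.
      s:header [ s:name "Accept" ; s:value "application/json" ] ;
      # Hypothetical URL parameter, sent as a name-value pair.
      s:parameter [ s:name "status" ; s:value "open" ] ;
      # Issue at most 90 requests per minute.
      s:rate "90/minute" ;
      # 5-second timeout for each request.
      s:timeout 5000 ;
      # Hypothetical fields in the response.
      ?id () ;
      ?total () .
  }
}
```

To use basic authentication instead, the `s:authorization` object can be swapped for the `s:BasicAuth` form shown in the table.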

ElasticSource Properties

The table below lists the properties that are available for queries against Elasticsearch data sources. For more information about Elasticsearch sources, see Querying an Elasticsearch Source.

Option	Data Type	Description
aggregations	object	You can include this property to calculate aggregations over the specified bindings. For information about aggregations, see Aggregations in the Elasticsearch documentation.
config	string	To enable you to use explicit mappings, you can include this property to specify the URL to the index configuration file to employ. For example, `es:config "/opt/shared/elastic/mapping.json"`.
document	string	This property lists the document(s) to search.
field	string or variable	This property defines the field to operate on. The value can be a string or bound variable.
highlight	RDF list	You can include this property to define how results are highlighted. For information about the available properties, see Highlighting Elasticsearch Results.
html	boolean	This property controls whether to output HTML for highlighted results. Defaults to `true`.
index	string	This property can be included to specify the index to search.
minScore	float	This property defines the minimum score for matching documents. Documents with a lower score are not included in the search results.
query	string or RDF list	This property defines the query to execute. The value can be a string or a query object that maps to the Elasticsearch Query DSL. To generate the final query, the GDI combines `es:query` with any filters it can push to the Elasticsearch DSL. For more information about the `query` property and mapping Elasticsearch filters to SPARQL FILTER clauses, see Query DSL and Filter Mapping.
routing	string	This property can be included to route a document to a specific shard or to limit the search to a particular shard.
searchAfter		You can include this property to define the key values to start searching from.
size	int	This property maps to the `size` parameter in the Elasticsearch Search API and configures the batch size or maximum number of hits to return in a single call. Defaults to `10` and typically does not need to be changed.
source	boolean or RDF list	This property can be included to specify the source data to include in results. The value can be a boolean, list of fields, or a list of variable bindings. When `true`, all source data is returned. When `false`, no source data is returned.
url	string	The Elasticsearch endpoint URL.