Loading Local RDF Files with SPARQL LOAD

If you have Turtle, N-Triple, N-Quad, or TriG files on the local AnzoGraph DB file system or a mounted NFS, you can load the data by running a native SPARQL LOAD query. For instructions on loading RDF files from a remote store, such as cloud object storage or a web server, see Loading Local or Remote RDF Files with the IO Load Service.

This topic lists the syntax to use for SPARQL LOAD queries and provides some examples to follow.

LOAD Syntax

Run the following query to load data from Turtle, N-Triple, N-Quad, or TriG files. Each option is described below.

LOAD [ SILENT ] [ WITH 'global' | 'leader' | 'compute' ] <URI> [...<URIn>] [ INTO GRAPH <graph_uri> ]
Option Description
SILENT Include this optional keyword if you want AnzoGraph DB to skip "bad data" errors during the load. Data errors are problems such as dateTime values that are incorrectly formatted or strings that are tagged as double data types. The SILENT keyword does not suppress syntax errors in the files. If a file is ill-formed, for example because it contains invalid characters in place of URIs, AnzoGraph DB cannot parse the data and the file must be corrected.

When SILENT is omitted, AnzoGraph DB aborts the load upon hitting a data or syntax error and reports the error to the client. When SILENT is included and AnzoGraph DB encounters an error with the data, it logs the error to a graph and proceeds with the load. By default, any errors are captured in the <load_errors> graph. After a load completes, you can query the graph to review errors. To customize the load error graph, you can change the load_errors_graph system setting. See Changing System Settings for instructions.

Even when SILENT is specified, the load is aborted if the files contain syntax errors, because AnzoGraph DB cannot parse them. Correct the file or files and run the load again.
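
As a rough sketch of the SILENT workflow described above, the following statement loads a single file and skips data errors. The file path reuses a sample path from this topic, and the graph URI http://anzograph.com/tickit is a hypothetical name chosen for illustration:

```sparql
# Skip "bad data" errors and record them in the load error graph.
# The target graph URI is an assumption for this example.
LOAD SILENT <file:/shared-files/data/tickit.ttl> INTO GRAPH <http://anzograph.com/tickit>
```

Assuming the load_errors_graph setting still has its default value, a generic query such as the one below lists whatever was recorded. The structure of the error triples is not described in this topic, so the query simply returns triples from the graph without interpreting them:

```sparql
# Run as a separate request after the load completes.
SELECT ?s ?p ?o
FROM <load_errors>
WHERE { ?s ?p ?o }
LIMIT 100
```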

WITH The optional WITH clause can be used to specify which servers in the cluster have access to the load files. You can choose one of the following options:
  • global: Include WITH 'global' when every AnzoGraph DB server in the cluster has visibility to the entire data set, such as files or directories on a mounted file system. Each server loads a subset of the files; AnzoGraph DB automatically divides the files among the servers.
  • leader: Include WITH 'leader' when loading files that only the leader server can access. WITH 'leader' is the default value for the LOAD query. When the WITH clause is omitted, the load proceeds as if WITH 'leader' were specified.
  • compute: Include WITH 'compute' when all servers will load files from their local file systems. Use this option if you have arranged the files so that each AnzoGraph DB server has a unique subset of files on its local file system.

The leader, compute, and global keywords are case-sensitive. Type the terms in lowercase letters.
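
The following sketch shows how the WITH clause might look for the compute and leader options. The directory /data/local/tickit_part.ttl is a hypothetical path, assumed to exist on each server's local file system with a different subset of files on each server. The statements are meant to be run one at a time:

```sparql
# Each server loads the files found on its own local file system.
# The directory path is hypothetical.
LOAD WITH 'compute' <dir:/data/local/tickit_part.ttl>
```

```sparql
# Only the leader can access this file. Because 'leader' is the default,
# omitting the WITH clause would behave the same way.
LOAD WITH 'leader' <file:/shared-files/data/tickit.ttl>
```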

URI Required clause that specifies the absolute path to the load file or files. To load a single file, use the file: scheme. To load a directory of files, use the dir: scheme. When loading a directory, make sure the directory name includes the same file type extension as the files in the directory. For example, a directory of TTL files is named name.ttl, a directory of TriG files is named name.trig, and a directory of NQ files is named name.nq. When you specify a directory, AnzoGraph DB loads all valid files in that directory and in any subdirectories. AnzoGraph DB does not load hidden files that are named with a leading period, such as .file.ttl.

For example, the following URI loads a single file from a shared directory:

<file:/shared-files/data/tickit.ttl>

This example URI loads a directory of .ttl.gz files:

<dir:/global/nfs/vpc_nfs_server/data/tickit_all.ttl.gz>

And this example URI statement loads multiple directories of .ttl.gz files:

<dir:/global/nfs/data/tickit_all.ttl.gz> <dir:/global/nfs/data/movies.ttl.gz>

If you specify more than one URI to load from, every URI must target the same file type, such as .ttl or .trig. Every URI must also use the same scheme, either file: or dir:.
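
Put together as a complete statement, the multiple-directory URIs shown above might be used as follows. Both URIs use the dir: scheme and target .ttl.gz files, so they satisfy the rule about matching file types and schemes. The graph URI is a hypothetical name, and WITH 'global' assumes that every server can see the NFS mount:

```sparql
# Load two directories of gzipped Turtle files into one graph.
# The graph URI is an assumption for this example.
LOAD WITH 'global'
  <dir:/global/nfs/data/tickit_all.ttl.gz>
  <dir:/global/nfs/data/movies.ttl.gz>
  INTO GRAPH <http://anzograph.com/media>
```

By contrast, a statement that combined <file:/shared-files/data/tickit.ttl> with one of these dir: URIs would mix schemes and file types, which is not allowed.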

INTO GRAPH <graph_uri> When loading files such as Turtle or N-Triple files without graph specifications, include this optional clause to specify the graph to load data into. If the graph does not exist, the system automatically creates it and then loads the data into it. If you do not specify a graph, AnzoGraph DB loads data into the default graph.

You can also include INTO GRAPH when loading N-Quad files. If the N-Quad files contain a mixture of quads and triples, AnzoGraph DB loads the triples into the specified graph. Quads are still loaded according to their graph specification. If you omit this option for N-Quad files, any triples without graph specifications are loaded into the default graph.
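
For instance, the statement below sketches the N-Quad case. The directory path matches a sample used in this topic, while the graph URI http://anzograph.com/employees is an assumed name. Under the behavior described above, quads keep their own graphs and only triples without a graph specification would be loaded into the named graph:

```sparql
# Triples without a graph specification go to the named graph (assumed URI).
# Quads are loaded according to their own graph specification.
LOAD WITH 'global' <dir:/global/nfs/data/employees.nq> INTO GRAPH <http://anzograph.com/employees>
```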

LOAD Examples

The following example query loads data from gzipped TTL files in a directory on a mounted file system. Since all servers in the cluster have access to the file system, WITH 'global' is specified. The data is loaded into a graph named http://anzograph.com/sales:

LOAD WITH 'global' <dir:/global/nfs/data/sales_data.ttl.gz> INTO GRAPH <http://anzograph.com/sales>

The example query below loads a shared directory of N-Quad files. Since the files include graph specifications, the INTO GRAPH clause is omitted:

LOAD WITH 'global' <dir:/global/nfs/data/employees.nq>
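
After a load like this one, a standard SPARQL query can help confirm which graphs received data. The following is a generic sketch that is not specific to AnzoGraph DB. It counts the triples in each named graph, and it may be slow on a large data set:

```sparql
# List each named graph with the number of triples it contains.
SELECT ?g (COUNT(*) AS ?triples)
WHERE { GRAPH ?g { ?s ?p ?o } }
GROUP BY ?g
ORDER BY ?g
```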
