Loading Triple and Quad Files

This topic provides instructions for using the SPARQL LOAD statement to load data into AnzoGraph from files that are in Turtle, N-Triple, N-Quad, or TriG format.

Note: For information about load file directory requirements and load architecture, see Load Requirements and Recommendations. For more information on the data types that AnzoGraph uses to store loaded or inserted data, see AnzoGraph Data Type Handling.

Supported Load File Types

AnzoGraph supports the following load file types:

  • Turtle (.ttl file type): Terse RDF Triple Language that writes an RDF graph in compact form.
  • N-Triple (.n3 and .nt file types): A subset of Turtle known as simple triples.
  • N-Quad (.nq and .quads file types): N-Triples extended with an optional fourth element, an IRI or blank node that designates the graph.
  • TriG (.trig file type): An extension of Turtle that supports representing a complete RDF data set.
  • CSV (.csv file type): Comma-separated value format. For instructions on loading CSV files, see Loading Tabular Files.

You can GZIP any of the load file types and load the <filename>.<extension>.gz files into the database. In addition, AnzoGraph supports loading tarballs that contain the load files. For example, if you have a directory of gzipped TTL files, you can tar the directory and load the resulting .ttl.gz.tar file.

Important: AnzoGraph supports a maximum URI length of 16K characters. In addition, the number of unique URIs that you can load into AnzoGraph, including graph URIs and predicate URIs, must be less than 64K. A load operation that exceeds this limit fails, and AnzoGraph returns the message "m_lowest_unused_index <= a_max_value()".

LOAD Syntax

Run the following statement to load data from Turtle, N-Triple, N-Quad, or TriG files.

LOAD [ WITH 'leader' | 'compute' | 'global' ] <URL> [ INTO GRAPH <graph_name> ]

Option Description
WITH 'leader' Include this optional clause when loading files that only the leader server can access. This is the default: when the WITH clause is omitted, the load proceeds as if WITH 'leader' were specified.

Note: The "leader" keyword is case-sensitive. Type the term using lower case letters.
WITH 'compute' Include this optional clause when all servers will load files from their local file systems. Use this option if you have arranged the load files so that each AnzoGraph server has a unique subset of files on its local file system.

Note: The "compute" keyword is case-sensitive. Type the term using lower case letters.
WITH 'global' Include this optional clause when every AnzoGraph server in the cluster can access the entire data set in directories on a mounted file system. AnzoGraph automatically divides the files among the servers, so each server loads a subset.

Note: The "global" keyword is case-sensitive. Type the term using lower case letters.
<URL> Required clause to specify the URL for the server and directory or file that you want to load. When you specify a directory, AnzoGraph loads all valid files in that directory.

For example, this URL loads a single file from the local file system:

<file:/home/user/data/tickit.ttl>

This example URL loads a directory of .ttl.gz files on a mounted file system:

<dir:/global/nfs/vpc_nfs_server/data/tickit_all.ttl.gz>

This example URL loads a directory of files in an S3 bucket:

<s3://mybucket/load-files/tickit_all.ttl>

Important: When specifying file: or dir: to load files from a file system, the file system must be accessible from AnzoGraph. For example, in a Docker environment, the file or directory must be stored on the AnzoGraph container file system. For instructions on copying files or directories from a local file system to the container's file system, see How do I copy load files from the host to the AnzoGraph file system in Docker?

INTO GRAPH <graph_name> When loading files such as Turtle or N-Triple files without graph specifications, include this optional clause to specify the graph to load data into. If the graph does not exist, the system automatically creates it and then loads the data into it. If you do not specify a graph, AnzoGraph loads data into the default graph.

You can also include the INTO GRAPH option when loading N-Quad files. If the N-Quad files contain a mixture of quads and triples, AnzoGraph loads the triples into the specified graph. Quads are still loaded according to their graph specification. If you omit this option for N-Quad files, any triples without graph specifications are loaded into the default graph.
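
As a hedged illustration of the N-Quad behavior described above, the sketch below loads a hypothetical file, /home/user/data/tickit.nq (an assumed path, not one taken from this topic), first with and then without INTO GRAPH. The graph name <tickit> is only a placeholder, and the two statements are alternatives meant to be run as separate requests.

```sparql
# Assumption: tickit.nq is a hypothetical N-Quad file that contains
# a mixture of quads and triples.

# With INTO GRAPH: triples are loaded into <tickit>, while quads are
# still loaded according to their own graph specification.
LOAD <file:/home/user/data/tickit.nq> INTO GRAPH <tickit>

# Alternative (run separately): without INTO GRAPH, triples that have
# no graph specification are loaded into the default graph.
LOAD <file:/home/user/data/tickit.nq>
```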

Examples

The following example statement loads data from files in a directory on Amazon S3. The data is loaded into a graph named tickit:

LOAD WITH 'global' <s3://mybucket/load-files/tickit_all.ttl> INTO GRAPH <tickit>

This example statement loads data from gzipped Turtle files in a directory on a mounted file system. All of the servers in the cluster have access to the file system. The data is loaded into a graph named sales:

LOAD WITH 'global' <dir:/global/nfs/vpc_nfs_server/data/sales_data.ttl.gz> INTO GRAPH <sales>

This example statement loads data from TriG files in the employees.trig directory on a data server. The employees.trig directory contains an ls.dir text file that lists the filenames for the files to load:

LOAD WITH 'global' <https://data.cambridgesemantics.com/loads/employees.trig>
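
The examples above all use WITH 'global'. As a rough sketch of the other two options, the statements below assume a hypothetical gzipped Turtle file that only the leader server can read, and a hypothetical directory, /home/user/data/tickit_local.ttl, that exists on every server's local file system with a different subset of files on each server. Both paths and the graph name are assumptions chosen for illustration, and each statement is meant to be run as a separate request.

```sparql
# Leader load of a single gzipped file (hypothetical path).
# WITH 'leader' is the default, so the WITH clause is omitted here.
LOAD <file:/home/user/data/tickit.ttl.gz> INTO GRAPH <tickit>

# Compute load (run separately): every server reads the same directory
# path on its own local file system (hypothetical path), and each
# server is assumed to hold a unique subset of the load files there.
LOAD WITH 'compute' <dir:/home/user/data/tickit_local.ttl> INTO GRAPH <tickit>
```

Because the keywords are case-sensitive, 'compute' is typed in lower case, as in the option descriptions.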

Related Topics