RDF Load File Requirements
AnzoGraph DB supports loading RDF data from files on the AnzoGraph DB file system, on a remote web server or object store, or on a mounted file system. You can load data from a single file or multiple files in a directory. This topic provides details about the supported load file types, file storage systems, and load directory requirements.
Supported RDF File Types
AnzoGraph DB supports the following RDF load file types. See Introduction to the Graph Data Interface for information about supported non-RDF data sources.
- Turtle (.ttl file type): Terse RDF Triple Language, a compact text syntax for writing an RDF graph.
- N-Triple (.n3 and .nt file types): A line-based subset of Turtle in which each line holds one simple triple.
- N-Quad (.nq and .quads file types): N-Triples extended with an optional fourth element that identifies the graph, given as an IRI or a blank node.
- TriG (.trig file type): An extension of Turtle that can represent a complete RDF dataset, including named graphs.
- JSON-LD (.jsonld file type): A method of encoding linked data using JSON. JSON-LD files can be loaded with the IO services but are not supported by SPARQL LOAD queries.
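To make the extension list concrete, the following Python sketch maps a file name to the RDF format implied by its extension and reports whether SPARQL LOAD accepts it. The helpers `rdf_format` and `sparql_load_accepts` are hypothetical, not part of AnzoGraph DB. The sketch assumes the format is determined by the final extension only, so compressed names such as `data.ttl.gz` are not handled here.

```python
from pathlib import PurePath
from typing import Optional

# Extension-to-format table taken from the list of supported RDF file types.
RDF_FORMATS = {
    ".ttl": "Turtle",
    ".n3": "N-Triples",
    ".nt": "N-Triples",
    ".nq": "N-Quads",
    ".quads": "N-Quads",
    ".trig": "TriG",
    ".jsonld": "JSON-LD",
}


def rdf_format(filename: str) -> Optional[str]:
    """Return the RDF format implied by the file's final extension, or None."""
    return RDF_FORMATS.get(PurePath(filename).suffix.lower())


def sparql_load_accepts(filename: str) -> bool:
    """SPARQL LOAD accepts every listed format except JSON-LD (IO services only)."""
    fmt = rdf_format(filename)
    return fmt is not None and fmt != "JSON-LD"
```

For example, `rdf_format("graph.quads")` returns `"N-Quads"`, while `sparql_load_accepts("doc.jsonld")` returns `False`.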
You can compress any of the supported file types and load the compressed files into the database. The IO services support GZIP and ZST compression; SPARQL LOAD supports GZIP only.
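Because the compression rules differ by interface, a small pre-flight check can tell you which interface to use. This sketch is a hypothetical helper, not an AnzoGraph DB API. It assumes compressed files are recognized by a trailing `.gz` or `.zst` suffix, and it checks compression only, not whether the underlying format (for example, JSON-LD) is accepted.

```python
from pathlib import PurePath
from typing import Optional, Set, Tuple

# Assumption: compression is recognized by a trailing .gz or .zst suffix.
COMPRESSION_SUFFIXES = {".gz": "GZIP", ".zst": "ZST"}


def split_compression(filename: str) -> Tuple[str, Optional[str]]:
    """Return (file name without the compression suffix, compression type or None)."""
    path = PurePath(filename)
    compression = COMPRESSION_SUFFIXES.get(path.suffix.lower())
    if compression is None:
        return filename, None
    return str(path.with_suffix("")), compression


def supported_interfaces(filename: str) -> Set[str]:
    """Return the load interfaces that accept this file's compression type."""
    _, compression = split_compression(filename)
    if compression == "ZST":
        return {"IO services"}  # ZST is supported only by the IO services
    return {"IO services", "SPARQL LOAD"}  # uncompressed or GZIP
```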
The AnzoGraph DB IO Load service can decrypt load files that were encrypted with the Advanced Encryption Standard (AES) in Cipher Block Chaining (CBC) or Galois/Counter Mode (GCM), using the standard key sizes of 128, 192, or 256 bits.
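Before handing encrypted files to the IO Load service, you might validate the encryption parameters locally. The sketch below checks only the modes and key sizes stated above. `check_aes_params` is a hypothetical helper: it performs no decryption and does not reflect how the IO Load service actually receives keys.

```python
# Hypothetical pre-flight check; not an AnzoGraph DB API.
SUPPORTED_MODES = {"CBC", "GCM"}
SUPPORTED_KEY_BITS = {128, 192, 256}


def check_aes_params(mode: str, key: bytes) -> None:
    """Raise ValueError if the AES mode or key size is outside the supported set."""
    if mode.upper() not in SUPPORTED_MODES:
        raise ValueError(f"unsupported AES mode: {mode}")
    bits = len(key) * 8  # key length in bits
    if bits not in SUPPORTED_KEY_BITS:
        raise ValueError(f"unsupported AES key size: {bits} bits")
```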
Supported File Systems
When you load multiple files, AnzoGraph DB loads them in parallel, using all available cores on all servers in the cluster. Although you can load files stored on the leader node's local file system, use a shared file system for optimal performance so that all servers in the cluster can access the files. In a Docker or Kubernetes environment, the shared storage must also be mounted into the container file system.
The list below describes the supported file storage systems:
- Network File Systems (NFS) Version 4 or later
- Amazon Simple Storage Service (S3) object store
- Google Cloud Platform (GCP) object store
- Microsoft Azure Blob Storage
- Microsoft Azure WebDAV
- Web Server
To load files from the storage systems listed above, the AnzoGraph DB C++ extensions and their dependencies must be installed. Docker, Kubernetes, and AWS CloudFormation deployments include the C++ extensions and dependencies by default. For installations that use the RHEL/CentOS installer, the extensions and dependencies are optional: answer Yes when prompted about the C++ extensions (see Installing AnzoGraph DB), and then follow the instructions in Install Dependencies to Run the C++ Extensions to install the required dependencies.
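One way to catch leader-local paths before a parallel load is to classify each source location. In this sketch, the URI schemes (`s3`, `gs`, `http`, `https`, `file`) and the `shared_mounts` list of shared mount points are assumptions for illustration; consult the AnzoGraph DB load documentation for the exact URI syntax of each storage system. Azure sources are not covered and are reported as `"unknown"`.

```python
from typing import Iterable
from urllib.parse import urlparse

# Assumed URI schemes, for illustration only; not taken from AnzoGraph DB docs.
REMOTE_SCHEMES = {
    "s3": "Amazon S3",
    "gs": "Google Cloud Storage",
    "http": "web server",
    "https": "web server",
}


def classify_source(uri: str, shared_mounts: Iterable[str] = ()) -> str:
    """Describe where a load source lives and whether every server can reach it."""
    parsed = urlparse(uri)
    if parsed.scheme in REMOTE_SCHEMES:
        return REMOTE_SCHEMES[parsed.scheme]
    if parsed.scheme not in ("", "file"):
        return "unknown"  # e.g. a scheme this sketch does not model
    path = parsed.path
    for mount in shared_mounts:
        root = mount.rstrip("/")
        if path == root or path.startswith(root + "/"):
            return "shared mount"  # e.g. an NFS export mounted on every server
    return "leader-local"  # only the leader node can read this path
```

Matching on a trailing `/` keeps `/mnt/nfs2` from being mistaken for a path under `/mnt/nfs`.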
Directory Name Requirements
To load a directory of files, organize the files into directories by file type and include the file type extension in the directory name. For example, place TTL files in a <name>.ttl directory, TriG files in a <name>.trig directory, NQ files in a <name>.nq directory, and so on.
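The directory naming rule can be checked before you start a load. The sketch below assumes a `<name>.<ext>` directory and reports any files whose extension differs from the directory's. Skipping a trailing `.gz` or `.zst` before comparing extensions is an assumption, and `misplaced_files` is a hypothetical helper.

```python
from pathlib import Path
from typing import List

# Assumption: these compression suffixes are ignored when comparing extensions.
COMPRESSION_SUFFIXES = (".gz", ".zst")


def misplaced_files(directory: Path) -> List[str]:
    """Return names of files whose type extension does not match the directory's."""
    expected = directory.suffix.lower()  # e.g. ".ttl" for people.ttl/
    if not expected:
        raise ValueError(f"{directory.name!r} has no file type extension in its name")
    bad = []
    for entry in sorted(directory.iterdir()):
        if not entry.is_file():
            continue  # subdirectories are not checked
        name = entry.name.lower()
        for comp in COMPRESSION_SUFFIXES:
            if name.endswith(comp):
                name = name[: -len(comp)]  # a.ttl.gz -> a.ttl
                break
        if Path(name).suffix != expected:
            bad.append(entry.name)
    return bad
```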
Note on URI Limitations
AnzoGraph DB supports a maximum URI length of 16K characters. In addition, AnzoGraph DB can store at most 64K unique predicate and graph URIs. If a load operation would push the total number of unique predicate and graph URIs past the 64K limit, that load fails and AnzoGraph DB returns the message m_lowest_unused_index <= a_max_value().
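To catch these limits before a load fails, you could pre-scan your data. This sketch assumes the data is already parsed into (subject, predicate, object, graph) tuples, reads 16K and 64K as 16 * 1024 and 64 * 1024 (the text does not say whether K means 1,000 or 1,024), and counts predicate and graph URIs as one combined set. Objects are skipped because they may be literals. `precheck` is a hypothetical helper, not an AnzoGraph DB feature.

```python
from typing import Iterable, List, Optional, Tuple

# Assumption: K = 1024 for both limits.
MAX_URI_CHARS = 16 * 1024
MAX_PREDICATE_GRAPH_URIS = 64 * 1024

# (subject, predicate, object, graph); graph is None for the default graph.
Quad = Tuple[str, str, str, Optional[str]]


def precheck(quads: Iterable[Quad], stored: Iterable[str] = ()) -> List[str]:
    """Return a list of problems; an empty list means the limits look safe."""
    problems: List[str] = []
    pg_uris = set(stored)  # predicate and graph URIs already in the database
    for s, p, o, g in quads:
        for term in (s, p, g):  # object skipped: it may be a literal
            if term is not None and len(term) > MAX_URI_CHARS:
                problems.append(f"URI longer than {MAX_URI_CHARS} chars: {term[:40]}...")
        pg_uris.add(p)
        if g is not None:
            pg_uris.add(g)
    if len(pg_uris) > MAX_PREDICATE_GRAPH_URIS:
        problems.append(
            f"{len(pg_uris)} unique predicate/graph URIs exceeds {MAX_PREDICATE_GRAPH_URIS}"
        )
    return problems
```

Passing the predicate and graph URIs already stored in the database through `stored` matters because the 64K limit applies to the database as a whole, not to a single load.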