Copying Graphs to Files

This topic provides instructions for using the COPY command to copy currently loaded data from AnzoGraph to compressed or uncompressed files on disk. Typically, you copy data from AnzoGraph to a file only in two situations. You may have added, removed, or updated data in a graph and want to store the currently loaded data as an archive copy that you can reload later. Or you may want a backup so that you can restore data to a previous state after upgrading or installing a new version of AnzoGraph. If you haven't modified the original graph data, you can simply reload it from the original data files.

By default, when you restart AnzoGraph, it automatically reloads all the graph data that was in memory when AnzoGraph was last shut down. AnzoGraph restores that state from the data saved in the install_path/persistence directory. The enable_persistence configuration setting in the settings.conf file determines whether graph data is saved in the persistence directory. By default, enable_persistence is set to true. You can use the SPARQL DELETE and DELETE DATA commands to remove currently loaded graphs from AnzoGraph memory.

There are various other SPARQL functions you can use to insert, update, and delete (drop) graph data and triples already loaded into AnzoGraph. See Data Loading and Updating Data for more information on those operations.

COPY Syntax

The AnzoGraph COPY command syntax is different from the COPY command described in the W3C SPARQL specification. The syntax and use cases described below provide details about the options available in AnzoGraph to copy graph data.

Copying Graph Data to a Single File

COPY [ ALL | graph1 [ graph2 graph3 ... ] ] TO <file:/path/filename.ext[.gz]>

This command syntax, including the file: option in the URI, copies all currently loaded data from specified graphs to a single file. COPY ALL copies data from all currently loaded graphs.

If you run COPY on a cluster, each node copies out its currently loaded graph data to its own file. So, you can copy graph data to a node's local file system, in which case, each node will create a file storing the graph data stored on that node. Or, you can pick a common shared location to save files that all nodes can access, like an NFS mount.

Files created with this command syntax, whose name is specified by filename, can be assigned the file extension .ttl, .n3, .nt, .nq, .quads, or .trig. (See Supported Load File Types for the graph data load types that AnzoGraph supports.) You may optionally also specify the .gz suffix to compress saved files.

To preserve any graph name information associated with currently loaded graph data, you need to specify saving the graph data to a file in a quad data format such as .nq, .quads, or .trig.
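
As an illustration of the extension rules above, the following Python sketch classifies a target file name before you run COPY. The helper name describe_target is hypothetical and not part of AnzoGraph. The grouping into triple and quad extensions follows the text above, and the .n3 entry is an assumption about the supported extension list.

```python
import os

# Extensions named in this topic. The quad formats are the ones that
# preserve graph names. The ".n3" entry is an assumption.
TRIPLE_EXTENSIONS = {".ttl", ".n3", ".nt"}
QUAD_EXTENSIONS = {".nq", ".quads", ".trig"}


def describe_target(filename):
    """Return (format_extension, is_compressed, keeps_graph_names)."""
    is_compressed = filename.endswith(".gz")
    # Drop the optional .gz suffix so the format extension is last.
    stem = filename[: -len(".gz")] if is_compressed else filename
    extension = os.path.splitext(stem)[1]
    if extension not in TRIPLE_EXTENSIONS | QUAD_EXTENSIONS:
        raise ValueError(f"unsupported extension: {extension!r}")
    return extension, is_compressed, extension in QUAD_EXTENSIONS


print(describe_target("flights.ttl.gz"))    # ('.ttl', True, False)
print(describe_target("flight-data.trig"))  # ('.trig', False, True)
```

A False value in the last position is a reminder that graph names would be lost if the data came from named graphs.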

Copying Graph Data to Multiple Files in a Directory

The following syntax copies graph data to multiple external files in a directory that you specify:

COPY [ ALL | graph1 [ graph2 graph3 ... ] ] TO <dir:/path/dirname.ext[.gz]>

When you specify dir: in the URI file path, AnzoGraph will copy data from the specified graphs to several smaller files in the format specified by ext in the URI. To preserve any graph name information from the original graph data loaded in AnzoGraph, you need to specify a quad extension type such as .nq, .quads, or .trig.

The directory you specify with the dir: option may be a local directory or a shared location, such as an NFS mount. Do not include a slash at the end of the directory name. For example, <dir:/tmp/rdf.trig.gz> is valid, and <dir:/tmp/rdf.trig.gz/> is invalid.
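
If you generate COPY statements from a script, a small helper can apply the trailing-slash rule before the statement reaches AnzoGraph. The Python sketch below only builds the statement text. The function name build_copy_statement is hypothetical, and the sketch assumes graph names are written in angle brackets as in the use cases in this topic. It does not connect to AnzoGraph or run the command.

```python
def build_copy_statement(graphs, path, to_directory=False):
    """Build the text of a COPY statement.

    graphs: list of graph names, or None / empty list for COPY ALL.
    path: target path including the format extension, e.g. /tmp/rdf.trig.gz.
    to_directory: use dir: (multiple files) instead of file: (single file).
    """
    if to_directory and path.endswith("/"):
        raise ValueError("directory target must not end with a slash")
    source = " ".join(f"<{g}>" for g in graphs) if graphs else "ALL"
    scheme = "dir" if to_directory else "file"
    return f"COPY {source} TO <{scheme}:{path}>"


# prints: COPY <flights> <airports> TO <dir:/tmp/rdf.trig.gz>
print(build_copy_statement(["flights", "airports"], "/tmp/rdf.trig.gz", to_directory=True))
```

Passing "/tmp/rdf.trig.gz/" with to_directory=True raises ValueError instead of producing an invalid URI.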

By default, AnzoGraph creates 5 MB ext.gz files in the specified directory. To use a different file size, add copy_file_size=number_of_MB to the settings file, settings.conf. For instructions on changing settings, see Changing System Settings.

In a cluster, each node copies a subset of the data. If you copy graph data to the local file system, you can retrieve the data from files in the same location on each node. However, if you copy graph data to a shared file system, all the files will be stored in the same place.
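
To check what a directory copy produced, you can count the files and total their sizes. The Python sketch below is one way to do that. The function name summarize_copy_directory is hypothetical, and because this topic does not describe how the individual files are named, the sketch counts every regular file directly inside the directory.

```python
import os


def summarize_copy_directory(directory):
    """Return (file_count, total_bytes) for regular files directly in directory."""
    file_count = 0
    total_bytes = 0
    with os.scandir(directory) as entries:
        for entry in entries:
            # Skip subdirectories and other non-file entries.
            if entry.is_file():
                file_count += 1
                total_bytes += entry.stat().st_size
    return file_count, total_bytes
```

For a copy to a shared file system, one call on the shared directory covers all the files. For a copy to local file systems in a cluster, you would run it on each node and add the results.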

Copying graph data to a file or directory does not remove the copied data from AnzoGraph memory. To actually remove graph data from AnzoGraph memory, you can use the DELETE and DELETE DATA commands. To remove persisted data when restarting the database, you can run the following command:

azgctl -restart -init data

COPY Command Use Cases

The example below copies data from the flights graph to a single flights.ttl.gz file in the user's home directory.

COPY <flights> TO <file:/home/user/flights.ttl.gz>

In a cluster, the flights.ttl.gz file is created on the leader node.

The following example copies data from two graphs named flights and airports to a flight-data.trig.gz directory in the user's home directory.

COPY <flights> <airports> TO <dir:/home/user/flight-data.trig.gz>

By specifying the .trig format extension, you also ensure that any graph name information is preserved in all of the files created in the flight-data.trig.gz directory.

Similarly, you could use the COPY ALL syntax to copy data from all graphs currently loaded in AnzoGraph, for example:

COPY ALL TO <dir:/home/user/allgraphs.trig.gz>

Again, specify a quad file format extension, such as .trig, to save the associated graph name information in all the files stored in the allgraphs.trig.gz directory.