Copying Graphs to Files

This topic provides instructions for using the COPY command to copy data from the database to Turtle files on disk, either compressed (.ttl.gz) or uncompressed (.ttl). Typically, the only reason to explicitly copy data from AnzoGraph to a file is that you have added, removed, or updated data in a graph and want to store the currently loaded data as an archive copy that you can reload later. If you have not modified the original graph data, you can simply reload it from the original Turtle files.

By default, when you restart AnzoGraph, it automatically reloads all the graph data that was in memory when AnzoGraph was last shut down. AnzoGraph restores the last state of the graph data from the install_path/persistence directory. The enable_persistence configuration setting in the settings.conf file determines whether graph data is saved in the persistence directory. By default, enable_persistence is set to true. If you want to clear out loaded graph data every time you restart AnzoGraph, set enable_persistence to false. You can also use the SPARQL DELETE and DELETE DATA commands to remove currently loaded graphs from AnzoGraph memory.

There are various other SPARQL commands you can use to insert, update, and delete (drop) graph data and triples already loaded into AnzoGraph. See Data Load and Update for more information on those operations.

COPY Syntax

The AnzoGraph COPY command syntax is different from the COPY command described in the W3C SPARQL specification. Use the following syntax in AnzoGraph to copy graphs from AnzoGraph to a file or files. The list below the syntax provides details about the options:

COPY graph1 [ graph2 graph3 ... ] TO <dir|file:/path/dirname.ttl.gz>
  • Where graph is the URI for each of the graphs that you want to copy.
  • In the URI for the file path, specify dir if you want AnzoGraph to copy the graph or graphs to several smaller files, or file if you want to copy the data into a single file. Specify a dirname that does not exist. AnzoGraph creates the directory.
  • The directory name must end in either .ttl or .ttl.gz. Use .ttl to export uncompressed Turtle files and .ttl.gz to export compressed Turtle files. Do not include a slash at the end of the directory name. For example, <dir:/tmp/rdf.ttl.gz> is valid, and <dir:/tmp/rdf.ttl.gz/> is invalid.
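
The naming rules above can be checked before a statement is submitted. The following is a minimal Python sketch, not part of AnzoGraph: the helper name build_copy_statement and its parameters are hypothetical, and it only assembles the statement text according to the syntax shown above. It does not connect to or run anything against AnzoGraph.

```python
def build_copy_statement(graphs, target_path, mode="dir"):
    """Return the text of an AnzoGraph COPY statement.

    graphs: iterable of graph URIs, without angle brackets.
    target_path: path ending in .ttl or .ttl.gz, with no trailing slash.
    mode: "dir" for several smaller files, "file" for a single file.
    """
    graphs = list(graphs)
    if not graphs:
        raise ValueError("at least one graph URI is required")
    if mode not in ("dir", "file"):
        raise ValueError("mode must be 'dir' or 'file'")
    # <dir:/tmp/rdf.ttl.gz/> is invalid: no slash at the end of the name.
    if target_path.endswith("/"):
        raise ValueError("target must not end with a slash")
    # .ttl exports uncompressed Turtle, .ttl.gz exports compressed Turtle.
    if not target_path.endswith((".ttl", ".ttl.gz")):
        raise ValueError("target must end in .ttl or .ttl.gz")
    graph_list = " ".join(f"<{g}>" for g in graphs)
    return f"COPY {graph_list} TO <{mode}:{target_path}>"


if __name__ == "__main__":
    print(build_copy_statement(["flights", "airports"],
                               "/home/user/flight-data.ttl.gz"))
```

The helper rejects a malformed target early, so a typo such as a trailing slash is caught before the statement reaches the database.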

Copying graph data to a file does not remove the copied data from AnzoGraph memory. To actually remove graph data from AnzoGraph memory, you can use the DELETE and DELETE DATA commands.

For example, the following command copies data from two graphs named flights and airports to a flight-data.ttl.gz directory in the user's home directory.

COPY <flights> <airports> TO <dir:/home/user/flight-data.ttl.gz>

By default, AnzoGraph creates 5 MB .ttl.gz files in the specified directory. On a cluster, each node copies a subset of the data, and you can retrieve the files from the same location on each node. To use a different file size, add copy_file_size=number_of_MB to the settings file, settings.conf. For instructions on changing settings, see Changing System Settings.
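
As an illustration of the copy_file_size entry, the Python sketch below sets a key=value line in the text of a settings file. It assumes that settings.conf holds one key=value entry per line, which is inferred from the copy_file_size=number_of_MB form shown above. The helper name set_setting is hypothetical, and the supported procedure for applying changes is the one described in Changing System Settings.

```python
def set_setting(conf_text, key, value):
    """Return conf_text with key=value set.

    An existing entry for key is replaced; otherwise the entry is appended.
    Assumes one key=value entry per line (an assumption, not a documented format).
    """
    new_line = f"{key}={value}"
    lines = conf_text.splitlines()
    replaced = False
    for i, line in enumerate(lines):
        name, sep, _ = line.partition("=")
        if sep and name.strip() == key:
            lines[i] = new_line
            replaced = True
    if not replaced:
        lines.append(new_line)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Example: request 10 MB export files instead of the 5 MB default.
    print(set_setting("enable_persistence=true\n", "copy_file_size", 10), end="")
```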

The example below copies data from the flights graph to a single flights.ttl.gz file in the user's home directory.

COPY <flights> TO <file:/home/user/flights.ttl.gz>

On a cluster, the flights.ttl.gz file is created on the leader node.
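
After an export, a quick sanity check is to confirm that the output can be read back. The Python sketch below counts the non-blank lines in a single exported file or in every file of an export directory. It assumes that compressed files have names ending in .gz, and the function names are hypothetical. The line count is only a rough indicator, since a Turtle statement can span several lines.

```python
import gzip
from pathlib import Path


def open_turtle(path):
    """Open a .ttl or .ttl.gz file for reading as text."""
    if path.name.endswith(".gz"):
        # gzip.open in text mode decompresses and decodes on the fly.
        return gzip.open(path, "rt", encoding="utf-8")
    return open(path, "r", encoding="utf-8")


def count_exported_lines(target):
    """Count non-blank lines in an exported file or export directory."""
    target = Path(target)
    if target.is_dir():
        # A dir: export is a directory that holds several smaller files.
        paths = sorted(p for p in target.iterdir() if p.is_file())
    else:
        # A file: export is a single file.
        paths = [target]
    total = 0
    for path in paths:
        with open_turtle(path) as handle:
            total += sum(1 for line in handle if line.strip())
    return total


if __name__ == "__main__":
    import sys
    if len(sys.argv) > 1:
        print(count_exported_lines(sys.argv[1]))
```

On a cluster, a dir export would need this check run on each node, because each node holds only its own subset of the files.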