Creating an Anzo Data Store

This topic provides instructions for creating an Anzo Data Store. A data store is a directory on a shared file store where file-based linked data sets can be written by Anzo. If you onboard unstructured data, a data store is required. In addition, a data store is required if you use the automated direct load workflow and configure the workflow to export the data to a dataset. You can create one data store and configure all pipelines and workflows to write to that store or you can create multiple data stores to use for different datasets.

  1. In the Administration application, expand the Connections menu and click Anzo Data Store. Anzo displays the Anzo Data Store screen, which lists any existing data stores.

    The Server Anzo Data Store is a default data store that points to the local Anzo file system. This store exists so that first-time users can quickly test the onboarding process. It is not meant to be used in production. Do not change its Data Location to a shared file store; reconfiguring this data store can cause unexpected problems when you upgrade or migrate the system. It is safe to delete this store so that it is not presented as an option when users configure ingestion pipelines.

  2. On the Anzo Data Store screen, click the Add Anzo Data Store button and select Add Anzo Data Store. Anzo opens the Create Anzo Data Store screen.

  3. Type a Title and optional Description for the data store.
  4. Click in the Data Location field. Anzo opens the File Location dialog box.

  5. On the left side of the screen, select the File Store on which to create this data store. On the right side of the screen, navigate to the directory that you want to designate as the data location. Select a directory and click OK, or click Create New Folder to create a new directory. Each time a pipeline is run for this data store, a new subdirectory is created under the specified data location.

    The Data Location must be a directory on a file store that is shared between Anzo, AnzoGraph, and any Anzo Unstructured or Elasticsearch servers. If you want Anzo to generate files for this data store in one location and then load the files into AnzoGraph from another location, specify the file generation location in this field, and then specify the AnzoGraph load location in the Alternate Data Location field that is displayed on the Details screen after you save the data store.

  6. If necessary, change the maximum size of the files that pipelines create when they write to this data store by specifying a value in the Max File Size Before Compression (Bytes) field. The value is in bytes and applies to files before they are compressed.

    Cambridge Semantics recommends that you do not set this value unless instructed to do so by Cambridge Semantics Support.

  7. Specify whether to compress the generated load files. By default, the Compress output checkbox is selected, indicating that Anzo generates .ttl.gz files when writing to this data store. If you clear the checkbox, Anzo generates uncompressed .ttl files. To preserve disk space and reduce read times when loading data into memory, Cambridge Semantics recommends that you accept the default configuration and compress load files.
  8. Click Save to create the data store. Anzo saves the configuration and displays the details view.

    You can click the Edit icon to modify any of the options. Click the check mark icon to save changes to an option, or click the X icon to clear the value for an option.

  9. If you plan to load files into AnzoGraph from a location that is different from the Data Location that you specified, edit the Alternate Data Location field and select the location for AnzoGraph load files.
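
Before you choose a Data Location in step 5 (or an Alternate Data Location in step 9), it can help to confirm from each host that the shared directory is mounted and writable. The following Python sketch is one way to do that. It is not part of Anzo; the function name check_data_location and the example path are hypothetical.

```python
import os
import tempfile


def check_data_location(path):
    """Return a list of problems found with a candidate data location.

    An empty list means the directory exists and the current user can
    write to it. This is a hypothetical helper, not an Anzo API.
    """
    problems = []
    if not os.path.isdir(path):
        problems.append(f"{path} is not an existing directory")
        return problems
    try:
        # Create and automatically remove a scratch file to prove write access.
        with tempfile.NamedTemporaryFile(dir=path):
            pass
    except OSError as exc:
        problems.append(f"cannot write to {path}: {exc}")
    return problems


if __name__ == "__main__":
    # Hypothetical mount point; replace with your shared file store path.
    for issue in check_data_location("/opt/shared-files/anzo-data-store"):
        print(issue)
```

Running the script on every server that shares the file store, under the account that each service uses, gives a quick indication of whether the directory is reachable everywhere. It does not prove that all hosts see the same underlying storage.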

Once you have created the new data store, you can designate it as the default store so that it is automatically selected when users set up data onboarding workflows. See Set the Default Anzo Data Store for instructions.
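
Steps 6 and 7 above deal with a file size expressed in bytes and with gzip compression. As a rough illustration only, the Python sketch below converts a size in mebibytes to the byte count that the Max File Size Before Compression (Bytes) field expects, and gzips a small Turtle file to show what a .ttl.gz load file is. The helper names and the sample triple are assumptions made for this example; Anzo performs its own file generation.

```python
import gzip
import os
import shutil
import tempfile


def mib_to_bytes(mib):
    """Convert mebibytes to bytes (1 MiB = 1024 * 1024 bytes)."""
    return mib * 1024 * 1024


def gzip_file(src_path):
    """Write a gzip copy of src_path next to it and return the new path."""
    dst_path = src_path + ".gz"
    with open(src_path, "rb") as src, gzip.open(dst_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return dst_path


if __name__ == "__main__":
    print(mib_to_bytes(256))  # 268435456

    workdir = tempfile.mkdtemp()
    ttl_path = os.path.join(workdir, "sample.ttl")
    # Made-up triple, repeated so that compression has something to work on.
    with open(ttl_path, "w", encoding="utf-8") as f:
        f.write('<http://example.org/s> <http://example.org/p> "o" .\n' * 1000)
    gz_path = gzip_file(ttl_path)
    # Print the uncompressed and compressed sizes for comparison.
    print(os.path.getsize(ttl_path), os.path.getsize(gz_path))
```

Because the size limit applies before compression, the value to enter corresponds to the size of the .ttl file, not the .ttl.gz file.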