Importing Data from RDF Files

Source data that is not in RDF format is onboarded through structured or unstructured pipelines, where the data is imported to Anzo and converted to RDF format before becoming available in the catalog. Certain RDF file types, however, can be added to the catalog directly, making the data available for loading and analyzing in AnzoGraph without first being onboarded to Anzo.

Users can add any existing file-based linked data set (FLDS) to the catalog, for example when migrating an FLDS from one Anzo server to another. Alternatively, they can point Anzo to a directory of Turtle, N-Triple, N-Quad, or TriG files, and Anzo creates the FLDS and adds the data set to the catalog.

Note: To import data from CSV, JSON, XML, or SAS files, use the import processes described in Importing Data from CSV Files, Importing Data from JSON Files, Importing Data from XML Files, or Importing Data from SAS Files.

This topic provides instructions for making RDF files available in the catalog.

File Requirements

To add a data set to the Anzo catalog, the location of the files, the file format, and the directory structure must meet the following requirements.

  • Supported File Locations: Files must be staged on a configured file store.
  • Supported File Formats: Files must be in one of the following formats.
    • Turtle (.ttl file type)
    • N-Triple (.n3 and .nt file types)
    • N-Quad (.nq and .quads file types)
    • TriG (.trig file type)

    Any of the file types listed above can be compressed in GZIP format and imported as filename.filetype.gz files.

  • Supported Directory Structure: The directory structure that is required depends on whether you are importing an FLDS or files that are not yet part of an FLDS:
    • FLDS Imports: FLDS directories should contain an flds.trig file, an onts directory that includes the model .trig file, and an rdf.ttl or rdf.ttl.gz directory that contains the data files. For example:
      LoadEmployees_f7b1f
      ├── flds.trig
      ├── onts
      │   └── Employees.trig
      └── rdf.ttl.gz
          └── Loadnew_employees_8be23.ttl.gz
              └── 20191021034225.ttl.gz
                  ├── part-00000.ttl.gz
                  ├── part-00001.ttl.gz
                  └── part-00003.ttl.gz

      Note: Models must be in TriG format, regardless of the file type of the data files.

    • RDF File Imports: When importing RDF files that are not part of an FLDS, place the files in a directory named rdf.ttl or rdf.ttl.gz. Use one of those names regardless of the file format; N-Triple, N-Quad, and TriG files are also staged in a directory named rdf.ttl. Place uncompressed files in an rdf.ttl directory and gzipped files in an rdf.ttl.gz directory.

      Important: All files inside an rdf.ttl or rdf.ttl.gz directory must be in the same format and have the same file extension. Data in mixed formats will not load successfully. If you plan to import multiple file types, organize the files into separate directories by file extension, and then import each directory separately.
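The one-extension-per-directory rule above can be planned before any files are moved. The following Python sketch is illustrative only and is not part of Anzo: the function names and the batch_* parent directory names are hypothetical, and the extension list is copied from Supported File Formats. It groups file names so that each batch holds a single extension and a single compression state, and gives each batch its own parent directory because every staging directory must itself be named rdf.ttl or rdf.ttl.gz.

```python
from collections import defaultdict
from pathlib import PurePath

# Extensions listed under Supported File Formats.
RDF_EXTENSIONS = {".ttl", ".n3", ".nt", ".nq", ".quads", ".trig"}


def classify(filename):
    """Return (extension, is_gzipped) for a supported RDF file name, else None."""
    suffixes = PurePath(filename).suffixes
    gzipped = bool(suffixes) and suffixes[-1] == ".gz"
    if gzipped:
        suffixes = suffixes[:-1]  # look at the extension underneath .gz
    if not suffixes or suffixes[-1] not in RDF_EXTENSIONS:
        return None
    return suffixes[-1], gzipped


def plan_batches(filenames):
    """Map a target staging directory to the files that belong in it.

    Each batch contains exactly one extension and one compression state.
    The "batch_*" parent names are hypothetical; any unique parent works.
    """
    batches = defaultdict(list)
    for name in filenames:
        kind = classify(name)
        if kind is None:
            continue  # not a supported RDF file; leave it out of every batch
        ext, gzipped = kind
        parent = "batch_" + ext.lstrip(".") + ("_gz" if gzipped else "")
        leaf = "rdf.ttl.gz" if gzipped else "rdf.ttl"
        batches[parent + "/" + leaf].append(name)
    return dict(batches)
```

For instance, plan_batches(["a.ttl", "b.ttl.gz", "c.nt"]) yields three batches, each of which would be imported separately.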

Importing RDF Files

Follow the instructions below to create an FLDS catalog entry from a directory of Turtle, N-Triple, N-Quad, or TriG files. Make sure that the files and directory meet the requirements in File Requirements.
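As a pre-import check, the requirements can be verified with a short script run against the staged directory. The Python sketch below is an illustration under stated assumptions, not an Anzo utility: check_rdf_directory is a hypothetical helper, and it assumes the file store is reachable as a local path and that files in subdirectories should be checked as well.

```python
from pathlib import Path

# Extensions listed under Supported File Formats.
RDF_EXTENSIONS = {".ttl", ".n3", ".nt", ".nq", ".quads", ".trig"}


def check_rdf_directory(directory):
    """Return a list of problems; an empty list means every check passed."""
    root = Path(directory)
    problems = []
    if root.name not in ("rdf.ttl", "rdf.ttl.gz"):
        problems.append(f"directory must be named rdf.ttl or rdf.ttl.gz, not {root.name!r}")
    want_gzip = root.name == "rdf.ttl.gz"

    seen = set()
    # Assumption: nested files count too, so walk the directory recursively.
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        name = path.name
        gzipped = name.endswith(".gz")
        base = name[:-3] if gzipped else name
        ext = Path(base).suffix  # extension underneath any .gz
        if ext not in RDF_EXTENSIONS:
            problems.append(f"{name}: unsupported file type")
            continue
        if gzipped != want_gzip:
            problems.append(f"{name}: compression does not match directory {root.name}")
        seen.add(ext)

    if not seen:
        problems.append("no supported RDF files found")
    elif len(seen) > 1:
        problems.append("mixed file extensions: " + ", ".join(sorted(seen)))
    return problems
```

An empty result only means that the naming, format, and compression rules listed in File Requirements hold; it does not validate the RDF content itself.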

Tip: Anzo provides the option to link the files to an existing data model during the import. If the model is not yet available in Anzo, consider uploading it before importing the RDF files. See Uploading a Model to Anzo for instructions. You are not required to include a model at import time; a model can be associated with a data set at any time.

  1. In the Anzo console, expand the Blend menu and click Datasets. Anzo displays the Datasets screen, which lists the catalog of data sets.

  2. On the Datasets screen, click Import. Anzo opens the Import Catalog Data dialog box.

  3. The Import RDF radio button is selected by default. Type a name for the data set in the Title field and an optional description in the Description field.
  4. Click the RDF File Location field to open the File Location dialog box. Find and select the rdf.ttl or rdf.ttl.gz directory that you want to import, and then click OK to close the dialog box.
  5. If you want to associate a model with this data set, click the Ontologies drop-down list and select the model. To include a system model, select the Include System Data checkbox. If you do not want to associate a model with the data at this time, leave the Ontologies field blank.

    Note: Data sets without a model cannot be viewed in Hi-Res Analytics dashboards, but the imported data can still be queried. A model can be associated with the data set at a later time.

  6. Click Save to create the FLDS, add it to the catalog, and return to the Datasets screen. You can now select the FLDS from the catalog and create a graphmart. See Creating Graphmarts and Loading Data to AnzoGraph for instructions.

    Note: Anzo generates an flds.trig file at the same level as the rdf.ttl or rdf.ttl.gz directory. The file contains metadata about the load files.

Importing an FLDS

Follow the instructions below to add an FLDS to the catalog. Make sure that the FLDS meets the requirements in File Requirements.
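The FLDS layout described in File Requirements can be checked before starting the import. The Python sketch below is illustrative only: check_flds_layout is a hypothetical helper rather than an Anzo tool, and it assumes the FLDS root is reachable as a local path. It looks for the flds.trig file, an onts directory containing at least one .trig model file, and an rdf.ttl or rdf.ttl.gz data directory.

```python
from pathlib import Path


def check_flds_layout(flds_root):
    """Return a list of layout problems; an empty list means the layout looks complete."""
    root = Path(flds_root)
    problems = []

    if not (root / "flds.trig").is_file():
        problems.append("missing flds.trig")

    onts = root / "onts"
    if not onts.is_dir():
        problems.append("missing onts directory")
    elif not any(p.is_file() and p.suffix == ".trig" for p in onts.iterdir()):
        # Models must be in TriG format regardless of the data file type.
        problems.append("onts contains no .trig model file")

    if not any((root / name).is_dir() for name in ("rdf.ttl", "rdf.ttl.gz")):
        problems.append("missing rdf.ttl or rdf.ttl.gz directory")

    return problems
```

The directory passed to the function is the same FLDS root directory that is selected in the File Location dialog box during the import.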

  1. In the Anzo console, expand the Blend menu and click Datasets. Anzo displays the Datasets screen, which lists the catalog of data sets.

  2. On the Datasets screen, click Import. Anzo opens the Import Catalog Data dialog box.

  3. Select the Import FLDS radio button.
  4. Click the RDF File Location field to open the File Location dialog box. Select the root directory for the FLDS, that is, the directory that contains the flds.trig file, the onts directory, and the rdf.ttl or rdf.ttl.gz directory.

  5. Click Save to import the FLDS and return to the Datasets screen. You can now select the FLDS in the catalog and create a graphmart. See Creating Graphmarts and Loading Data to AnzoGraph for instructions.