Auto-Ingesting Imported Data

This topic provides instructions for onboarding data from structured data sources using the auto-ingest process. When you auto-ingest data, Anzo automatically generates mappings, a model, and an ETL pipeline. You can also link new data to an existing model.

  1. In the Anzo console, expand the Onboard menu and click Structured Data. Anzo displays the Data Sources screen.
  2. Click the Schemas tab. Anzo displays the Schemas screen, which lists the existing schemas.
  3. On the Schemas screen, click the name of the schema for the data that you want to auto-ingest. Anzo displays the Tables screen for the source.
  4. Click the Auto Ingest button. Anzo opens the Auto Ingest dialog box and automatically populates the data source. If only one graph data source is configured, Anzo also populates the Graph Data Source field, and if only one ETL engine is configured, Anzo selects that engine.
  5. If necessary, click the Graph Data Source field and select the graph data source for this data. For information about creating a graph data source, also known as an Anzo data store, see Creating an Anzo Data Store.
  6. If necessary, click the Auto Map Engine Config field and select the ETL engine to use for the project.
  7. By default, Anzo selects the Select all tables radio button to ingest the data for all tables in the schema. If you do not want to add all tables, click the Custom select radio button and then select each of the tables to add.
  8. By default, the auto-ingest process creates a new data model. To customize the URIs or transform property names in the new model, or to associate this data source with an existing model, click Advanced to view additional configuration options. If you do not want to change any options, continue to the next step.
  9. Click Create. Anzo creates a pipeline (or updates the existing one) and generates or updates the model and mappings according to the options you specified.
  10. In the main navigation menu under Onboard, click Structured Data. Then click the Pipelines tab.
  11. Click the name of the pipeline to run. Anzo displays the pipeline overview screen.
  12. To view the jobs that Anzo created for this data source, click the Jobs tab. The jobs are listed on the left side of the screen. Anzo creates one job for each imported table.
  13. To run the jobs, click Publish. Anzo runs the pipeline and generates the resulting RDF files in a new subdirectory under the specified base directory for the data store.
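
The pre-fill and table-selection defaults described in steps 4 through 7, and the one-job-per-table result described in step 12, can be summarized in a short sketch. The following Python is a hypothetical model of that behavior, not an Anzo API: the `AutoIngestForm` class, the `prefill` and `planned_jobs` functions, and the `source:table` job naming are illustrative assumptions only.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class AutoIngestForm:
    """Hypothetical model of the Auto Ingest dialog fields (not an Anzo API)."""
    data_source: str
    graph_data_source: Optional[str] = None
    etl_engine: Optional[str] = None
    tables: List[str] = field(default_factory=list)


def prefill(data_source: str,
            graph_sources: List[str],
            engines: List[str],
            schema_tables: List[str],
            custom_tables: Optional[List[str]] = None) -> AutoIngestForm:
    """Apply the documented defaults; names and structure are assumptions."""
    form = AutoIngestForm(data_source=data_source)
    # A field is pre-filled only when exactly one option is configured;
    # otherwise it stays empty and the user picks it (steps 5 and 6).
    if len(graph_sources) == 1:
        form.graph_data_source = graph_sources[0]
    if len(engines) == 1:
        form.etl_engine = engines[0]
    # "Select all tables" is the default; "Custom select" narrows the list (step 7).
    if custom_tables is None:
        form.tables = list(schema_tables)
    else:
        unknown = set(custom_tables) - set(schema_tables)
        if unknown:
            raise ValueError(f"tables not in schema: {sorted(unknown)}")
        # Keep the schema's table order for the selected subset.
        form.tables = [t for t in schema_tables if t in custom_tables]
    return form


def planned_jobs(form: AutoIngestForm) -> List[str]:
    """One job per ingested table (step 12); the naming scheme is hypothetical."""
    return [f"{form.data_source}:{table}" for table in form.tables]
```

For example, with one graph data source and two ETL engines configured, `prefill("src", ["gds"], ["e1", "e2"], ["a", "b", "c"])` fills in the graph data source but leaves the engine empty for you to choose, as in step 6, and selects all three tables.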

When the pipeline finishes, you can add the new data set to a graphmart and load it to AnzoGraph so that you can explore and analyze the data using Hi-Res Analytics. See Creating Graphmarts and Loading Data to AnzoGraph for instructions. For information about modifying the auto-generated components, see Working with Mappings, Modeling Data, Working with Schemas, and Working with Pipelines.