Creating a New Dataset Pipeline to Ingest Data into Anzo
This topic provides instructions for creating a new Dataset Pipeline to ingest data into Anzo. Dataset pipelines produce a new dataset in the Dataset catalog and generate RDF load files for AnzoGraph. All auto-ingested projects are dataset pipelines.
For instructions on creating a pipeline that moves data to a file or database, see Creating a New ETL Pipeline to Move Data to a File or Database. For information about creating unstructured pipelines, see Creating a New Unstructured Pipeline.
- In the Anzo console, expand the Onboard menu and click Structured Data. Then click the Pipelines tab. Anzo displays the Pipelines screen. For example:
- Click the Create button at the top of the screen and select Dataset Pipeline. Anzo displays the Create Dataset Pipeline screen.
- Type a name for the pipeline in the Title field and enter an optional Description.
- Click the Engine Configuration drop-down list and select the ETL engine for this pipeline.
- Click the Graph datasource drop-down list and select the graph data source where you want Anzo to save the AnzoGraph load files that are generated when jobs in this pipeline are published.
- In the Job Title field, type a name for the first job in the pipeline.
- Click Save to create the pipeline. Anzo displays the pipeline overview screen. For example:
- To view the job that was created in the pipeline, click the Jobs tab. Anzo displays the Jobs screen. For example:
The Jobs pane on the left side of the screen lists the jobs in the pipeline. The Components pane on the right side lists the data elements and mappings that are available for building the job. The white space in the main part of the screen is the working area where you construct the job.
- To configure the job, drag a data source or mapping from the Mappings or Data Elements tab in the Components pane onto the working area. If an element is nested, expand it to drill down to a selectable item. For example, in the image below the Tickit Data - tickit_sales mapping is added to the job:
If you drag a mapping onto the job canvas and there is only one source and one target for the mapping, Anzo automatically adds that source and target to the job.
- If necessary, specify the mapping that transforms the source data for the target. To do so, click the plus icon to the right of the source element. Anzo suggests a mapping based on the data source. For example:
- To add the mapping to the job, select the item in the tooltip and drag it onto the working area. For example:
- If necessary, complete the job by specifying the target. Click the plus icon to the right of the mapping element. Anzo suggests a target based on the data source. To use the graph data source that you specified when you created the pipeline, select Project Graph Data Source. Drag the target onto the working area. For example:
- If you want to create additional jobs for this pipeline, click the Create New Job button on the left side of the screen. Repeat the steps above to complete the job.
- To run the pipeline and all of its jobs, click the Publish button at the top of the screen. To run a single job in the pipeline, select the checkbox next to the job, then click the Publish button under the list of jobs.
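The procedure above gives every job the same shape: a source, a mapping, and a target, where a mapping with exactly one source and one target brings both into the job automatically, and publishing runs either every job or only the checked ones. The following Python sketch models those rules as plain data classes so they are easier to see at a glance. It is a conceptual illustration only, not an Anzo API: the class names, fields, and the `publish` method are hypothetical, and the engine and graph source names are placeholders.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set

# Hypothetical stand-in for the "Project Graph Data Source" target, which refers
# to the graph data source chosen when the pipeline was created.
PROJECT_GRAPH = "Project Graph Data Source"


@dataclass
class Mapping:
    name: str
    sources: List[str]
    targets: List[str]


@dataclass
class Job:
    title: str
    source: Optional[str] = None
    mapping: Optional[Mapping] = None
    target: Optional[str] = None

    def add_mapping(self, mapping: Mapping) -> None:
        """Drop a mapping onto the job; fill in source and target when unambiguous."""
        self.mapping = mapping
        # Models the documented rule: one source and one target are added automatically.
        if len(mapping.sources) == 1 and len(mapping.targets) == 1:
            if self.source is None:
                self.source = mapping.sources[0]
            if self.target is None:
                self.target = mapping.targets[0]

    def is_complete(self) -> bool:
        """In this model, a job can run only with a source, a mapping, and a target."""
        return all(part is not None for part in (self.source, self.mapping, self.target))


@dataclass
class DatasetPipeline:
    title: str
    engine: str            # recorded from the creation form; unused in this sketch
    graph_datasource: str
    jobs: List[Job] = field(default_factory=list)

    def publish(self, selected: Optional[Set[str]] = None) -> Dict[str, str]:
        """Return {job title: resolved target} for the jobs that would run.

        With no selection every job runs (the Publish button at the top); otherwise
        only the checked jobs run (the Publish button under the job list).
        """
        results: Dict[str, str] = {}
        for job in self.jobs:
            if selected is not None and job.title not in selected:
                continue
            if not job.is_complete():
                raise ValueError(f"job {job.title!r} needs a source, mapping, and target")
            # The Project Graph Data Source target resolves to the pipeline's graph source.
            target = self.graph_datasource if job.target == PROJECT_GRAPH else job.target
            results[job.title] = target
        return results


pipeline = DatasetPipeline("Tickit", engine="example-engine",
                           graph_datasource="example-graph-source")
sales = Mapping("Tickit Data - tickit_sales", sources=["tickit_sales"], targets=[PROJECT_GRAPH])
job = Job("Load sales")
job.add_mapping(sales)  # one source and one target, so both are filled in
pipeline.jobs.append(job)
print(pipeline.publish())  # {'Load sales': 'example-graph-source'}
```

Modeling publish as a filter over the job list mirrors the two Publish buttons: passing no selection corresponds to publishing the whole pipeline, and passing a set of titles corresponds to checking individual jobs.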