Tutorials

There are three ways to onboard structured and semi-structured data to Anzo:

Automated ETL Pipeline Workflow

If the Spark component is installed, you can onboard data using Anzo's built-in pipelines that use an automated workflow and follow the traditional extract, transform, and load (ETL) process. When a pipeline is generated, Anzo automatically creates a data model, mappings, and the ETL jobs required to ingest the source. These Spark ETL pipelines natively support CSV, JSON, XML, SAS, and Parquet files, along with all common database connections.

Automated Direct Data Load Workflow

If Spark is not installed or you do not want to use the ETL pipeline workflow, you can use another automated workflow that follows an extract, load, and transform (ELT) process. In the ELT workflow, data sources are onboarded directly to graphmarts. Data layers with SPARQL queries are automatically generated to transform and blend the data into an analytics-ready knowledge graph. The AnzoGraph Graph Data Interface (GDI) Java plugin (sometimes called the Data Toolkit) connects to the sources, creates a model, and generates the data layer queries. The automated direct data load workflow supports all of the data sources that the automated ETL ingestion process supports.
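The practical difference between the two automated workflows is where the transformation step runs. The following is a minimal, Anzo-independent sketch of that ordering in Python. Every name in it (extract, transform, run_etl, run_elt, and the in-memory store list) is hypothetical and only stands in for the real components. It does not call any Anzo, Spark, or GDI API.

```python
# Hypothetical illustration of ETL vs. ELT ordering. Nothing here is an Anzo API.
from typing import Dict, List

Row = Dict[str, str]


def extract() -> List[Row]:
    """Stand-in for reading rows from a source file or database."""
    return [{"name": " ada "}, {"name": "grace"}]


def transform(rows: List[Row]) -> List[Row]:
    """Stand-in for cleanup logic: trim whitespace and title-case names."""
    return [{"name": row["name"].strip().title()} for row in rows]


def run_etl() -> List[Row]:
    """ETL: transform in the pipeline, then load the finished rows."""
    store: List[Row] = []
    store.extend(transform(extract()))
    return store


def run_elt() -> List[Row]:
    """ELT: load the raw rows first, then transform them inside the store."""
    store: List[Row] = []
    store.extend(extract())      # raw data lands in the store as-is
    store[:] = transform(store)  # transformation runs afterwards, in place
    return store
```

Both functions end with the same rows in the store. What differs is whether the target ever holds the raw, untransformed data, which is the case only in the ELT ordering.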

Manual Ingestion with the Graph Data Interface

Advanced users who are familiar with SPARQL can also invoke the GDI by writing queries from scratch. The GDI is flexible: you can connect directly to sources via queries and control every aspect of the extract, load, and transform process. In addition to the data sources that the two automated workflows support, you can also ingest raw data and data from HTTP/REST endpoints with manually written SPARQL queries.

This section includes tutorials to help you get started with the automated ETL and direct data load ingestion methods. For information about the advanced method of writing GDI SPARQL queries for ingestion, see Onboarding or Virtualizing Data with the Graph Data Interface.