Onboard Structured Data
For structured and semi-structured data sources—databases, HTTP REST endpoints, raw data, and CSV, JSON, XML, Parquet, and SAS files—there are two ways to onboard or virtualize the data:
- Automated Direct Data Load: Data from databases and CSV, JSON, XML, Parquet, and SAS files can be onboarded into Anzo via the automated direct data load workflow. This workflow follows an extract, load, and transform (ELT) process to ingest data. In the ELT workflow, data sources are extracted and loaded to graphmarts. Data layers and Direct Load Steps with SPARQL queries are automatically generated to transform and blend the data into analytics-ready knowledge graphs. The AnzoGraph Graph Data Interface (GDI) Java plugin is used to connect to the sources, create a model, and generate the data layer queries. For more information about this workflow, see Onboarding Data with the Automated Workflow.
- Load or Virtualization with SPARQL Queries: Raw data and data from databases, HTTP REST endpoints, and CSV, JSON, XML, Parquet, and SAS files can be onboarded or virtualized by invoking the Graph Data Interface (GDI) with manually written SPARQL queries. The GDI is extremely flexible, giving you a direct connection to the sources and control over every aspect of the extract, load, and transform process. To onboard data into Anzo, add a Direct Load Step query; to create a virtual graph instead, add a View step query. Views let you access the source data exactly when it is needed, without having to ingest all of the data into Anzo up front. For more information about this workflow, see Onboarding or Virtualizing Data with SPARQL Queries.
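To give a sense of the manual workflow, the sketch below shows the general shape of a GDI query that reads a CSV file in a Direct Load Step. Treat it as a hypothetical illustration rather than a drop-in query: the `SERVICE` URI and `s:` vocabulary follow the common GDI pattern, while the file path and the `ex:` class and property URIs are invented for the example. See Onboarding or Virtualizing Data with SPARQL Queries for the exact GDI vocabulary and options.

```sparql
# Hypothetical sketch of a GDI query for a Direct Load Step.
# The SERVICE URI and s: properties follow the usual GDI pattern;
# the source path and ex: target URIs are invented for illustration.
PREFIX s:   <http://cambridgesemantics.com/ontologies/DataToolkit#>
PREFIX ex:  <http://example.com/ontology#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

INSERT {
  GRAPH ${targetGraph} {
    ?employee a ex:Employee ;
      ex:name     ?name ;
      ex:hireDate ?hire_date .
  }
}
WHERE {
  # The GDI connects to the source described inside the SERVICE clause.
  SERVICE <http://cambridgesemantics.com/services/DataToolkit> {
    ?data a s:FileSource ;
      s:url "/opt/shared-files/employees.csv" ;  # invented path
      # Bind source columns to variables with their expected datatypes.
      ?name      (xsd:string) ;
      ?hire_date (xsd:date) .
    # Mint a subject URI for each row (illustrative naming scheme).
    BIND(IRI(CONCAT("http://example.com/employee/",
                    ENCODE_FOR_URI(?name))) AS ?employee)
  }
}
```

A View step query for virtualization would use the same `SERVICE`-based source description, but is evaluated on demand rather than loading the triples into the graphmart.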
Whether you plan to use the automated workflow or manually invoke the GDI, the first step in the process is to connect your data sources to Anzo and onboard the schemas. Start with Adding Data Sources.