Introduction to the Graph Data Interface
The Graph Data Interface (GDI) service has built-in support for a variety of databases and file formats, as well as HTTP endpoints, flat files, and Elasticsearch. Depending on the queries that you write, the GDI service can extract, transform, and load data into AnzoGraph, or analyze remote data without updating the database. When ingesting relational or semi-structured data, the GDI service can convert the data to the graph model.
This topic lists the supported data sources, provides an introduction to connectivity and authentication, and includes information about the GDI data model.
Supported Data Sources
The GDI service can connect to any database, web server, or other source that supports HTTP/REST or JDBC APIs. JDBC drivers for the following database types are included:
- Apache Derby, Hive, and Impala
- Google BigQuery
- IBM DB2
- Microsoft SQL Server
- MariaDB/MySQL
- Hyper SQL Database (HSQLDB)
- PostgreSQL
- SAP Sybase (jTDS)
To add a JDBC driver for another database, copy the JDBC jar file to the /opt/anzograph/lib/udx directory and restart AnzoGraph.
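Once a suitable driver is available, a database source is described inside the GDI SERVICE clause. The following is a minimal sketch under stated assumptions, not a definitive query: the s:DbSource class and the s:url, s:username, s:password, and s:table properties are assumed names that should be checked against the GDI ontology for your release, and the JDBC URL, credentials, table, column names, and ex: namespace are hypothetical placeholders.

```sparql
PREFIX s:   <http://cambridgesemantics.com/ontologies/DataToolkit#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX ex:  <http://example.org/>

# Load rows from a relational table into a named graph.
INSERT {
  GRAPH ex:customers {
    ?customer a ex:Customer ;
      ex:name ?name .
  }
}
WHERE {
  SERVICE <http://cambridgesemantics.com/services/DataToolkit> {
    ?data a s:DbSource ;                                    # assumed class name
      s:url "jdbc:postgresql://db.example.org:5432/sales" ; # hypothetical JDBC URL
      s:username "gdi_reader" ;                             # hypothetical credentials
      s:password "change-me" ;
      s:table "public.customer" ;                           # hypothetical table
      ?id (xsd:int) ;                                       # assumed typed mapping form
      ?name () .                                            # empty mapping, as in the system information query
  }
  # Mint a subject IRI for each row from its key column.
  BIND(IRI(CONCAT("http://example.org/customer/", STR(?id))) AS ?customer)
}
```

Because the result clause is INSERT, this form writes triples to AnzoGraph. Replacing it with SELECT would return the same rows without updating the database.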
The GDI also natively supports reading or ingesting data in the following file types:
- CSV and TSV
- JSON
- XML
- Parquet
- SAS (SAS Transport XPT and SAS7BDAT formats)
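A file can also be read in place, so that nothing is written to the database. The sketch below illustrates the idea for a CSV file. It assumes that file sources are declared with an s:FileSource class and an s:url property, which should be verified in the GDI ontology. The file path and the column names title and year are hypothetical.

```sparql
PREFIX s:   <http://cambridgesemantics.com/ontologies/DataToolkit#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

# Read a CSV file in place; SELECT returns rows without storing any triples.
SELECT ?title ?year
WHERE {
  SERVICE <http://cambridgesemantics.com/services/DataToolkit> {
    ?data a s:FileSource ;            # assumed class name
      s:url "/path/to/movies.csv" ;   # hypothetical file path
      ?title () ;                     # empty mapping, as in the system information query
      ?year (xsd:int) .               # assumed typed mapping form
  }
  # Ordinary SPARQL filtering and ordering apply to the returned rows.
  FILTER(?year >= 2000)
}
ORDER BY ?year
LIMIT 10
```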
Data Source Connections and Authentication
When connecting to data sources, connection parameters like keys, tokens, and user credentials are provided as part of the query that you run against that source. To avoid including sensitive information in each request, however, AnzoGraph provides the option to create and manage Query Contexts. A context specifies all of the connection details for a source. Queries simply reference the context so that sensitive information is abstracted from the request. For more information about contexts, see Using Query Contexts.
Graph Data Interface Model
One way to learn about the capabilities of the Graph Data Interface is to explore the GDI Ontology. This documentation currently focuses on the most common GDI use cases, queries, and data source types. Exploring the classes and their properties in the model not only provides additional details for the documented use cases but also introduces you to additional possibilities that the GDI supports.
The ontology is included in the datatoolkit*.jar file in the install_path/lib/udx directory on the AnzoGraph file system. Each data source type is represented by an owl:Class, and all of the related input parameters are properties of that class. You can click the link below to quickly view a version of the ontology.
The attached ontology is from the GDI version 2.0.0-202007161458 release and might differ from the GDI version in your deployment. The following query can be used to return metadata about the version of the GDI that you are using:
PREFIX s: <http://cambridgesemantics.com/ontologies/DataToolkit#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

SELECT *
WHERE {
  SERVICE <http://cambridgesemantics.com/services/DataToolkit> {
    ?data a s:SystemInformationSource ;
      ?title () ;
      ?version () ;
      ?vendor () ;
      ?timestamp xsd:dateTime .
  }
}
For example, the query returns output similar to the following:
title                     | version            | vendor                    | timestamp
--------------------------+--------------------+---------------------------+----------------------
DataToolkit for AnzoGraph | 2.0.0-202007161458 | Cambridge Semantics, Inc. | 2020-07-16T19:00:35Z
1 rows