Introduction to the GDI

The Graph Data Interface (GDI), sometimes called the Data Toolkit, is a flexible, configurable AnzoGraph plugin that lets users access a variety of data sources through federated SPARQL queries. The type of query you write determines how the source data is used: an INSERT query against the GDI service ingests the source data into Anzo, while a CONSTRUCT query against the view or virtualized service creates a virtual graph that accesses the source only when it is needed, without ingesting the data into Anzo.
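The practical difference between ingesting and virtualizing is when the source is read. The following Python sketch models only that timing difference. It is not GDI code: `CountingSource`, `ingest`, and `VirtualView` are hypothetical stand-ins for an external source, an INSERT-style load, and a CONSTRUCT-style virtual graph.

```python
from typing import Callable, Dict, Iterator, List

# Illustrative sketch only; not the GDI's API or implementation.
# A record is one row from a source: column name -> value.
Record = Dict[str, str]


class CountingSource:
    """Hypothetical external data source that counts how often it is read."""

    def __init__(self, rows: List[Record]) -> None:
        self._rows = rows
        self.reads = 0

    def read(self) -> Iterator[Record]:
        # Each call models one round trip to the external source.
        self.reads += 1
        return iter(list(self._rows))


def ingest(source: CountingSource) -> List[Record]:
    """INSERT-style load (hypothetical): copy every record once, up front."""
    return list(source.read())


class VirtualView:
    """CONSTRUCT-style virtual graph (hypothetical): keep only a reference."""

    def __init__(self, source: CountingSource) -> None:
        self._source = source  # constructing the view does not read the source

    def query(self, predicate: Callable[[Record], bool]) -> List[Record]:
        # The source is read only when a query actually needs the data.
        return [r for r in self._source.read() if predicate(r)]


if __name__ == "__main__":
    src = CountingSource([{"id": "1"}, {"id": "2"}])
    local_copy = ingest(src)                           # one read, at load time
    hits = [r for r in local_copy if r["id"] == "2"]   # uses the copy; no read
    view = VirtualView(src)                            # still no read
    view.query(lambda r: r["id"] == "1")               # one read, at query time
    print(len(hits), src.reads)                        # prints "1 2"
```

In this model, the ingested copy can drift from the source if the source changes, while the view always reflects the source at query time at the cost of one read per query.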

The GDI has built-in, native support for various file formats, HTTP/REST endpoints, and common database types. Internally, the GDI API takes a records-oriented view of data, which lets the GDI bridge graph operations to operations on data in other formats. Although the GDI views a source as rows in a table, it can convert those records to graph format so that the data can be incorporated into data layers to augment existing data.

Supported Data Sources

The table below lists the data sources, file formats, file systems, and applications that the GDI supports.

Source | Description
HTTP/REST Endpoints | The GDI natively supports reading or ingesting data from HTTP/REST endpoints.
Databases | Cambridge Semantics supplies JDBC drivers for the following databases:
  • Databricks
  • H2
  • IBM DB2
  • Microsoft SQL Server
  • MariaDB
  • Oracle
  • PostgreSQL
  • SAP Sybase (jTDS)
  • Snowflake
  To access other databases, you can add JDBC drivers to AnzoGraph. For information about acquiring additional JDBC drivers, contact your Cambridge Semantics Customer Success manager. For instructions on deploying other drivers, see Deploy Optional Drivers for Accessing Custom Database Sources.
File Formats | The following file types are supported:
  • CSV and TSV
  • JSON and NDJSON
  • Parquet
  • SAS (SAS Transport XPT and SAS7BDAT formats)
  • XML
  • Raw text format
File Systems | The following types of file storage systems are supported:
  • Amazon S3
  • FTP and FTPS
  • Google Cloud Storage
  • HDFS (Kerberized HDFS is not supported at this time.)
  • NFS
  • SFTP
  • WebDAV
Applications | Queries against Elasticsearch and Kafka applications are supported.
