Introduction to the Graph Data Interface

The Graph Data Interface (GDI), sometimes called the Data Toolkit, is a flexible, configurable AnzoGraph DB plugin that lets users access a variety of data sources through SPARQL queries. The GDI has built-in, native support for several file formats, HTTP/REST endpoints, and common database types. Internally, the GDI API takes a records-oriented view of its sources. This view lets the GDI bridge graph operations and operations on data in other formats. Although the GDI views a source as rows in a table, it can convert those records to graph format, so that the data can be incorporated into, and augment, the data stored in AnzoGraph DB.
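The idea of turning table-like records into graph data can be illustrated with a small conceptual sketch. The Python below is not the GDI's implementation or API; the `records_to_triples` function, the `http://example.org/` namespace, and the sample rows are all hypothetical and exist only to show how each row can become a subject and each column a predicate.

```python
# Conceptual sketch only: this is NOT how the GDI is implemented.
# BASE and records_to_triples are hypothetical names for illustration.
BASE = "http://example.org/"


def records_to_triples(records, key, base=BASE):
    """Yield (subject, predicate, object) tuples for each non-empty field."""
    for record in records:
        # Each row becomes one subject, identified by its key column.
        subject = f"{base}row/{record[key]}"
        for column, value in record.items():
            if column == key or value in (None, ""):
                continue  # skip the key column and missing values
            # Each remaining column becomes a predicate on that subject.
            yield (subject, f"{base}{column}", value)


# Sample rows, as a CSV reader or database cursor might return them.
rows = [
    {"id": "1", "name": "Ada", "city": "London"},
    {"id": "2", "name": "Grace", "city": ""},
]
triples = list(records_to_triples(rows, key="id"))
```

In this sketch the empty `city` value in the second row produces no triple, which reflects a common choice when mapping sparse tabular data to a graph.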

This topic lists the supported data sources and provides an introduction to connectivity and authentication.

Supported Data Sources

This section lists the data sources that the GDI supports.

HTTP/REST Endpoints

The GDI natively supports reading or ingesting data from HTTP/REST endpoints.

File Formats

The following file types are supported:

  • CSV and TSV
  • JSON and NDJSON
  • Parquet
  • SAS (SAS Transport XPT and SAS7BDAT formats)
  • XML

In addition, the GDI supports raw text format.

File Systems

The following types of file storage systems are supported:

  • Amazon S3
  • FTP and FTPS
  • Google Cloud Storage
  • HDFS (Kerberized HDFS is not supported at this time.)
  • NFS
  • SFTP
  • WebDAV

Applications

The following applications are supported:

  • Elasticsearch
  • Kafka

Database Types

Cambridge Semantics supplies JDBC drivers for the following databases. For information about acquiring additional JDBC drivers for connecting to other databases, contact your Cambridge Semantics Customer Success manager.

  • Databricks
  • H2
  • IBM DB2
  • Microsoft SQL Server
  • MariaDB
  • Oracle
  • PostgreSQL
  • SAP Sybase (jTDS)
  • Snowflake

Data Source Connections and Authentication

When connecting to data sources, connection parameters like keys, tokens, and user credentials are provided as part of the query that you run against that source. To avoid including sensitive information in each request, however, AnzoGraph DB provides the option to create and manage Query Contexts. A context specifies all of the connection details for a source. Queries simply reference the context so that sensitive information is abstracted from the request. For more information about contexts, see Using a Query Context.
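The benefit of referencing a context instead of repeating credentials can be sketched in a few lines. The Python below is a generic illustration under stated assumptions and does not reflect how AnzoGraph DB stores or resolves Query Contexts; the `CONTEXTS` store, the `resolve_request` function, and every connection value shown are hypothetical placeholders.

```python
# Generic illustration only: NOT the AnzoGraph DB Query Context mechanism.
# CONTEXTS, resolve_request, and all connection values are hypothetical.
CONTEXTS = {
    "sales_db": {
        "url": "jdbc:postgresql://db.example.org:5432/sales",
        "username": "report_user",
        "password": "placeholder-secret",  # placeholder; never hard-code real secrets
    },
}


def resolve_request(request, contexts=CONTEXTS):
    """Merge the referenced context's details into the request's parameters."""
    resolved = dict(request)  # copy so the caller's request is not modified
    name = resolved.pop("context", None)
    if name is not None:
        if name not in contexts:
            raise KeyError(f"unknown context: {name}")
        # Parameters given explicitly in the request override context values.
        resolved = {**contexts[name], **resolved}
    return resolved


# The request names the context and carries no credentials itself.
request = {"context": "sales_db", "table": "orders"}
resolved = resolve_request(request)
```

The request that a user writes and shares contains only the context name, while the connection details are looked up where the context is managed.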
