Reading Remote Source Metadata
If you want to retrieve instance data from a source but are unsure about the data model, schema, or the exact names of columns and their data types, you can use the Graph Data Interface (GDI) service to explore the source's metadata. The service can be used to return a list of the catalogs (schemas), models, columns, data types, and other data source specific information.
Graph Data Interface service calls are processed by AnzoGraph using a Java plugin. Before running GDI service queries, make sure that the AnzoGraph cluster is configured to use the plugin. For more information, see Graph Data Interface Setup.
This topic describes the metadata query syntax and provides several example queries.
Metadata Query Syntax
The following query syntax shows the structure of a metadata query. The clauses, patterns, and placeholders in blue are described below.
# PREFIX Clause PREFIX s: <http://cambridgesemantics.com/ontologies/DataToolkit#> PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> PREFIX xsd: <http://www.w3.org/2001/XMLSchema#> PREFIX owl: <http://www.w3.org/2002/07/owl#> PREFIX anzo: <http://openanzo.org/ontologies/2008/07/Anzo#> PREFIX zowl: <http://openanzo.org/ontologies/2009/05/AnzoOwl#> PREFIX dc: <http://purl.org/dc/elements/1.1/> # Result Clause SELECT * WHERE { SERVICE <http://cambridgesemantics.com/services/DataToolkit> { [] s:select ?metadata . ?data a s:source_type ; s:connection_parameters . ?metadata a s:MetadataSource ; s:from ?data ; ?metadata_selector [ ?metadata_type (datatype) ; ... ; ] . } }
PREFIX Clause
The PREFIX clause declares the prefixes that are standard for all GDI service queries. You can declare additional prefixes to use in the query, but the PREFIX clause must include the following statements:
PREFIX s: <http://cambridgesemantics.com/ontologies/DataToolkit#> PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> PREFIX xsd: <http://www.w3.org/2001/XMLSchema#> PREFIX owl: <http://www.w3.org/2002/07/owl#> PREFIX anzo: <http://openanzo.org/ontologies/2008/07/Anzo#> PREFIX zowl: <http://openanzo.org/ontologies/2009/05/AnzoOwl#> PREFIX dc: <http://purl.org/dc/elements/1.1/>
Result Clause
The result clause defines the results to return. For metadata queries, the result clause is typically SELECT *.
SERVICE <http://cambridgesemantics.com/services/DataToolkit>
Include the required GDI SERVICE call in the WHERE clause. The rest of the WHERE clause defines the patterns to look for in the source.
[] s:select ?metadata
Include this required triple pattern in metadata queries. The select property specifies the source that should be used to return data.
source_type
The ?data a s:source_type
triple pattern specifies the type of data source that the query targets. For example, ?data a s:DbSource
targets a database data source. The list below describes the commonly used types:
- DbSource for any type of database.
- FileSource for flat files. The supported file types are CSV and TSV, JSON, XML, Parquet, and SAS (SAS Transport XPT and SAS7BDAT formats) .
- HttpSource for HTTP endpoints.
connection_parameters
The source connection parameters are the values that are required for accessing the source, such the database connection URL, path to a file source, username, password, key, token, etc.
For example, the pattern below specifies the connection details for a database source:
?data a s:DbSource ; s:url "jdbc:postgresql://10.100.2.9:5555/kl_hosp_db? user=postgres&password=postgres123"
The example below specifies the connection details for a file-based source, a directory of CSV files:
?data a s:FileSource ; s:url "/opt/shared-files/sales-csv"
For a single file, specify the filename in the URL. For example, "/opt/shared-files/sas/edu_inc.sas7bdat"
.
metadata_selector
The rest of the WHERE clause defines the metadata to retrieve. The metadata_selector specifies the type of metadata to return. The following list describes the valid selectors:
- catalogs: This selector narrows the results to schema-related metadata such as the schema names. Even when additional metadata types (described in the row below) are specified as objects, only catalog (schema) information is returned.
- fields: This selector is the broadest and most flexible option. Using the fields selector enables users to return any and all of the source metadata information, depending on the specified metadata types (described in the row below).
- models: This selector narrows the results to model-related metadata such as the model names. Even when additional metadata types (described in the row below) are specified as objects, only model information is returned.
metadata_type (datatype)
The triple patterns in the array for the metadata selector specify the type of metadata to return as well as the data type for the return value.
The following list shows all of the valid options. You can include any combination of properties. The results that are returned depend on the type of data source and whether the information exists in the source.
- ?model (xsd:string): Returns model names in string format. For file sources, this property returns file names.
- ?field (xsd:string): Returns column names.
- ?catalog (xsd:string): Returns schema names.
- ?datatype (owl:Thing): Returns the data types of the columns.
- ?format (xsd:string): Returns the format of the source.
- ?cardinality (xsd:string): Returns the cardinality of relationships between tables: optional, many, or required.
- ?count (xsd:int): Returns the number of times the field appears in the source.
- ?order (xsd:int): Returns the order in which the field was encountered.
The Metadata Query Examples section below provides sample metadata queries that access various data sources.
Metadata Query Examples
This section includes sample metadata queries that run against different types of data sources.
- View Database Schemas
- Explore a Database Schema
- Explore a Directory of SAS Files
- Explore an HTTP Endpoint
- Explore a Directory of CSV Files
View Database Schemas
The query below sends a metadata query to a MySQL database to return a list of the schemas that are available:
PREFIX s: <http://cambridgesemantics.com/ontologies/DataToolkit#> PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> PREFIX xsd: <http://www.w3.org/2001/XMLSchema#> PREFIX owl: <http://www.w3.org/2002/07/owl#> PREFIX anzo: <http://openanzo.org/ontologies/2008/07/Anzo#> PREFIX zowl: <http://openanzo.org/ontologies/2009/05/AnzoOwl#> PREFIX dc: <http://purl.org/dc/elements/1.1/> SELECT * WHERE { SERVICE <http://cambridgesemantics.com/services/DataToolkit> { [] s:select ?metadata . ?data a s:DbSource ; s:url "jdbc:mysql://10.100.2.9:5555/?user=root&password=Mysql1@#" . ?metadata a s:MetadataSource ; s:from ?data ; ?catalogs [ ?catalog xsd:string ; ?order xsd:int ; ] . } } ORDER BY ?catalog
The query returns the following results:
catalog | order ----------------------+------- BANKTEST_DB | 1 EMR | 4 GOLFCLUB_DB | 8 NORTHWIND | 10 SPORTDB | 13 SQLPOCKET_DB | 14 WORDPRESS_DB | 16 classicmodels | 2 crm_national_patients | 3 emrdbbig | 5 emrdbsmall | 6 emrnational_schema | 7 mysql | 9 optum | 11 performance_schema | 12 sys | 15 16 rows
Explore a Database Schema
Using the list of schemas that were returned in the example above (View Database Schemas), the query below returns metadata about the columns in one of the schemas. To narrow the results to a schema, the schema name (NORTHWIND) is added to the connection URL.
PREFIX s: <http://cambridgesemantics.com/ontologies/DataToolkit#> PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> PREFIX xsd: <http://www.w3.org/2001/XMLSchema#> PREFIX owl: <http://www.w3.org/2002/07/owl#> PREFIX anzo: <http://openanzo.org/ontologies/2008/07/Anzo#> PREFIX zowl: <http://openanzo.org/ontologies/2009/05/AnzoOwl#> PREFIX dc: <http://purl.org/dc/elements/1.1/> SELECT * WHERE { SERVICE <http://cambridgesemantics.com/services/DataToolkit> { [] s:select ?metadata . ?data a s:DbSource ; s:url "jdbc:mysql://10.100.2.9:5555/NORTHWIND?user=root&password=Mysql1@#" . ?metadata a s:MetadataSource ; s:from ?data ; ?fields [ ?model xsd:string ; ?field xsd:string ; ?datatype owl:Thing ; ] . } } ORDER BY ?model
The query returns the following results:
model | field | datatype -------------------------------+----------------------+--------------------------------------------- Alphabetical list of products | CategoryID | http://www.w3.org/2001/XMLSchema#int Alphabetical list of products | Discontinued | http://www.w3.org/2001/XMLSchema#boolean Alphabetical list of products | SupplierID | http://www.w3.org/2001/XMLSchema#int Alphabetical list of products | UnitPrice | http://www.w3.org/2001/XMLSchema#decimal Alphabetical list of products | ProductName | http://www.w3.org/2001/XMLSchema#string Alphabetical list of products | QuantityPerUnit | http://www.w3.org/2001/XMLSchema#string Alphabetical list of products | UnitsOnOrder | http://www.w3.org/2001/XMLSchema#short Alphabetical list of products | CategoryName | http://www.w3.org/2001/XMLSchema#string Alphabetical list of products | ProductID | http://www.w3.org/2001/XMLSchema#int Alphabetical list of products | ReorderLevel | http://www.w3.org/2001/XMLSchema#short Alphabetical list of products | UnitsInStock | http://www.w3.org/2001/XMLSchema#short Categories | CategoryID | http://www.w3.org/2001/XMLSchema#int Categories | Description | http://www.w3.org/2001/XMLSchema#string Categories | Picture | http://www.w3.org/2001/XMLSchema#base64Binary Categories | CategoryName | http://www.w3.org/2001/XMLSchema#string Categories | categoryid | Category Sales for 1997 | CategoryName | http://www.w3.org/2001/XMLSchema#string Category Sales for 1997 | CategorySales | http://www.w3.org/2001/XMLSchema#double Current Product List | ProductName | http://www.w3.org/2001/XMLSchema#string Current Product List | ProductID | http://www.w3.org/2001/XMLSchema#int ... 201 rows
Explore a Directory of SAS Files
The query below explores a directory of SAS files to return the model, catalog (schema), field, data type, and cardinality information. The query also orders the results by model name, which is the file name for file sources of a data model does not exist.
PREFIX s: <http://cambridgesemantics.com/ontologies/DataToolkit#> PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> PREFIX xsd: <http://www.w3.org/2001/XMLSchema#> PREFIX owl: <http://www.w3.org/2002/07/owl#> PREFIX anzo: <http://openanzo.org/ontologies/2008/07/Anzo#> PREFIX zowl: <http://openanzo.org/ontologies/2009/05/AnzoOwl#> PREFIX dc: <http://purl.org/dc/elements/1.1/> SELECT * WHERE { SERVICE <http://cambridgesemantics.com/services/DataToolkit> { [] s:select ?metadata . ?data a s:FileSource ; s:url "/opt/shared-files/sas" . ?metadata a s:MetadataSource ; s:from ?data ; ?fields [ ?model xsd:string ; ?field xsd:string ; ?catalog xsd:string ; ?datatype owl:Thing ; ?cardinality xsd:string ; ] . } } ORDER BY ?model
The query returns the following results:
model | field | catalog | datatype | cardinality --------+-------------+---------+-----------------------------------------+------------- demand | P1 | les/sas | http://www.w3.org/2001/XMLSchema#double | REQUIRED demand | P2 | les/sas | http://www.w3.org/2001/XMLSchema#double | REQUIRED demand | P3 | les/sas | http://www.w3.org/2001/XMLSchema#double | REQUIRED demand | Y | les/sas | http://www.w3.org/2001/XMLSchema#double | REQUIRED demand | Q1 | les/sas | http://www.w3.org/2001/XMLSchema#double | REQUIRED demand | Q2 | les/sas | http://www.w3.org/2001/XMLSchema#double | REQUIRED demand | Q3 | les/sas | http://www.w3.org/2001/XMLSchema#double | REQUIRED demo | YEAR | les/sas | http://www.w3.org/2001/XMLSchema#long | REQUIRED demo | QTR | les/sas | http://www.w3.org/2001/XMLSchema#long | REQUIRED demo | GDP | les/sas | http://www.w3.org/2001/XMLSchema#double | REQUIRED demo | PR | les/sas | http://www.w3.org/2001/XMLSchema#double | REQUIRED demo | M1 | les/sas | http://www.w3.org/2001/XMLSchema#double | REQUIRED demo | RS | les/sas | http://www.w3.org/2001/XMLSchema#double | REQUIRED airline | YEAR | les/sas | http://www.w3.org/2001/XMLSchema#long | REQUIRED airline | Y | les/sas | http://www.w3.org/2001/XMLSchema#double | REQUIRED airline | W | les/sas | http://www.w3.org/2001/XMLSchema#double | REQUIRED airline | R | les/sas | http://www.w3.org/2001/XMLSchema#double | REQUIRED airline | L | les/sas | http://www.w3.org/2001/XMLSchema#double | REQUIRED airline | K | les/sas | http://www.w3.org/2001/XMLSchema#double | REQUIRED cars | MPG | les/sas | http://www.w3.org/2001/XMLSchema#long | REQUIRED cars | CYL | les/sas | http://www.w3.org/2001/XMLSchema#long | REQUIRED ... 50 rows
Explore an HTTP Endpoint
The query below explores the metadata for an HTTP endpoint. The query runs against the Dark Sky API, which compiles worldwide weather data.
PREFIX s: <http://cambridgesemantics.com/ontologies/DataToolkit#> PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> PREFIX xsd: <http://www.w3.org/2001/XMLSchema#> PREFIX owl: <http://www.w3.org/2002/07/owl#> PREFIX anzo: <http://openanzo.org/ontologies/2008/07/Anzo#> PREFIX zowl: <http://openanzo.org/ontologies/2009/05/AnzoOwl#> PREFIX dc: <http://purl.org/dc/elements/1.1/> SELECT * WHERE { SERVICE <http://cambridgesemantics.com/services/DataToolkit> { [] s:select ?metadata . ?data a s:HttpSource ; s:url "https://api.darksky.net/forecast/bdbe3f638eb908c9b94919537dad5945/30.374563,-97.975892" . ?metadata a s:MetadataSource ; s:from ?data ; ?fields [ ?model xsd:string ; ?field xsd:string ; ?datatype owl:Thing ; ?cardinality xsd:string ; ?order xsd:int ; ] . } } ORDER BY ?model ?order
The query returns the following results:
model | field | datatype | cardinality | order ----------+-----------------------------+-----------------------------------------+-------------+------- currently | time | http://www.w3.org/2001/XMLSchema#int | REQUIRED | 6 currently | summary | http://www.w3.org/2001/XMLSchema#string | REQUIRED | 7 currently | icon | http://www.w3.org/2001/XMLSchema#string | REQUIRED | 8 currently | nearestStormDistance | http://www.w3.org/2001/XMLSchema#int | REQUIRED | 9 currently | nearestStormBearing | http://www.w3.org/2001/XMLSchema#int | REQUIRED | 10 currently | precipIntensity | http://www.w3.org/2001/XMLSchema#int | REQUIRED | 11 currently | precipProbability | http://www.w3.org/2001/XMLSchema#int | REQUIRED | 12 currently | temperature | http://www.w3.org/2001/XMLSchema#float | REQUIRED | 13 currently | apparentTemperature | http://www.w3.org/2001/XMLSchema#float | REQUIRED | 14 currently | dewPoint | http://www.w3.org/2001/XMLSchema#float | REQUIRED | 15 currently | humidity | http://www.w3.org/2001/XMLSchema#float | REQUIRED | 16 currently | pressure | http://www.w3.org/2001/XMLSchema#float | REQUIRED | 17 currently | windSpeed | http://www.w3.org/2001/XMLSchema#float | REQUIRED | 18 currently | windGust | http://www.w3.org/2001/XMLSchema#float | REQUIRED | 19 currently | windBearing | http://www.w3.org/2001/XMLSchema#int | REQUIRED | 20 currently | cloudCover | http://www.w3.org/2001/XMLSchema#float | REQUIRED | 21 currently | uvIndex | http://www.w3.org/2001/XMLSchema#int | REQUIRED | 22 currently | visibility | http://www.w3.org/2001/XMLSchema#int | REQUIRED | 23 currently | ozone | http://www.w3.org/2001/XMLSchema#float | REQUIRED | 24 daily | summary | http://www.w3.org/2001/XMLSchema#string | REQUIRED | 75 daily | icon | http://www.w3.org/2001/XMLSchema#string | REQUIRED | 76 daily | data | | MANY | 77 data | time | http://www.w3.org/2001/XMLSchema#int | REQUIRED | 29 data | precipIntensity | http://www.w3.org/2001/XMLSchema#float | REQUIRED | 30 data | precipProbability | http://www.w3.org/2001/XMLSchema#float | REQUIRED | 31 data | summary | http://www.w3.org/2001/XMLSchema#string | OPTIONAL | 32 ... 81 rows
The following query retrieves the model, field, and data type metadata for the United States from the publicly available Data API Covid Tracking Project.
PREFIX s: <http://cambridgesemantics.com/ontologies/DataToolkit#> PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> PREFIX xsd: <http://www.w3.org/2001/XMLSchema#> PREFIX owl: <http://www.w3.org/2002/07/owl#> PREFIX anzo: <http://openanzo.org/ontologies/2008/07/Anzo#> PREFIX zowl: <http://openanzo.org/ontologies/2009/05/AnzoOwl#> PREFIX dc: <http://purl.org/dc/elements/1.1/> SELECT * WHERE { SERVICE <http://cambridgesemantics.com/services/DataToolkit> { [] s:select ?metadata . ?data a s:HttpSource ; s:url "https://covidtracking.com/api/v1/us/current.csv" . ?metadata a s:MetadataSource ; s:from ?data ; ?fields [ ?model xsd:string ; ?field xsd:string ; ?datatype owl:Thing ; ] . } }
The query returns the following results:
model | field | datatype ------+--------------------------+----------------------------------------- us | date | http://www.w3.org/2001/XMLSchema#string us | states | http://www.w3.org/2001/XMLSchema#string us | positive | http://www.w3.org/2001/XMLSchema#string us | negative | http://www.w3.org/2001/XMLSchema#string us | pending | http://www.w3.org/2001/XMLSchema#string us | hospitalizedCurrently | http://www.w3.org/2001/XMLSchema#string us | hospitalizedCumulative | http://www.w3.org/2001/XMLSchema#string us | inIcuCurrently | http://www.w3.org/2001/XMLSchema#string us | inIcuCumulative | http://www.w3.org/2001/XMLSchema#string us | onVentilatorCurrently | http://www.w3.org/2001/XMLSchema#string us | onVentilatorCumulative | http://www.w3.org/2001/XMLSchema#string us | recovered | http://www.w3.org/2001/XMLSchema#string us | dateChecked | http://www.w3.org/2001/XMLSchema#string us | death | http://www.w3.org/2001/XMLSchema#string us | hospitalized | http://www.w3.org/2001/XMLSchema#string us | lastModified | http://www.w3.org/2001/XMLSchema#string us | total | http://www.w3.org/2001/XMLSchema#string us | totalTestResults | http://www.w3.org/2001/XMLSchema#string us | posNeg | http://www.w3.org/2001/XMLSchema#string us | deathIncrease | http://www.w3.org/2001/XMLSchema#string us | hospitalizedIncrease | http://www.w3.org/2001/XMLSchema#string us | negativeIncrease | http://www.w3.org/2001/XMLSchema#string us | positiveIncrease | http://www.w3.org/2001/XMLSchema#string us | totalTestResultsIncrease | http://www.w3.org/2001/XMLSchema#string us | hash | http://www.w3.org/2001/XMLSchema#string 25 rows
Explore a Directory of CSV Files
The query below explores a directory of CSV files to return the model, field, and data type. The query also orders the results by model name, which is the file name for file sources of a data model does not exist. In addition, the query includes s:sampling true
, which means the GDI will scan the entire file or files before returning results.
PREFIX s: <http://cambridgesemantics.com/ontologies/DataToolkit#> PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> PREFIX xsd: <http://www.w3.org/2001/XMLSchema#> PREFIX owl: <http://www.w3.org/2002/07/owl#> PREFIX anzo: <http://openanzo.org/ontologies/2008/07/Anzo#> PREFIX zowl: <http://openanzo.org/ontologies/2009/05/AnzoOwl#> PREFIX dc: <http://purl.org/dc/elements/1.1/> SELECT * WHERE { SERVICE <http://cambridgesemantics.com/services/DataToolkit> { [] s:select ?metadata . ?data a s:FileSource ; s:url "/opt/shared-files/movie-csv" . ?metadata a s:MetadataSource ; s:from ?data ; # Sample the whole file s:sampling true ; # Sample the first N records # # s:sampling 1000 ; ?fields [ ?model xsd:string ; ?field xsd:string ; ?datatype owl:Thing ; ] . } } ORDER BY ?model
The query returns the following results:
model | field | datatype -----------------------+---------------------------+----------------------------------------- MovieActors1 | MovieID | http://www.w3.org/2001/XMLSchema#int MovieActors1 | MovieTitle | http://www.w3.org/2001/XMLSchema#string MovieActors1 | ActorID | http://www.w3.org/2001/XMLSchema#int MovieActors1 | ActorName | http://www.w3.org/2001/XMLSchema#string MovieActors2 | MovieID | http://www.w3.org/2001/XMLSchema#int MovieActors2 | MovieTitle | http://www.w3.org/2001/XMLSchema#string MovieActors2 | ActorID | http://www.w3.org/2001/XMLSchema#int MovieActors2 | ActorName | http://www.w3.org/2001/XMLSchema#string MovieActors2 | ActorCategory | http://www.w3.org/2001/XMLSchema#string MovieCategory | MovieID | http://www.w3.org/2001/XMLSchema#int MovieCategory | MovieTitle | http://www.w3.org/2001/XMLSchema#string MovieCategory | MoveCategoryID | http://www.w3.org/2001/XMLSchema#int MovieCategory | MovieCategory | http://www.w3.org/2001/XMLSchema#string MovieCinematographers | MovieID | http://www.w3.org/2001/XMLSchema#int MovieCinematographers | MovieTitle | http://www.w3.org/2001/XMLSchema#string MovieCinematographers | MovieCinematographerID | http://www.w3.org/2001/XMLSchema#int MovieCinematographers | MovieCinematographerName | http://www.w3.org/2001/XMLSchema#string MovieComposers | MovieID | http://www.w3.org/2001/XMLSchema#int MovieComposers | MovieTitle | http://www.w3.org/2001/XMLSchema#string MovieComposers | MovieComposerID | http://www.w3.org/2001/XMLSchema#int MovieComposers | MovieComposerName | http://www.w3.org/2001/XMLSchema#string MovieDirectors | MovieID | http://www.w3.org/2001/XMLSchema#int MovieDirectors | MovieTitle | http://www.w3.org/2001/XMLSchema#string ... 79 rows
For instructions on querying the instance data based on the data source metadata, see Reading or Ingesting Remote Instance Data.