Reading Data Source Metadata

If you want to retrieve instance data from a source but are unsure about the data model, schema, or the exact names of columns and their data types, you can use the Graph Data Interface (GDI) service to explore the source's metadata. The service can be used to return a list of the catalogs (schemas), models, columns, data types, and other data source specific information. This topic describes the metadata query syntax and provides several example queries.

Metadata Query Syntax

The following query syntax shows the structure of a metadata query. The table below the query describes the parts of the query and defines the placeholder elements in bold.

PREFIX s:      <http://cambridgesemantics.com/ontologies/DataToolkit#>
PREFIX rdf:    <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs:   <http://www.w3.org/2000/01/rdf-schema#>
PREFIX xsd:    <http://www.w3.org/2001/XMLSchema#>
PREFIX owl:	<http://www.w3.org/2002/07/owl#>
PREFIX anzo:	<http://openanzo.org/ontologies/2008/07/Anzo#>
PREFIX zowl:	<http://openanzo.org/ontologies/2009/05/AnzoOwl#>
PREFIX dc:	<http://purl.org/dc/elements/1.1/>

SELECT *
WHERE
{
	SERVICE <http://cambridgesemantics.com/services/DataToolkit>
    {	
	[] s:select ?metadata .

	?data a s:source_type ;
		s:connection_parameters .

	?metadata a s:MetadataSource ;
		s:from ?data ;
		
	?metadata_selector [
		?metadata_type(datatype) ;
		... ;
	] .
    }
}
Clause or Option Description
PREFIX Clause The PREFIX clause shown above declares the prefixes that are common or standard for all GDI service queries. You can declare additional prefixes to use in the query.
Result Clause The result clause defines the results to return. For metadata queries, the result clause is typically SELECT *.
WHERE Clause The WHERE clause includes the required GDI SERVICE call and defines the patterns to look for in the source. The rows below describe the patterns in the WHERE clause.
[] s:select ?metadata Required triple pattern for metadata queries. The select property specifies the source that should be used to return data.
source_type The ?data a s:source_type triple pattern specifies the type of data source that the query will run against. For example, ?data a s:DbSource. The list below describes the commonly used types:
  • DbSource for any type of database.
  • FileSource for flat files.
  • HttpSource for HTTP endpoints.

For a complete list of the supported source types, view the GDI ontology. Each data source type is represented by an owl:Class. See Graph Data Interface Model for more information.

connection_parameters The source connection parameters are the values that are required for accessing the source, such the database connection URL, path to a file source, username, password, key, token, etc. The connection details can be referenced from a Query Context or specified in the query. For more information about contexts, see Using Query Contexts.

For example, the pattern below specifies the connection details for a database source:

?data a s:DbSource ;
   s:url "jdbc:postgresql://10.100.2.9:5555/kl_hosp_db?
user=postgres&password=postgres123"

The example below specifies the connection details for a file-based source, a directory of CSV files:

?data a s:FileSource ;
   s:url "/opt/shared-files/sales-csv"

For a single file, specify the filename in the URL. For example, "/opt/shared-files/sas/edu_inc.sas7bdat".

?metadata_selector The rest of the WHERE clause defines the metadata to retrieve. The metadata_selector specifies the type of metadata to return. The following list describes the valid selectors:
  • catalogs: This selector narrows the results to schema-related metadata such as the schema names. Even when additional metadata types (described in the row below) are specified as objects, only catalog (schema) information is returned.
  • fields: This selector is the broadest and most flexible option. Using the fields selector enables users to return any and all of the source metadata information, depending on the specified metadata types (described in the row below).
  • models: This selector narrows the results to model-related metadata such as the model names. Even when additional metadata types (described in the row below) are specified as objects, only model information is returned.
metadata_type (datatype) The triple patterns in the array for the metadata selector specify the type of metadata to return as well as the data type for the return value.

The following list shows all of the valid options. You can include any combination of patterns. The results that are returned depend on the type of data source and whether the information exists in the source.

?model (xsd:string) ;   # Returns model names in string format
                        # For file sources, returns the
                        # file names

?field (xsd:string) ;   # Returns column names
?catalog (xsd:string) ; # Returns schema names
?datatype (owl:Thing) ; # Returns data types of the columns
?format (xsd:string) ;  # Returns the format of the source
?cardinality (xsd:string) ; # Returns the cardinality of
                            # relationships between tables:
                            # optional, many, or required
 
?count (xsd:int) ;   # Returns the number of times the field
                     # appears in the source

?order (xsd:int) ;   # Returns the order in which the field
                     # was encountered 

Metadata Query Examples

This section includes sample metadata queries that run against different types of data sources.

View Database Schemas

The query below sends a metadata query to a MySQL database to return a list of the schemas that are available:

PREFIX s:      <http://cambridgesemantics.com/ontologies/DataToolkit#>
PREFIX rdf:    <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs:   <http://www.w3.org/2000/01/rdf-schema#>
PREFIX xsd:    <http://www.w3.org/2001/XMLSchema#>
PREFIX owl:	<http://www.w3.org/2002/07/owl#>
PREFIX anzo:	<http://openanzo.org/ontologies/2008/07/Anzo#>
PREFIX zowl:	<http://openanzo.org/ontologies/2009/05/AnzoOwl#>
PREFIX dc:	<http://purl.org/dc/elements/1.1/>

SELECT *
WHERE
{
	SERVICE <http://cambridgesemantics.com/services/DataToolkit>
    {
	[] s:select ?metadata .

	?data a s:DbSource ;
		s:url "jdbc:mysql://10.100.2.9:5555/?user=root&password=Mysql1@#" .

	?metadata a s:MetadataSource ;
		s:from ?data ;
	?catalogs [
		?catalog xsd:string ;
		?order xsd:int ;
	] .
   }
}
ORDER BY ?catalog

The query returns the following results:

catalog               | order
----------------------+-------
BANKTEST_DB           |     1
EMR                   |     4
GOLFCLUB_DB           |     8
NORTHWIND             |    10
SPORTDB               |    13
SQLPOCKET_DB          |    14
WORDPRESS_DB          |    16
classicmodels         |     2
crm_national_patients |     3
emrdbbig              |     5
emrdbsmall            |     6
emrnational_schema    |     7
mysql                 |     9
optum                 |    11
performance_schema    |    12
sys                   |    15
16 rows

Explore a Database Schema

Using the list of schemas that were returned in the example above (View Database Schemas), the query below returns metadata about the columns in one of the schemas. To narrow the results to a schema, the schema name (NORTHWIND) is added to the connection URL.

PREFIX s:      <http://cambridgesemantics.com/ontologies/DataToolkit#>
PREFIX rdf:    <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs:   <http://www.w3.org/2000/01/rdf-schema#>
PREFIX xsd:    <http://www.w3.org/2001/XMLSchema#>
PREFIX owl:	<http://www.w3.org/2002/07/owl#>
PREFIX anzo:	<http://openanzo.org/ontologies/2008/07/Anzo#>
PREFIX zowl:	<http://openanzo.org/ontologies/2009/05/AnzoOwl#>
PREFIX dc:	<http://purl.org/dc/elements/1.1/>

SELECT *
WHERE
{
	SERVICE <http://cambridgesemantics.com/services/DataToolkit>
    {
	[] s:select ?metadata .

	?data a s:DbSource ;
		s:url "jdbc:mysql://10.100.2.9:5555/NORTHWIND?user=root&password=Mysql1@#" .

	?metadata a s:MetadataSource ;
		s:from ?data ;
	?fields [
		?model xsd:string ;
		?field xsd:string ;
		?datatype owl:Thing ;
	] .
   }
}
ORDER BY ?model

The query returns the following results:

model                          | field                | datatype
-------------------------------+----------------------+---------------------------------------------
Alphabetical list of products  | CategoryID           | http://www.w3.org/2001/XMLSchema#int
Alphabetical list of products  | Discontinued         | http://www.w3.org/2001/XMLSchema#boolean
Alphabetical list of products  | SupplierID           | http://www.w3.org/2001/XMLSchema#int
Alphabetical list of products  | UnitPrice            | http://www.w3.org/2001/XMLSchema#decimal
Alphabetical list of products  | ProductName          | http://www.w3.org/2001/XMLSchema#string
Alphabetical list of products  | QuantityPerUnit      | http://www.w3.org/2001/XMLSchema#string
Alphabetical list of products  | UnitsOnOrder         | http://www.w3.org/2001/XMLSchema#short
Alphabetical list of products  | CategoryName         | http://www.w3.org/2001/XMLSchema#string
Alphabetical list of products  | ProductID            | http://www.w3.org/2001/XMLSchema#int
Alphabetical list of products  | ReorderLevel         | http://www.w3.org/2001/XMLSchema#short
Alphabetical list of products  | UnitsInStock         | http://www.w3.org/2001/XMLSchema#short
Categories                     | CategoryID           | http://www.w3.org/2001/XMLSchema#int
Categories                     | Description          | http://www.w3.org/2001/XMLSchema#string
Categories                     | Picture              | http://www.w3.org/2001/XMLSchema#base64Binary
Categories                     | CategoryName         | http://www.w3.org/2001/XMLSchema#string
Categories                     | categoryid           |
Category Sales for 1997        | CategoryName         | http://www.w3.org/2001/XMLSchema#string
Category Sales for 1997        | CategorySales        | http://www.w3.org/2001/XMLSchema#double
Current Product List           | ProductName          | http://www.w3.org/2001/XMLSchema#string
Current Product List           | ProductID            | http://www.w3.org/2001/XMLSchema#int
...
201 rows

Explore a Directory of SAS Files

The query below explores a directory of SAS files to return the model, catalog (schema), field, data type, and cardinality information. The query also orders the results by model name, which is the file name for file sources of a data model does not exist.

PREFIX s:      <http://cambridgesemantics.com/ontologies/DataToolkit#>
PREFIX rdf:    <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs:   <http://www.w3.org/2000/01/rdf-schema#>
PREFIX xsd:    <http://www.w3.org/2001/XMLSchema#>
PREFIX owl:	<http://www.w3.org/2002/07/owl#>
PREFIX anzo:	<http://openanzo.org/ontologies/2008/07/Anzo#>
PREFIX zowl:	<http://openanzo.org/ontologies/2009/05/AnzoOwl#>
PREFIX dc:	<http://purl.org/dc/elements/1.1/>

SELECT *
WHERE
{
	SERVICE <http://cambridgesemantics.com/services/DataToolkit>
    {
	[] s:select ?metadata .

	?data a s:FileSource ;
		s:url "/opt/shared-files/sas" .

	?metadata a s:MetadataSource ;
		s:from ?data ;

	?fields [
		?model xsd:string ;
		?field xsd:string ;
		?catalog xsd:string ;
		?datatype owl:Thing ;
		?cardinality xsd:string ;
	] .
   }
}
ORDER BY ?model

The query returns the following results:

model   | field       | catalog | datatype                                | cardinality
--------+-------------+---------+-----------------------------------------+-------------
demand  | P1          | les/sas | http://www.w3.org/2001/XMLSchema#double | REQUIRED
demand  | P2          | les/sas | http://www.w3.org/2001/XMLSchema#double | REQUIRED
demand  | P3          | les/sas | http://www.w3.org/2001/XMLSchema#double | REQUIRED
demand  | Y           | les/sas | http://www.w3.org/2001/XMLSchema#double | REQUIRED
demand  | Q1          | les/sas | http://www.w3.org/2001/XMLSchema#double | REQUIRED
demand  | Q2          | les/sas | http://www.w3.org/2001/XMLSchema#double | REQUIRED
demand  | Q3          | les/sas | http://www.w3.org/2001/XMLSchema#double | REQUIRED
demo    | YEAR        | les/sas | http://www.w3.org/2001/XMLSchema#long   | REQUIRED
demo    | QTR         | les/sas | http://www.w3.org/2001/XMLSchema#long   | REQUIRED
demo    | GDP         | les/sas | http://www.w3.org/2001/XMLSchema#double | REQUIRED
demo    | PR          | les/sas | http://www.w3.org/2001/XMLSchema#double | REQUIRED
demo    | M1          | les/sas | http://www.w3.org/2001/XMLSchema#double | REQUIRED
demo    | RS          | les/sas | http://www.w3.org/2001/XMLSchema#double | REQUIRED
airline | YEAR        | les/sas | http://www.w3.org/2001/XMLSchema#long   | REQUIRED
airline | Y           | les/sas | http://www.w3.org/2001/XMLSchema#double | REQUIRED
airline | W           | les/sas | http://www.w3.org/2001/XMLSchema#double | REQUIRED
airline | R           | les/sas | http://www.w3.org/2001/XMLSchema#double | REQUIRED
airline | L           | les/sas | http://www.w3.org/2001/XMLSchema#double | REQUIRED
airline | K           | les/sas | http://www.w3.org/2001/XMLSchema#double | REQUIRED
cars    | MPG         | les/sas | http://www.w3.org/2001/XMLSchema#long   | REQUIRED
cars    | CYL         | les/sas | http://www.w3.org/2001/XMLSchema#long   | REQUIRED
...
50 rows

Explore an HTTP Endpoint

The query below explores the metadata for an HTTP endpoint. The query runs against the Dark Sky API, which compiles worldwide weather data.

PREFIX s:      <http://cambridgesemantics.com/ontologies/DataToolkit#>
PREFIX rdf:    <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs:   <http://www.w3.org/2000/01/rdf-schema#>
PREFIX xsd:    <http://www.w3.org/2001/XMLSchema#>
PREFIX owl:	<http://www.w3.org/2002/07/owl#>
PREFIX anzo:	<http://openanzo.org/ontologies/2008/07/Anzo#>
PREFIX zowl:	<http://openanzo.org/ontologies/2009/05/AnzoOwl#>
PREFIX dc:	<http://purl.org/dc/elements/1.1/>

SELECT *
WHERE
{
	SERVICE <http://cambridgesemantics.com/services/DataToolkit>
    {
	[] s:select ?metadata .

	?data a s:HttpSource ;
		s:url "https://api.darksky.net/forecast/bdbe3f638eb908c9b94919537dad5945/30.374563,-97.975892" .

	?metadata a s:MetadataSource ;
		s:from ?data ;

	?fields [
		?model xsd:string ;
		?field xsd:string ;
		?datatype owl:Thing ;
		?cardinality xsd:string ;
		?order xsd:int ;
	] .
   }
}
ORDER BY ?model ?order

The query returns the following results:

model     | field                       | datatype                                | cardinality | order
----------+-----------------------------+-----------------------------------------+-------------+-------
currently | time                        | http://www.w3.org/2001/XMLSchema#int    | REQUIRED    |     6
currently | summary                     | http://www.w3.org/2001/XMLSchema#string | REQUIRED    |     7
currently | icon                        | http://www.w3.org/2001/XMLSchema#string | REQUIRED    |     8
currently | nearestStormDistance        | http://www.w3.org/2001/XMLSchema#int    | REQUIRED    |     9
currently | nearestStormBearing         | http://www.w3.org/2001/XMLSchema#int    | REQUIRED    |    10
currently | precipIntensity             | http://www.w3.org/2001/XMLSchema#int    | REQUIRED    |    11
currently | precipProbability           | http://www.w3.org/2001/XMLSchema#int    | REQUIRED    |    12
currently | temperature                 | http://www.w3.org/2001/XMLSchema#float  | REQUIRED    |    13
currently | apparentTemperature         | http://www.w3.org/2001/XMLSchema#float  | REQUIRED    |    14
currently | dewPoint                    | http://www.w3.org/2001/XMLSchema#float  | REQUIRED    |    15
currently | humidity                    | http://www.w3.org/2001/XMLSchema#float  | REQUIRED    |    16
currently | pressure                    | http://www.w3.org/2001/XMLSchema#float  | REQUIRED    |    17
currently | windSpeed                   | http://www.w3.org/2001/XMLSchema#float  | REQUIRED    |    18
currently | windGust                    | http://www.w3.org/2001/XMLSchema#float  | REQUIRED    |    19
currently | windBearing                 | http://www.w3.org/2001/XMLSchema#int    | REQUIRED    |    20
currently | cloudCover                  | http://www.w3.org/2001/XMLSchema#float  | REQUIRED    |    21
currently | uvIndex                     | http://www.w3.org/2001/XMLSchema#int    | REQUIRED    |    22
currently | visibility                  | http://www.w3.org/2001/XMLSchema#int    | REQUIRED    |    23
currently | ozone                       | http://www.w3.org/2001/XMLSchema#float  | REQUIRED    |    24
daily     | summary                     | http://www.w3.org/2001/XMLSchema#string | REQUIRED    |    75
daily     | icon                        | http://www.w3.org/2001/XMLSchema#string | REQUIRED    |    76
daily     | data                        |                                         | MANY        |    77
data      | time                        | http://www.w3.org/2001/XMLSchema#int    | REQUIRED    |    29
data      | precipIntensity             | http://www.w3.org/2001/XMLSchema#float  | REQUIRED    |    30
data      | precipProbability           | http://www.w3.org/2001/XMLSchema#float  | REQUIRED    |    31
data      | summary                     | http://www.w3.org/2001/XMLSchema#string | OPTIONAL    |    32
...
81 rows

The following query retrieves the model, field, and data type metadata for the United States from the publicly available Data API Covid Tracking Project.

PREFIX s:      <http://cambridgesemantics.com/ontologies/DataToolkit#>
PREFIX rdf:    <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs:   <http://www.w3.org/2000/01/rdf-schema#>
PREFIX xsd:    <http://www.w3.org/2001/XMLSchema#>
PREFIX owl:    <http://www.w3.org/2002/07/owl#>
PREFIX anzo:   <http://openanzo.org/ontologies/2008/07/Anzo#>
PREFIX zowl:   <http://openanzo.org/ontologies/2009/05/AnzoOwl#>
PREFIX dc:     <http://purl.org/dc/elements/1.1/>

SELECT *
WHERE
{
	SERVICE <http://cambridgesemantics.com/services/DataToolkit>
    {
	[] s:select ?metadata .

	?data a s:HttpSource ;
		s:url "https://covidtracking.com/api/v1/us/current.csv" .

	?metadata a s:MetadataSource ;
		s:from ?data ;

	?fields [
		?model xsd:string ;
		?field xsd:string ;
		?datatype owl:Thing ;
	] .
    }
}

The query returns the following results:

model | field                    | datatype
------+--------------------------+-----------------------------------------
us    | date                     | http://www.w3.org/2001/XMLSchema#string
us    | states                   | http://www.w3.org/2001/XMLSchema#string
us    | positive                 | http://www.w3.org/2001/XMLSchema#string
us    | negative                 | http://www.w3.org/2001/XMLSchema#string
us    | pending                  | http://www.w3.org/2001/XMLSchema#string
us    | hospitalizedCurrently    | http://www.w3.org/2001/XMLSchema#string
us    | hospitalizedCumulative   | http://www.w3.org/2001/XMLSchema#string
us    | inIcuCurrently           | http://www.w3.org/2001/XMLSchema#string
us    | inIcuCumulative          | http://www.w3.org/2001/XMLSchema#string
us    | onVentilatorCurrently    | http://www.w3.org/2001/XMLSchema#string
us    | onVentilatorCumulative   | http://www.w3.org/2001/XMLSchema#string
us    | recovered                | http://www.w3.org/2001/XMLSchema#string
us    | dateChecked              | http://www.w3.org/2001/XMLSchema#string
us    | death                    | http://www.w3.org/2001/XMLSchema#string
us    | hospitalized             | http://www.w3.org/2001/XMLSchema#string
us    | lastModified             | http://www.w3.org/2001/XMLSchema#string
us    | total                    | http://www.w3.org/2001/XMLSchema#string
us    | totalTestResults         | http://www.w3.org/2001/XMLSchema#string
us    | posNeg                   | http://www.w3.org/2001/XMLSchema#string
us    | deathIncrease            | http://www.w3.org/2001/XMLSchema#string
us    | hospitalizedIncrease     | http://www.w3.org/2001/XMLSchema#string
us    | negativeIncrease         | http://www.w3.org/2001/XMLSchema#string
us    | positiveIncrease         | http://www.w3.org/2001/XMLSchema#string
us    | totalTestResultsIncrease | http://www.w3.org/2001/XMLSchema#string
us    | hash                     | http://www.w3.org/2001/XMLSchema#string
25 rows

Explore a Directory of CSV Files

The query below explores a directory of CSV files to return the model, field, and data type. The query also orders the results by model name, which is the file name for file sources of a data model does not exist. In addition, the query includes s:sampling true, which means the GDI will scan the entire file or files before returning results.

PREFIX s:      <http://cambridgesemantics.com/ontologies/DataToolkit#>
PREFIX rdf:    <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs:   <http://www.w3.org/2000/01/rdf-schema#>
PREFIX xsd:    <http://www.w3.org/2001/XMLSchema#>
PREFIX owl:	<http://www.w3.org/2002/07/owl#>
PREFIX anzo:	<http://openanzo.org/ontologies/2008/07/Anzo#>
PREFIX zowl:	<http://openanzo.org/ontologies/2009/05/AnzoOwl#>
PREFIX dc:	<http://purl.org/dc/elements/1.1/>

SELECT *
WHERE
{
	SERVICE <http://cambridgesemantics.com/services/DataToolkit>
    {
	[] s:select ?metadata .

	?data a s:FileSource ;
		s:url "/opt/shared-files/movie-csv" .

	?metadata a s:MetadataSource ;
		s:from ?data ;

	# Sample the whole file
	s:sampling true ;

	# Sample the first N records #
	# s:sampling 1000 ;

	?fields [
		?model xsd:string ;
		?field xsd:string ;
		?datatype owl:Thing ;
	] .
   }
}
ORDER BY ?model

The query returns the following results:

model                  | field                     | datatype
-----------------------+---------------------------+-----------------------------------------
MovieActors1           | MovieID                   | http://www.w3.org/2001/XMLSchema#int
MovieActors1           | MovieTitle                | http://www.w3.org/2001/XMLSchema#string
MovieActors1           | ActorID                   | http://www.w3.org/2001/XMLSchema#int
MovieActors1           | ActorName                 | http://www.w3.org/2001/XMLSchema#string
MovieActors2           | MovieID                   | http://www.w3.org/2001/XMLSchema#int
MovieActors2           | MovieTitle                | http://www.w3.org/2001/XMLSchema#string
MovieActors2           | ActorID                   | http://www.w3.org/2001/XMLSchema#int
MovieActors2           | ActorName                 | http://www.w3.org/2001/XMLSchema#string
MovieActors2           | ActorCategory             | http://www.w3.org/2001/XMLSchema#string
MovieCategory          | MovieID                   | http://www.w3.org/2001/XMLSchema#int
MovieCategory          | MovieTitle                | http://www.w3.org/2001/XMLSchema#string
MovieCategory          | MoveCategoryID            | http://www.w3.org/2001/XMLSchema#int
MovieCategory          | MovieCategory             | http://www.w3.org/2001/XMLSchema#string
MovieCinematographers  | MovieID                   | http://www.w3.org/2001/XMLSchema#int
MovieCinematographers  | MovieTitle                | http://www.w3.org/2001/XMLSchema#string
MovieCinematographers  | MovieCinematographerID    | http://www.w3.org/2001/XMLSchema#int
MovieCinematographers  | MovieCinematographerName  | http://www.w3.org/2001/XMLSchema#string
MovieComposers         | MovieID                   | http://www.w3.org/2001/XMLSchema#int
MovieComposers         | MovieTitle                | http://www.w3.org/2001/XMLSchema#string
MovieComposers         | MovieComposerID           | http://www.w3.org/2001/XMLSchema#int
MovieComposers         | MovieComposerName         | http://www.w3.org/2001/XMLSchema#string
MovieDirectors         | MovieID                   | http://www.w3.org/2001/XMLSchema#int
MovieDirectors         | MovieTitle                | http://www.w3.org/2001/XMLSchema#string
...
79 rows

For instructions on querying the instance data based on the data source metadata, see Reading or Ingesting Instance Data.

Related Topics