Reading Data Source Metadata
If you want to retrieve instance data from a source but are unsure about the data model, schema, or the exact names of columns and their data types, you can use the Graph Data Interface (GDI) service to explore the source's metadata. The service can be used to return a list of the catalogs (schemas), models, columns, data types, and other data source specific information. This topic describes the metadata query syntax and provides several example queries.
Metadata Query Syntax
The following query syntax shows the structure of a metadata query. The table below the query describes the parts of the query and defines the placeholder elements in bold.
PREFIX s: <http://cambridgesemantics.com/ontologies/DataToolkit#> PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> PREFIX xsd: <http://www.w3.org/2001/XMLSchema#> PREFIX owl: <http://www.w3.org/2002/07/owl#> PREFIX anzo: <http://openanzo.org/ontologies/2008/07/Anzo#> PREFIX zowl: <http://openanzo.org/ontologies/2009/05/AnzoOwl#> PREFIX dc: <http://purl.org/dc/elements/1.1/> SELECT * WHERE { SERVICE <http://cambridgesemantics.com/services/DataToolkit> { [] s:select ?metadata . ?data a s:source_type ; s:connection_parameters . ?metadata a s:MetadataSource ; s:from ?data ; ?metadata_selector [ ?metadata_type(datatype) ; ... ; ] . } }
Clause or Option | Description |
---|---|
PREFIX Clause | The PREFIX clause shown above declares the prefixes that are common or standard for all GDI service queries. You can declare additional prefixes to use in the query. |
Result Clause | The result clause defines the results to return. For metadata queries, the result clause is typically SELECT *. |
WHERE Clause | The WHERE clause includes the required GDI SERVICE call and defines the patterns to look for in the source. The rows below describe the patterns in the WHERE clause. |
[] s:select ?metadata | Required triple pattern for metadata queries. The select property specifies the source that should be used to return data. |
source_type | The ?data a s:source_type triple pattern specifies the type of data source that the query will run against. For example, ?data a s:DbSource . The list below describes the commonly used types:
For a complete list of the supported source types, view the GDI ontology. Each data source type is represented by an owl:Class. See Graph Data Interface Model for more information. |
connection_parameters | The source connection parameters are the values that are required for accessing the source, such the database connection URL, path to a file source, username, password, key, token, etc. The connection details can be referenced from a Query Context or specified in the query. For more information about contexts, see Using Query Contexts. For example, the pattern below specifies the connection details for a database source: ?data a s:DbSource ; s:url "jdbc:postgresql://10.100.2.9:5555/kl_hosp_db? user=postgres&password=postgres123" The example below specifies the connection details for a file-based source, a directory of CSV files: ?data a s:FileSource ; s:url "/opt/shared-files/sales-csv" For a single file, specify the filename in the URL. For example, |
?metadata_selector | The rest of the WHERE clause defines the metadata to retrieve. The metadata_selector specifies the type of metadata to return. The following list describes the valid selectors:
|
metadata_type (datatype) | The triple patterns in the array for the metadata selector specify the type of metadata to return as well as the data type for the return value. The following list shows all of the valid options. You can include any combination of patterns. The results that are returned depend on the type of data source and whether the information exists in the source. ?model (xsd:string) ; # Returns model names in string format # For file sources, returns the # file names ?field (xsd:string) ; # Returns column names ?catalog (xsd:string) ; # Returns schema names ?datatype (owl:Thing) ; # Returns data types of the columns ?format (xsd:string) ; # Returns the format of the source ?cardinality (xsd:string) ; # Returns the cardinality of # relationships between tables: # optional, many, or required ?count (xsd:int) ; # Returns the number of times the field # appears in the source ?order (xsd:int) ; # Returns the order in which the field # was encountered |
Metadata Query Examples
This section includes sample metadata queries that run against different types of data sources.
- View Database Schemas
- Explore a Database Schema
- Explore a Directory of SAS Files
- Explore an HTTP Endpoint
- Explore a Directory of CSV Files
View Database Schemas
The query below sends a metadata query to a MySQL database to return a list of the schemas that are available:
PREFIX s: <http://cambridgesemantics.com/ontologies/DataToolkit#> PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> PREFIX xsd: <http://www.w3.org/2001/XMLSchema#> PREFIX owl: <http://www.w3.org/2002/07/owl#> PREFIX anzo: <http://openanzo.org/ontologies/2008/07/Anzo#> PREFIX zowl: <http://openanzo.org/ontologies/2009/05/AnzoOwl#> PREFIX dc: <http://purl.org/dc/elements/1.1/> SELECT * WHERE { SERVICE <http://cambridgesemantics.com/services/DataToolkit> { [] s:select ?metadata . ?data a s:DbSource ; s:url "jdbc:mysql://10.100.2.9:5555/?user=root&password=Mysql1@#" . ?metadata a s:MetadataSource ; s:from ?data ; ?catalogs [ ?catalog xsd:string ; ?order xsd:int ; ] . } } ORDER BY ?catalog
The query returns the following results:
catalog | order ----------------------+------- BANKTEST_DB | 1 EMR | 4 GOLFCLUB_DB | 8 NORTHWIND | 10 SPORTDB | 13 SQLPOCKET_DB | 14 WORDPRESS_DB | 16 classicmodels | 2 crm_national_patients | 3 emrdbbig | 5 emrdbsmall | 6 emrnational_schema | 7 mysql | 9 optum | 11 performance_schema | 12 sys | 15 16 rows
Explore a Database Schema
Using the list of schemas that were returned in the example above (View Database Schemas), the query below returns metadata about the columns in one of the schemas. To narrow the results to a schema, the schema name (NORTHWIND) is added to the connection URL.
PREFIX s: <http://cambridgesemantics.com/ontologies/DataToolkit#> PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> PREFIX xsd: <http://www.w3.org/2001/XMLSchema#> PREFIX owl: <http://www.w3.org/2002/07/owl#> PREFIX anzo: <http://openanzo.org/ontologies/2008/07/Anzo#> PREFIX zowl: <http://openanzo.org/ontologies/2009/05/AnzoOwl#> PREFIX dc: <http://purl.org/dc/elements/1.1/> SELECT * WHERE { SERVICE <http://cambridgesemantics.com/services/DataToolkit> { [] s:select ?metadata . ?data a s:DbSource ; s:url "jdbc:mysql://10.100.2.9:5555/NORTHWIND?user=root&password=Mysql1@#" . ?metadata a s:MetadataSource ; s:from ?data ; ?fields [ ?model xsd:string ; ?field xsd:string ; ?datatype owl:Thing ; ] . } } ORDER BY ?model
The query returns the following results:
model | field | datatype -------------------------------+----------------------+--------------------------------------------- Alphabetical list of products | CategoryID | http://www.w3.org/2001/XMLSchema#int Alphabetical list of products | Discontinued | http://www.w3.org/2001/XMLSchema#boolean Alphabetical list of products | SupplierID | http://www.w3.org/2001/XMLSchema#int Alphabetical list of products | UnitPrice | http://www.w3.org/2001/XMLSchema#decimal Alphabetical list of products | ProductName | http://www.w3.org/2001/XMLSchema#string Alphabetical list of products | QuantityPerUnit | http://www.w3.org/2001/XMLSchema#string Alphabetical list of products | UnitsOnOrder | http://www.w3.org/2001/XMLSchema#short Alphabetical list of products | CategoryName | http://www.w3.org/2001/XMLSchema#string Alphabetical list of products | ProductID | http://www.w3.org/2001/XMLSchema#int Alphabetical list of products | ReorderLevel | http://www.w3.org/2001/XMLSchema#short Alphabetical list of products | UnitsInStock | http://www.w3.org/2001/XMLSchema#short Categories | CategoryID | http://www.w3.org/2001/XMLSchema#int Categories | Description | http://www.w3.org/2001/XMLSchema#string Categories | Picture | http://www.w3.org/2001/XMLSchema#base64Binary Categories | CategoryName | http://www.w3.org/2001/XMLSchema#string Categories | categoryid | Category Sales for 1997 | CategoryName | http://www.w3.org/2001/XMLSchema#string Category Sales for 1997 | CategorySales | http://www.w3.org/2001/XMLSchema#double Current Product List | ProductName | http://www.w3.org/2001/XMLSchema#string Current Product List | ProductID | http://www.w3.org/2001/XMLSchema#int ... 201 rows
Explore a Directory of SAS Files
The query below explores a directory of SAS files to return the model, catalog (schema), field, data type, and cardinality information. The query also orders the results by model name, which is the file name for file sources of a data model does not exist.
PREFIX s: <http://cambridgesemantics.com/ontologies/DataToolkit#> PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> PREFIX xsd: <http://www.w3.org/2001/XMLSchema#> PREFIX owl: <http://www.w3.org/2002/07/owl#> PREFIX anzo: <http://openanzo.org/ontologies/2008/07/Anzo#> PREFIX zowl: <http://openanzo.org/ontologies/2009/05/AnzoOwl#> PREFIX dc: <http://purl.org/dc/elements/1.1/> SELECT * WHERE { SERVICE <http://cambridgesemantics.com/services/DataToolkit> { [] s:select ?metadata . ?data a s:FileSource ; s:url "/opt/shared-files/sas" . ?metadata a s:MetadataSource ; s:from ?data ; ?fields [ ?model xsd:string ; ?field xsd:string ; ?catalog xsd:string ; ?datatype owl:Thing ; ?cardinality xsd:string ; ] . } } ORDER BY ?model
The query returns the following results:
model | field | catalog | datatype | cardinality --------+-------------+---------+-----------------------------------------+------------- demand | P1 | les/sas | http://www.w3.org/2001/XMLSchema#double | REQUIRED demand | P2 | les/sas | http://www.w3.org/2001/XMLSchema#double | REQUIRED demand | P3 | les/sas | http://www.w3.org/2001/XMLSchema#double | REQUIRED demand | Y | les/sas | http://www.w3.org/2001/XMLSchema#double | REQUIRED demand | Q1 | les/sas | http://www.w3.org/2001/XMLSchema#double | REQUIRED demand | Q2 | les/sas | http://www.w3.org/2001/XMLSchema#double | REQUIRED demand | Q3 | les/sas | http://www.w3.org/2001/XMLSchema#double | REQUIRED demo | YEAR | les/sas | http://www.w3.org/2001/XMLSchema#long | REQUIRED demo | QTR | les/sas | http://www.w3.org/2001/XMLSchema#long | REQUIRED demo | GDP | les/sas | http://www.w3.org/2001/XMLSchema#double | REQUIRED demo | PR | les/sas | http://www.w3.org/2001/XMLSchema#double | REQUIRED demo | M1 | les/sas | http://www.w3.org/2001/XMLSchema#double | REQUIRED demo | RS | les/sas | http://www.w3.org/2001/XMLSchema#double | REQUIRED airline | YEAR | les/sas | http://www.w3.org/2001/XMLSchema#long | REQUIRED airline | Y | les/sas | http://www.w3.org/2001/XMLSchema#double | REQUIRED airline | W | les/sas | http://www.w3.org/2001/XMLSchema#double | REQUIRED airline | R | les/sas | http://www.w3.org/2001/XMLSchema#double | REQUIRED airline | L | les/sas | http://www.w3.org/2001/XMLSchema#double | REQUIRED airline | K | les/sas | http://www.w3.org/2001/XMLSchema#double | REQUIRED cars | MPG | les/sas | http://www.w3.org/2001/XMLSchema#long | REQUIRED cars | CYL | les/sas | http://www.w3.org/2001/XMLSchema#long | REQUIRED ... 50 rows
Explore an HTTP Endpoint
The query below explores the metadata for an HTTP endpoint. The query runs against the Dark Sky API, which compiles worldwide weather data.
PREFIX s: <http://cambridgesemantics.com/ontologies/DataToolkit#> PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> PREFIX xsd: <http://www.w3.org/2001/XMLSchema#> PREFIX owl: <http://www.w3.org/2002/07/owl#> PREFIX anzo: <http://openanzo.org/ontologies/2008/07/Anzo#> PREFIX zowl: <http://openanzo.org/ontologies/2009/05/AnzoOwl#> PREFIX dc: <http://purl.org/dc/elements/1.1/> SELECT * WHERE { SERVICE <http://cambridgesemantics.com/services/DataToolkit> { [] s:select ?metadata . ?data a s:HttpSource ; s:url "https://api.darksky.net/forecast/bdbe3f638eb908c9b94919537dad5945/30.374563,-97.975892" . ?metadata a s:MetadataSource ; s:from ?data ; ?fields [ ?model xsd:string ; ?field xsd:string ; ?datatype owl:Thing ; ?cardinality xsd:string ; ?order xsd:int ; ] . } } ORDER BY ?model ?order
The query returns the following results:
model | field | datatype | cardinality | order ----------+-----------------------------+-----------------------------------------+-------------+------- currently | time | http://www.w3.org/2001/XMLSchema#int | REQUIRED | 6 currently | summary | http://www.w3.org/2001/XMLSchema#string | REQUIRED | 7 currently | icon | http://www.w3.org/2001/XMLSchema#string | REQUIRED | 8 currently | nearestStormDistance | http://www.w3.org/2001/XMLSchema#int | REQUIRED | 9 currently | nearestStormBearing | http://www.w3.org/2001/XMLSchema#int | REQUIRED | 10 currently | precipIntensity | http://www.w3.org/2001/XMLSchema#int | REQUIRED | 11 currently | precipProbability | http://www.w3.org/2001/XMLSchema#int | REQUIRED | 12 currently | temperature | http://www.w3.org/2001/XMLSchema#float | REQUIRED | 13 currently | apparentTemperature | http://www.w3.org/2001/XMLSchema#float | REQUIRED | 14 currently | dewPoint | http://www.w3.org/2001/XMLSchema#float | REQUIRED | 15 currently | humidity | http://www.w3.org/2001/XMLSchema#float | REQUIRED | 16 currently | pressure | http://www.w3.org/2001/XMLSchema#float | REQUIRED | 17 currently | windSpeed | http://www.w3.org/2001/XMLSchema#float | REQUIRED | 18 currently | windGust | http://www.w3.org/2001/XMLSchema#float | REQUIRED | 19 currently | windBearing | http://www.w3.org/2001/XMLSchema#int | REQUIRED | 20 currently | cloudCover | http://www.w3.org/2001/XMLSchema#float | REQUIRED | 21 currently | uvIndex | http://www.w3.org/2001/XMLSchema#int | REQUIRED | 22 currently | visibility | http://www.w3.org/2001/XMLSchema#int | REQUIRED | 23 currently | ozone | http://www.w3.org/2001/XMLSchema#float | REQUIRED | 24 daily | summary | http://www.w3.org/2001/XMLSchema#string | REQUIRED | 75 daily | icon | http://www.w3.org/2001/XMLSchema#string | REQUIRED | 76 daily | data | | MANY | 77 data | time | http://www.w3.org/2001/XMLSchema#int | REQUIRED | 29 data | precipIntensity | http://www.w3.org/2001/XMLSchema#float | REQUIRED | 30 data | precipProbability | http://www.w3.org/2001/XMLSchema#float | REQUIRED | 31 data | summary | http://www.w3.org/2001/XMLSchema#string | OPTIONAL | 32 ... 81 rows
The following query retrieves the model, field, and data type metadata for the United States from the publicly available Data API Covid Tracking Project.
PREFIX s: <http://cambridgesemantics.com/ontologies/DataToolkit#> PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> PREFIX xsd: <http://www.w3.org/2001/XMLSchema#> PREFIX owl: <http://www.w3.org/2002/07/owl#> PREFIX anzo: <http://openanzo.org/ontologies/2008/07/Anzo#> PREFIX zowl: <http://openanzo.org/ontologies/2009/05/AnzoOwl#> PREFIX dc: <http://purl.org/dc/elements/1.1/> SELECT * WHERE { SERVICE <http://cambridgesemantics.com/services/DataToolkit> { [] s:select ?metadata . ?data a s:HttpSource ; s:url "https://covidtracking.com/api/v1/us/current.csv" . ?metadata a s:MetadataSource ; s:from ?data ; ?fields [ ?model xsd:string ; ?field xsd:string ; ?datatype owl:Thing ; ] . } }
The query returns the following results:
model | field | datatype ------+--------------------------+----------------------------------------- us | date | http://www.w3.org/2001/XMLSchema#string us | states | http://www.w3.org/2001/XMLSchema#string us | positive | http://www.w3.org/2001/XMLSchema#string us | negative | http://www.w3.org/2001/XMLSchema#string us | pending | http://www.w3.org/2001/XMLSchema#string us | hospitalizedCurrently | http://www.w3.org/2001/XMLSchema#string us | hospitalizedCumulative | http://www.w3.org/2001/XMLSchema#string us | inIcuCurrently | http://www.w3.org/2001/XMLSchema#string us | inIcuCumulative | http://www.w3.org/2001/XMLSchema#string us | onVentilatorCurrently | http://www.w3.org/2001/XMLSchema#string us | onVentilatorCumulative | http://www.w3.org/2001/XMLSchema#string us | recovered | http://www.w3.org/2001/XMLSchema#string us | dateChecked | http://www.w3.org/2001/XMLSchema#string us | death | http://www.w3.org/2001/XMLSchema#string us | hospitalized | http://www.w3.org/2001/XMLSchema#string us | lastModified | http://www.w3.org/2001/XMLSchema#string us | total | http://www.w3.org/2001/XMLSchema#string us | totalTestResults | http://www.w3.org/2001/XMLSchema#string us | posNeg | http://www.w3.org/2001/XMLSchema#string us | deathIncrease | http://www.w3.org/2001/XMLSchema#string us | hospitalizedIncrease | http://www.w3.org/2001/XMLSchema#string us | negativeIncrease | http://www.w3.org/2001/XMLSchema#string us | positiveIncrease | http://www.w3.org/2001/XMLSchema#string us | totalTestResultsIncrease | http://www.w3.org/2001/XMLSchema#string us | hash | http://www.w3.org/2001/XMLSchema#string 25 rows
Explore a Directory of CSV Files
The query below explores a directory of CSV files to return the model, field, and data type. The query also orders the results by model name, which is the file name for file sources of a data model does not exist. In addition, the query includes s:sampling true
, which means the GDI will scan the entire file or files before returning results.
PREFIX s: <http://cambridgesemantics.com/ontologies/DataToolkit#> PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> PREFIX xsd: <http://www.w3.org/2001/XMLSchema#> PREFIX owl: <http://www.w3.org/2002/07/owl#> PREFIX anzo: <http://openanzo.org/ontologies/2008/07/Anzo#> PREFIX zowl: <http://openanzo.org/ontologies/2009/05/AnzoOwl#> PREFIX dc: <http://purl.org/dc/elements/1.1/> SELECT * WHERE { SERVICE <http://cambridgesemantics.com/services/DataToolkit> { [] s:select ?metadata . ?data a s:FileSource ; s:url "/opt/shared-files/movie-csv" . ?metadata a s:MetadataSource ; s:from ?data ; # Sample the whole file s:sampling true ; # Sample the first N records # # s:sampling 1000 ; ?fields [ ?model xsd:string ; ?field xsd:string ; ?datatype owl:Thing ; ] . } } ORDER BY ?model
The query returns the following results:
model | field | datatype -----------------------+---------------------------+----------------------------------------- MovieActors1 | MovieID | http://www.w3.org/2001/XMLSchema#int MovieActors1 | MovieTitle | http://www.w3.org/2001/XMLSchema#string MovieActors1 | ActorID | http://www.w3.org/2001/XMLSchema#int MovieActors1 | ActorName | http://www.w3.org/2001/XMLSchema#string MovieActors2 | MovieID | http://www.w3.org/2001/XMLSchema#int MovieActors2 | MovieTitle | http://www.w3.org/2001/XMLSchema#string MovieActors2 | ActorID | http://www.w3.org/2001/XMLSchema#int MovieActors2 | ActorName | http://www.w3.org/2001/XMLSchema#string MovieActors2 | ActorCategory | http://www.w3.org/2001/XMLSchema#string MovieCategory | MovieID | http://www.w3.org/2001/XMLSchema#int MovieCategory | MovieTitle | http://www.w3.org/2001/XMLSchema#string MovieCategory | MoveCategoryID | http://www.w3.org/2001/XMLSchema#int MovieCategory | MovieCategory | http://www.w3.org/2001/XMLSchema#string MovieCinematographers | MovieID | http://www.w3.org/2001/XMLSchema#int MovieCinematographers | MovieTitle | http://www.w3.org/2001/XMLSchema#string MovieCinematographers | MovieCinematographerID | http://www.w3.org/2001/XMLSchema#int MovieCinematographers | MovieCinematographerName | http://www.w3.org/2001/XMLSchema#string MovieComposers | MovieID | http://www.w3.org/2001/XMLSchema#int MovieComposers | MovieTitle | http://www.w3.org/2001/XMLSchema#string MovieComposers | MovieComposerID | http://www.w3.org/2001/XMLSchema#int MovieComposers | MovieComposerName | http://www.w3.org/2001/XMLSchema#string MovieDirectors | MovieID | http://www.w3.org/2001/XMLSchema#int MovieDirectors | MovieTitle | http://www.w3.org/2001/XMLSchema#string ... 79 rows
For instructions on querying the instance data based on the data source metadata, see Reading or Ingesting Instance Data.