Reading Data Source Metadata
If you want to retrieve instance data from a source but are unsure about the data model, schema, or the exact names of columns and their data types, you can use the Graph Data Interface (GDI) to explore the source's metadata. The GDI can be used to return a list of the catalogs (schemas), models, columns, data types, and other data source information.
This topic describes the metadata query syntax and provides several example queries.
Metadata Query Syntax
The following query syntax shows the structure of a metadata query. The clauses, patterns, and placeholders are described in the table below.

    # PREFIX Clause
    PREFIX s: <http://cambridgesemantics.com/ontologies/DataToolkit#>
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
    PREFIX owl: <http://www.w3.org/2002/07/owl#>
    PREFIX anzo: <http://openanzo.org/ontologies/2008/07/Anzo#>
    PREFIX zowl: <http://openanzo.org/ontologies/2009/05/AnzoOwl#>
    PREFIX dc: <http://purl.org/dc/elements/1.1/>

    # Result Clause
    SELECT *
    WHERE {
      # SERVICE Clause: Include the following service call
      SERVICE <http://cambridgesemantics.com/services/DataToolkit> {
        [] s:select ?metadata .

        ?data a s:source_type ;
          s:url "string" ;
          [ s:username "string" ; ]
          [ s:password "string" ; ]

        ?metadata a s:MetadataSource ;
          s:from ?data ;
          # The metadata selector below specifies the type of metadata to return.
          ?catalogs | ?fields | ?models [
            ?metadata_type datatype ;
            ... ;
          ] .
      }
    }

Option | Data Type | Description |
---|---|---|
PREFIX Clause | N/A | The PREFIX clause declares the standard and custom prefixes for GDI queries. Generally, queries include the following prefixes (or a subset of them) plus any data-specific declarations: PREFIX s: <http://cambridgesemantics.com/ontologies/DataToolkit#> PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> PREFIX xsd: <http://www.w3.org/2001/XMLSchema#> PREFIX owl: <http://www.w3.org/2002/07/owl#> PREFIX anzo: <http://openanzo.org/ontologies/2008/07/Anzo#> PREFIX zowl: <http://openanzo.org/ontologies/2009/05/AnzoOwl#> PREFIX dc: <http://purl.org/dc/elements/1.1/> |
Result Clause | N/A | The result clause defines the results to return. For metadata queries, the result clause is typically SELECT * . |
SERVICE Clause | N/A | Include the required GDI SERVICE call in the WHERE clause. The rest of the WHERE clause defines the patterns to look for in the source. |
[] s:select ?metadata | N/A | Include this required triple pattern in metadata queries. The select property specifies the source that should be used to return data. |
source_type | object | The ?data a s:source_type triple pattern specifies the type of data source that the query will run against. For example, ?data a s:DbSource specifies that the source type is a database. The examples in this topic use the s:DbSource, s:FileSource, s:HttpSource, and es:ElasticSource types. Certain connection and input parameters are available based on the specified source type. For details about the options for your source, see Source-Specific Properties. |
url | string | This property specifies the URL for the data source, such as the database URL, Elasticsearch URL, or HTTP endpoint URL. For file-based sources, the url property specifies the file system location of the source file or directory of files. For security, it is a best practice to reference connection information (such as the url, username, and password) from a Query Context so that the sensitive details are abstracted from any requests. In addition, using a Query Context makes connection details reusable across queries. See Using a Query Context for more information. For example, the triple patterns below reference keys from a Query Context: ?data a s:DbSource ; s:url "{{@Somedb.url}}" ; s:username "{{@Somedb.user}}" ; s:password "{{@Somedb.password}}" ; |
username | string | If authentication is required to access the source, include this property to specify the user name. |
password | string | This property specifies the password for the given username. |
catalogs | variable | This selector narrows the results to schema-related metadata such as the schema names. Even when additional metadata types (metadata_type datatype) are specified as objects, only catalog (schema) information is returned. |
fields | variable | This selector is the broadest and most flexible option. Using the fields selector enables users to return any and all of the source metadata information, depending on the specified metadata types (metadata_type datatype). |
models | variable | This selector narrows the results to model-related metadata such as the model names. Even when additional metadata types (metadata_type datatype) are specified as objects, only model information is returned. |
metadata_type datatype | N/A | The triple patterns in the array for the metadata selector specify the type of metadata to return as well as the data type for the return value. You can include any combination of properties. The examples in this topic use ?catalog, ?model, ?field, ?datatype, ?cardinality, ?order, ?type, and ?object. The results that are returned depend on the type of data source and whether the information exists in the source. The parentheses around the data type are not required but are included in this document for readability. |
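The examples later in this topic use the catalogs and fields selectors but not models. The following is a minimal sketch of a models query, assuming that the models selector accepts the same bracketed form shown in the syntax above. It also reads the connection details from a Query Context, reusing the placeholder Somedb keys from the url description. Those keys are illustrative and must exist in your own Query Context.

```sparql
PREFIX s: <http://cambridgesemantics.com/ontologies/DataToolkit#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

SELECT *
WHERE {
  SERVICE <http://cambridgesemantics.com/services/DataToolkit> {
    [] s:select ?metadata .

    # Connection details are read from a Query Context.
    # "Somedb" is a placeholder context key, not a real source.
    ?data a s:DbSource ;
      s:url "{{@Somedb.url}}" ;
      s:username "{{@Somedb.user}}" ;
      s:password "{{@Somedb.password}}" .

    ?metadata a s:MetadataSource ;
      s:from ?data ;
      # The models selector narrows the results to model information.
      ?models [
        ?model (xsd:string) ;
      ] .
  }
}
ORDER BY ?model
```

Because the models selector returns only model information, requesting additional metadata types here (such as ?field) is not expected to add columns to the results.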
Metadata Query Examples
This section includes sample metadata queries that run against different types of data sources.
- List Database Schemas
- Explore a Database Schema
- Explore a Directory of SAS Files
- Explore an HTTP Endpoint
- Explore a Directory of CSV Files
List Database Schemas
The query below sends a metadata query to a MySQL database to return a list of the schemas that are available:

    PREFIX s: <http://cambridgesemantics.com/ontologies/DataToolkit#>
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
    PREFIX owl: <http://www.w3.org/2002/07/owl#>
    PREFIX anzo: <http://openanzo.org/ontologies/2008/07/Anzo#>
    PREFIX zowl: <http://openanzo.org/ontologies/2009/05/AnzoOwl#>
    PREFIX dc: <http://purl.org/dc/elements/1.1/>

    SELECT *
    WHERE {
      SERVICE <http://cambridgesemantics.com/services/DataToolkit> {
        [] s:select ?metadata .

        ?data a s:DbSource ;
          s:url "jdbc:mysql://10.100.2.9:5555/?user=root&password=Mysql1@#" .

        ?metadata a s:MetadataSource ;
          s:from ?data ;
          ?catalogs [
            ?catalog (xsd:string) ;
            ?order (xsd:int) ;
          ] .
      }
    }
    ORDER BY ?catalog

The query returns the following results:

    catalog               | order
    ----------------------+-------
    BANKTEST_DB           | 1
    EMR                   | 4
    GOLFCLUB_DB           | 8
    NORTHWIND             | 10
    SPORTDB               | 13
    SQLPOCKET_DB          | 14
    WORDPRESS_DB          | 16
    classicmodels         | 2
    crm_national_patients | 3
    emrdbbig              | 5
    emrdbsmall            | 6
    emrnational_schema    | 7
    mysql                 | 9
    optum                 | 11
    performance_schema    | 12
    sys                   | 15
    16 rows

Explore a Database Schema
Using the list of schemas that were returned in the example above (List Database Schemas), the query below returns metadata about the columns in one of the schemas. To narrow the results to a schema, the schema name (NORTHWIND) is added to the connection URL.

    PREFIX s: <http://cambridgesemantics.com/ontologies/DataToolkit#>
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
    PREFIX owl: <http://www.w3.org/2002/07/owl#>
    PREFIX anzo: <http://openanzo.org/ontologies/2008/07/Anzo#>
    PREFIX zowl: <http://openanzo.org/ontologies/2009/05/AnzoOwl#>
    PREFIX dc: <http://purl.org/dc/elements/1.1/>

    SELECT *
    WHERE {
      SERVICE <http://cambridgesemantics.com/services/DataToolkit> {
        [] s:select ?metadata .

        ?data a s:DbSource ;
          s:url "jdbc:mysql://10.100.2.9:5555/NORTHWIND?user=root&password=Mysql1@#" .

        ?metadata a s:MetadataSource ;
          s:from ?data ;
          ?fields [
            ?model (xsd:string) ;
            ?field (xsd:string) ;
            ?datatype (owl:Thing) ;
          ] .
      }
    }
    ORDER BY ?model

The query returns the following results:

    model                         | field           | datatype
    ------------------------------+-----------------+-----------------------------------------------
    Alphabetical list of products | CategoryID      | http://www.w3.org/2001/XMLSchema#int
    Alphabetical list of products | Discontinued    | http://www.w3.org/2001/XMLSchema#boolean
    Alphabetical list of products | SupplierID      | http://www.w3.org/2001/XMLSchema#int
    Alphabetical list of products | UnitPrice       | http://www.w3.org/2001/XMLSchema#decimal
    Alphabetical list of products | ProductName     | http://www.w3.org/2001/XMLSchema#string
    Alphabetical list of products | QuantityPerUnit | http://www.w3.org/2001/XMLSchema#string
    Alphabetical list of products | UnitsOnOrder    | http://www.w3.org/2001/XMLSchema#short
    Alphabetical list of products | CategoryName    | http://www.w3.org/2001/XMLSchema#string
    Alphabetical list of products | ProductID       | http://www.w3.org/2001/XMLSchema#int
    Alphabetical list of products | ReorderLevel    | http://www.w3.org/2001/XMLSchema#short
    Alphabetical list of products | UnitsInStock    | http://www.w3.org/2001/XMLSchema#short
    Categories                    | CategoryID      | http://www.w3.org/2001/XMLSchema#int
    Categories                    | Description     | http://www.w3.org/2001/XMLSchema#string
    Categories                    | Picture         | http://www.w3.org/2001/XMLSchema#base64Binary
    Categories                    | CategoryName    | http://www.w3.org/2001/XMLSchema#string
    Categories                    | categoryid      |
    Category Sales for 1997       | CategoryName    | http://www.w3.org/2001/XMLSchema#string
    Category Sales for 1997       | CategorySales   | http://www.w3.org/2001/XMLSchema#double
    Current Product List          | ProductName     | http://www.w3.org/2001/XMLSchema#string
    Current Product List          | ProductID       | http://www.w3.org/2001/XMLSchema#int
    ...
    201 rows

Explore a Directory of SAS Files
The query below explores a directory of SAS files to return the model, catalog (schema), field, data type, and cardinality information. The query also orders the results by model name, which is the file name for file sources if a data model does not exist.

    PREFIX s: <http://cambridgesemantics.com/ontologies/DataToolkit#>
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
    PREFIX owl: <http://www.w3.org/2002/07/owl#>
    PREFIX anzo: <http://openanzo.org/ontologies/2008/07/Anzo#>
    PREFIX zowl: <http://openanzo.org/ontologies/2009/05/AnzoOwl#>
    PREFIX dc: <http://purl.org/dc/elements/1.1/>

    SELECT *
    WHERE {
      SERVICE <http://cambridgesemantics.com/services/DataToolkit> {
        [] s:select ?metadata .

        ?data a s:FileSource ;
          s:url "/opt/shared-files/sas" .

        ?metadata a s:MetadataSource ;
          s:from ?data ;
          ?fields [
            ?model (xsd:string) ;
            ?field (xsd:string) ;
            ?catalog (xsd:string) ;
            ?datatype (owl:Thing) ;
            ?cardinality (xsd:string) ;
          ] .
      }
    }
    ORDER BY ?model

The query returns the following results:

    model   | field | catalog | datatype                                | cardinality
    --------+-------+---------+-----------------------------------------+-------------
    demand  | P1    | les/sas | http://www.w3.org/2001/XMLSchema#double | REQUIRED
    demand  | P2    | les/sas | http://www.w3.org/2001/XMLSchema#double | REQUIRED
    demand  | P3    | les/sas | http://www.w3.org/2001/XMLSchema#double | REQUIRED
    demand  | Y     | les/sas | http://www.w3.org/2001/XMLSchema#double | REQUIRED
    demand  | Q1    | les/sas | http://www.w3.org/2001/XMLSchema#double | REQUIRED
    demand  | Q2    | les/sas | http://www.w3.org/2001/XMLSchema#double | REQUIRED
    demand  | Q3    | les/sas | http://www.w3.org/2001/XMLSchema#double | REQUIRED
    demo    | YEAR  | les/sas | http://www.w3.org/2001/XMLSchema#long   | REQUIRED
    demo    | QTR   | les/sas | http://www.w3.org/2001/XMLSchema#long   | REQUIRED
    demo    | GDP   | les/sas | http://www.w3.org/2001/XMLSchema#double | REQUIRED
    demo    | PR    | les/sas | http://www.w3.org/2001/XMLSchema#double | REQUIRED
    demo    | M1    | les/sas | http://www.w3.org/2001/XMLSchema#double | REQUIRED
    demo    | RS    | les/sas | http://www.w3.org/2001/XMLSchema#double | REQUIRED
    airline | YEAR  | les/sas | http://www.w3.org/2001/XMLSchema#long   | REQUIRED
    airline | Y     | les/sas | http://www.w3.org/2001/XMLSchema#double | REQUIRED
    airline | W     | les/sas | http://www.w3.org/2001/XMLSchema#double | REQUIRED
    airline | R     | les/sas | http://www.w3.org/2001/XMLSchema#double | REQUIRED
    airline | L     | les/sas | http://www.w3.org/2001/XMLSchema#double | REQUIRED
    airline | K     | les/sas | http://www.w3.org/2001/XMLSchema#double | REQUIRED
    cars    | MPG   | les/sas | http://www.w3.org/2001/XMLSchema#long   | REQUIRED
    cars    | CYL   | les/sas | http://www.w3.org/2001/XMLSchema#long   | REQUIRED
    ...
    50 rows

Explore an HTTP Endpoint
The query below explores the metadata for a sample HTTP source that compiles worldwide weather statistics.

    PREFIX s: <http://cambridgesemantics.com/ontologies/DataToolkit#>
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
    PREFIX owl: <http://www.w3.org/2002/07/owl#>
    PREFIX anzo: <http://openanzo.org/ontologies/2008/07/Anzo#>
    PREFIX zowl: <http://openanzo.org/ontologies/2009/05/AnzoOwl#>
    PREFIX dc: <http://purl.org/dc/elements/1.1/>

    SELECT *
    WHERE {
      SERVICE <http://cambridgesemantics.com/services/DataToolkit> {
        [] s:select ?metadata .

        ?data a s:HttpSource ;
          s:url "https://sampleEndpoint.com/forecast/30.374563,-97.975892" .

        ?metadata a s:MetadataSource ;
          s:from ?data ;
          ?fields [
            ?model (xsd:string) ;
            ?field (xsd:string) ;
            ?datatype (owl:Thing) ;
            ?cardinality (xsd:string) ;
            ?order (xsd:int) ;
          ] .
      }
    }
    ORDER BY ?model ?order

The query returns the following results:

    model     | field                | datatype                                | cardinality | order
    ----------+----------------------+-----------------------------------------+-------------+-------
    currently | time                 | http://www.w3.org/2001/XMLSchema#int    | REQUIRED    | 6
    currently | summary              | http://www.w3.org/2001/XMLSchema#string | REQUIRED    | 7
    currently | icon                 | http://www.w3.org/2001/XMLSchema#string | REQUIRED    | 8
    currently | nearestStormDistance | http://www.w3.org/2001/XMLSchema#int    | REQUIRED    | 9
    currently | nearestStormBearing  | http://www.w3.org/2001/XMLSchema#int    | REQUIRED    | 10
    currently | precipIntensity      | http://www.w3.org/2001/XMLSchema#int    | REQUIRED    | 11
    currently | precipProbability    | http://www.w3.org/2001/XMLSchema#int    | REQUIRED    | 12
    currently | temperature          | http://www.w3.org/2001/XMLSchema#float  | REQUIRED    | 13
    currently | apparentTemperature  | http://www.w3.org/2001/XMLSchema#float  | REQUIRED    | 14
    currently | dewPoint             | http://www.w3.org/2001/XMLSchema#float  | REQUIRED    | 15
    currently | humidity             | http://www.w3.org/2001/XMLSchema#float  | REQUIRED    | 16
    currently | pressure             | http://www.w3.org/2001/XMLSchema#float  | REQUIRED    | 17
    currently | windSpeed            | http://www.w3.org/2001/XMLSchema#float  | REQUIRED    | 18
    currently | windGust             | http://www.w3.org/2001/XMLSchema#float  | REQUIRED    | 19
    currently | windBearing          | http://www.w3.org/2001/XMLSchema#int    | REQUIRED    | 20
    currently | cloudCover           | http://www.w3.org/2001/XMLSchema#float  | REQUIRED    | 21
    currently | uvIndex              | http://www.w3.org/2001/XMLSchema#int    | REQUIRED    | 22
    currently | visibility           | http://www.w3.org/2001/XMLSchema#int    | REQUIRED    | 23
    currently | ozone                | http://www.w3.org/2001/XMLSchema#float  | REQUIRED    | 24
    daily     | summary              | http://www.w3.org/2001/XMLSchema#string | REQUIRED    | 75
    daily     | icon                 | http://www.w3.org/2001/XMLSchema#string | REQUIRED    | 76
    daily     | data                 |                                         | MANY        | 77
    data      | time                 | http://www.w3.org/2001/XMLSchema#int    | REQUIRED    | 29
    data      | precipIntensity      | http://www.w3.org/2001/XMLSchema#float  | REQUIRED    | 30
    data      | precipProbability    | http://www.w3.org/2001/XMLSchema#float  | REQUIRED    | 31
    data      | summary              | http://www.w3.org/2001/XMLSchema#string | OPTIONAL    | 32
    ...
    81 rows

The following query retrieves the model, field, and data type metadata for the United States from the publicly available COVID Tracking Project Data API.

    PREFIX s: <http://cambridgesemantics.com/ontologies/DataToolkit#>
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
    PREFIX owl: <http://www.w3.org/2002/07/owl#>
    PREFIX anzo: <http://openanzo.org/ontologies/2008/07/Anzo#>
    PREFIX zowl: <http://openanzo.org/ontologies/2009/05/AnzoOwl#>
    PREFIX dc: <http://purl.org/dc/elements/1.1/>

    SELECT *
    WHERE {
      SERVICE <http://cambridgesemantics.com/services/DataToolkit> {
        [] s:select ?metadata .

        ?data a s:HttpSource ;
          s:url "https://covidtracking.com/api/v1/us/current.csv" .

        ?metadata a s:MetadataSource ;
          s:from ?data ;
          ?fields [
            ?model (xsd:string) ;
            ?field (xsd:string) ;
            ?datatype (owl:Thing) ;
          ] .
      }
    }

The query returns the following results:

    model | field                    | datatype
    ------+--------------------------+-----------------------------------------
    us    | date                     | http://www.w3.org/2001/XMLSchema#string
    us    | states                   | http://www.w3.org/2001/XMLSchema#string
    us    | positive                 | http://www.w3.org/2001/XMLSchema#string
    us    | negative                 | http://www.w3.org/2001/XMLSchema#string
    us    | pending                  | http://www.w3.org/2001/XMLSchema#string
    us    | hospitalizedCurrently    | http://www.w3.org/2001/XMLSchema#string
    us    | hospitalizedCumulative   | http://www.w3.org/2001/XMLSchema#string
    us    | inIcuCurrently           | http://www.w3.org/2001/XMLSchema#string
    us    | inIcuCumulative          | http://www.w3.org/2001/XMLSchema#string
    us    | onVentilatorCurrently    | http://www.w3.org/2001/XMLSchema#string
    us    | onVentilatorCumulative   | http://www.w3.org/2001/XMLSchema#string
    us    | recovered                | http://www.w3.org/2001/XMLSchema#string
    us    | dateChecked              | http://www.w3.org/2001/XMLSchema#string
    us    | death                    | http://www.w3.org/2001/XMLSchema#string
    us    | hospitalized             | http://www.w3.org/2001/XMLSchema#string
    us    | lastModified             | http://www.w3.org/2001/XMLSchema#string
    us    | total                    | http://www.w3.org/2001/XMLSchema#string
    us    | totalTestResults         | http://www.w3.org/2001/XMLSchema#string
    us    | posNeg                   | http://www.w3.org/2001/XMLSchema#string
    us    | deathIncrease            | http://www.w3.org/2001/XMLSchema#string
    us    | hospitalizedIncrease     | http://www.w3.org/2001/XMLSchema#string
    us    | negativeIncrease         | http://www.w3.org/2001/XMLSchema#string
    us    | positiveIncrease         | http://www.w3.org/2001/XMLSchema#string
    us    | totalTestResultsIncrease | http://www.w3.org/2001/XMLSchema#string
    us    | hash                     | http://www.w3.org/2001/XMLSchema#string
    25 rows

Explore a Directory of CSV Files
The query below explores a directory of CSV files to return the model, field, and data type. The query also orders the results by model name, which is the file name for file sources if a data model does not exist. In addition, the query includes s:sampling true, which means the GDI will scan the entire file or files before returning results.

    PREFIX s: <http://cambridgesemantics.com/ontologies/DataToolkit#>
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
    PREFIX owl: <http://www.w3.org/2002/07/owl#>
    PREFIX anzo: <http://openanzo.org/ontologies/2008/07/Anzo#>
    PREFIX zowl: <http://openanzo.org/ontologies/2009/05/AnzoOwl#>
    PREFIX dc: <http://purl.org/dc/elements/1.1/>

    SELECT *
    WHERE {
      SERVICE <http://cambridgesemantics.com/services/DataToolkit> {
        [] s:select ?metadata .

        ?data a s:FileSource ;
          s:url "/opt/shared-files/movie-csv" .

        ?metadata a s:MetadataSource ;
          s:from ?data ;
          # Sample the whole file
          s:sampling true ;
          # Sample the first N records
          #
          # s:sampling 1000 ;
          ?fields [
            ?model (xsd:string) ;
            ?field (xsd:string) ;
            ?datatype (owl:Thing) ;
          ] .
      }
    }
    ORDER BY ?model

The query returns the following results:

    model                 | field                    | datatype
    ----------------------+--------------------------+-----------------------------------------
    MovieActors1          | MovieID                  | http://www.w3.org/2001/XMLSchema#int
    MovieActors1          | MovieTitle               | http://www.w3.org/2001/XMLSchema#string
    MovieActors1          | ActorID                  | http://www.w3.org/2001/XMLSchema#int
    MovieActors1          | ActorName                | http://www.w3.org/2001/XMLSchema#string
    MovieActors2          | MovieID                  | http://www.w3.org/2001/XMLSchema#int
    MovieActors2          | MovieTitle               | http://www.w3.org/2001/XMLSchema#string
    MovieActors2          | ActorID                  | http://www.w3.org/2001/XMLSchema#int
    MovieActors2          | ActorName                | http://www.w3.org/2001/XMLSchema#string
    MovieActors2          | ActorCategory            | http://www.w3.org/2001/XMLSchema#string
    MovieCategory         | MovieID                  | http://www.w3.org/2001/XMLSchema#int
    MovieCategory         | MovieTitle               | http://www.w3.org/2001/XMLSchema#string
    MovieCategory         | MoveCategoryID           | http://www.w3.org/2001/XMLSchema#int
    MovieCategory         | MovieCategory            | http://www.w3.org/2001/XMLSchema#string
    MovieCinematographers | MovieID                  | http://www.w3.org/2001/XMLSchema#int
    MovieCinematographers | MovieTitle               | http://www.w3.org/2001/XMLSchema#string
    MovieCinematographers | MovieCinematographerID   | http://www.w3.org/2001/XMLSchema#int
    MovieCinematographers | MovieCinematographerName | http://www.w3.org/2001/XMLSchema#string
    MovieComposers        | MovieID                  | http://www.w3.org/2001/XMLSchema#int
    MovieComposers        | MovieTitle               | http://www.w3.org/2001/XMLSchema#string
    MovieComposers        | MovieComposerID          | http://www.w3.org/2001/XMLSchema#int
    MovieComposers        | MovieComposerName        | http://www.w3.org/2001/XMLSchema#string
    MovieDirectors        | MovieID                  | http://www.w3.org/2001/XMLSchema#int
    MovieDirectors        | MovieTitle               | http://www.w3.org/2001/XMLSchema#string
    ...
    79 rows

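The CSV query above also carries a commented-out alternative, s:sampling 1000, described there as sampling the first N records. As a sketch of that variant, the query below swaps the two settings. It assumes the numeric form behaves as that comment describes, and it declares only the prefixes the query actually uses. The directory path is the same sample path as above.

```sparql
PREFIX s: <http://cambridgesemantics.com/ontologies/DataToolkit#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>

SELECT *
WHERE {
  SERVICE <http://cambridgesemantics.com/services/DataToolkit> {
    [] s:select ?metadata .

    ?data a s:FileSource ;
      s:url "/opt/shared-files/movie-csv" .

    ?metadata a s:MetadataSource ;
      s:from ?data ;
      # Sample the first 1000 records instead of the whole file
      # (replaces "s:sampling true" in the query above).
      s:sampling 1000 ;
      ?fields [
        ?model (xsd:string) ;
        ?field (xsd:string) ;
        ?datatype (owl:Thing) ;
      ] .
  }
}
ORDER BY ?model
```

Limiting the sample is likely to matter most for large files, where a full scan delays the results. The trade-off is that data types inferred from a partial sample may not reflect values that appear later in the file.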
The following example shows a query that returns metadata for an Elasticsearch source.

    PREFIX s: <http://cambridgesemantics.com/ontologies/DataToolkit#>
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
    PREFIX owl: <http://www.w3.org/2002/07/owl#>
    PREFIX anzo: <http://openanzo.org/ontologies/2008/07/Anzo#>
    PREFIX zowl: <http://openanzo.org/ontologies/2009/05/AnzoOwl#>
    PREFIX dc: <http://purl.org/dc/elements/1.1/>
    PREFIX ex: <http://example.org/ontologies/City#>
    PREFIX es: <http://elastic.co/search/>
    PREFIX : <http://example.org/cities/>

    SELECT *
    WHERE {
      SERVICE <http://cambridgesemantics.com/services/DataToolkit> {
        [] s:select ?_fields .

        ?data a es:ElasticSource ;
          es:url "http://localhost:9200/" ;
          es:index "account" ;
          ?account_number xsd:long ;
          ?age (xsd:long) ;
          ?balance (xsd:long) ;
          ?address (xsd:string) ;
          ?city (xsd:string) ;
          ?state (xsd:string) ;
          ?email (xsd:string) ;
          ?employer (xsd:string) ;
          ?firstname (xsd:string) ;
          ?lastname (xsd:string) ;
          ?gender (xsd:string) .

        ?_fields a s:MetadataSource ;
          s:from ?data ;
          ?fields [
            ?catalog () ;
            ?model () ;
            ?field () ;
            ?cardinality () ;
            ?datatype () ;
            ?type () ;
            ?object () ;
          ] .
      }
    }
    ORDER BY ?catalog ?model ?field

For instructions on querying the instance data based on the data source metadata, see Getting Started with GDI Queries.