Reading or Ingesting Instance Data
Depending on the type of SPARQL query that you write, the Graph Data Interface (GDI) service can be used to ingest instance data through an extract, load, and transform process or it can be used to analyze source data without updating the database. When you query a source such as a database, the GDI service leverages that source to retrieve only the data that it needs for the query. Unlike a JDBC driver, the GDI service does not need to retrieve all values and then complete an often time-consuming step to filter the results.
This topic provides details about the syntax to use when writing GDI queries and includes examples that demonstrate the data integration capabilities for different types of data sources.
- Graph Data Interface Query Syntax
- Deep-Dive into Hierarchical Bindings
- Graph Data Interface Query Examples
Graph Data Interface Query Syntax
The following query syntax shows the structure of a GDI query. The table below the query describes the parts of the query and defines the placeholder elements in bold.
PREFIX s: <http://cambridgesemantics.com/ontologies/DataToolkit#> PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> PREFIX xsd: <http://www.w3.org/2001/XMLSchema#> PREFIX owl: <http://www.w3.org/2002/07/owl#> PREFIX anzo: <http://openanzo.org/ontologies/2008/07/Anzo#> PREFIX zowl: <http://openanzo.org/ontologies/2009/05/AnzoOwl#> PREFIX dc: <http://purl.org/dc/elements/1.1/> # the result clause: the type of query to run # and set of results to return SELECT | CONSTRUCT | INSERT | CREATE [OR REPLACE] [MATERIALIZED] VIEW ... WHERE { SERVICE [TOPDOWN] <http://cambridgesemantics.com/services/DataToolkit> { ?data a s:source_type ; s:connection_parameters ; s:input_parameters ; # the output parameters: variable bindings and transformations, # such as column name or data type conversions ?variable (["binding"] [datatype] ["format"]) ; ... ; . # additional clauses such as VALUES or FILTER ... } }
Clause or Option | Description |
---|---|
PREFIX Clause | The PREFIX clause shown above declares the prefixes that are common or standard for all GDI service queries. You can declare additional prefixes to use in the query. |
Result Clause | The result clause defines the type of SPARQL query to run and the set of results to return. This clause defines whether you want to read (SELECT or CONSTRUCT) from the remote source, ingest the data into AnzoGraph (INSERT or CREATE MATERIALIZED VIEW), or create a "virtual" data source or non-materialized view (CREATE VIEW). The Graph Data Interface Query Examples section below includes more information and examples of each type of query. |
WHERE Clause | The WHERE clause includes the required GDI SERVICE call and defines the data source, input and output parameters, and any additional clauses for ingesting or analyzing the data. The rows below describe the patterns in the WHERE clause. |
SERVICE Call | The following SERVICE call invokes the GDI service:SERVICE [TOPDOWN] <http://cambridgesemantics.com/services/DataToolkit> The optional TOPDOWN keyword is used to pass input values to the data source. When you include TOPDOWN in the service call, it indicates that the rest of the query produces values to send to the source. The GDI service makes repeated calls to pass in each of the specified values and retrieve the data that is based on those values. |
source_type | The ?data a s:source_type triple pattern specifies the type of data source that the query will run against. For example, ?data a s:DbSource . The list below describes the commonly used types:
For a complete list of the supported source types, view the GDI ontology. Each data source type is represented by an owl:Class. See Graph Data Interface Model for more information. |
connection_parameters | The source connection parameters are the values that are required for accessing the source, such the database connection URL, path to a file source, username, password, key, token, etc. The connection details can be referenced from a Query Context or specified in the query. For more information about contexts, see Using Query Contexts. For example, the pattern below specifies the connection details for a database source: ?data a s:DbSource ; s:url "jdbc:postgresql://10.100.2.9:5555/kl_hosp_db? user=postgres&password=postgres123" The example below specifies the connection details for a file-based source, a directory of CSV files: ?data a s:FileSource ; s:url "/opt/shared-files/sales-csv" For a single file, specify the filename in the URL. For example, |
input_parameters | In addition to the source connection parameters, you can include other input parameters based on the data source type. For example, the list below describes the commonly used input parameters:
For a complete list of the supported input parameters by data source, view the GDI ontology. Each data source type is represented by an owl:Class and the related input parameters are properties in the class. See Graph Data Interface Model for more information. |
?variable () | The output parameters, in ?variable (["binding"] [datatype] ["format"]) format, define the triple patterns to output. When the specified ?variable matches the source column name, the GDI uses the variable as the source data selector. If you specify an alternate variable name, then a binding needs to be specified to map the new variable to the source.If you are unsure about the exact column names or other source details, you can run a query that returns metadata information. For details, see Reading Data Source Metadata. In the object position in parenthesis, you also have the option to transform the data using the datatype and format options. The list below describes each option:
|
Deep-Dive into Hierarchical Bindings
As part of the GDI's flexibility, there are multiple ways to express binding hierarchies in GDI queries. One way is to use brackets ( [ ] ) to group triple patterns into binding trees. For example, the query snippet below organizes output parameters into an hourly.data hierarchy:
?data a s:HttpSource; s:url "https://api.darksky.net/forecast/bdbe3f638eb908c9b94919537dad5945/30.374563,-97.975892" ; ?latitude (xsd:double) ; ?longitude (xsd:double) ; ?timezone (xsd:string) ; ?hourly [ s:selector "hourly" ; ?data [ s:selector "data" ; ?time (xsd:long) ; ?summary (xsd:string) ; ?nearestStormDistance (xsd:int) ; ?rainIntensity ("precipIntensity" xsd:double) ; ?rainProbability ("precipProbability" xsd:double) ; ?temperature (xsd:double) ; ?feelsLike ("apparentTemperature" xsd:double) ; ?humidity (xsd:double) ; ?pressure (xsd:double) ; ?windSpeed (xsd:double) ; ] ; ] .
Each level of the binding hierarchy above has a s:selector property that is used to navigate the data. The selector is optional, however. If a selector is not specified, the output triple patterns default to the name of the variable that introduces that level of the hierarchy.
As an alternative to grouping triple patterns in bracketed trees, the s:selector property can specify a path. For example, the snippet below rewrites the example above to express the same hierarchy by specifying a path as the value for s:selector:
?data a s:HttpSource; s:url "https://api.darksky.net/forecast/bdbe3f638eb908c9b94919537dad5945/30.374563,-97.975892" ; ?latitude (xsd:double) ; ?longitude (xsd:double) ; ?timezone (xsd:string) ; ?hourly [ s:selector "hourly.data" ; ?time (xsd:long) ; ?summary (xsd:string) ; ?nearestStormDistance (xsd:int) ; ?rainIntensity ("precipIntensity" xsd:double) ; ?rainProbability ("precipProbability" xsd:double) ; ?temperature (xsd:double) ; ?feelsLike ("apparentTemperature" xsd:double) ; ?humidity (xsd:double) ; ?pressure (xsd:double) ; ?windSpeed (xsd:double) ; ] .
When working with schema-less (or schema-flexible) sources like JSON, you can also capture a tree of data as a JSON string. For example, using the query snippet above, if the properties under hourly were unknown, the snippet could be rewritten as follows. This query would bind all of the data below hourly to the ?hourly variable and return a JSON string representation of the properties and instance data:
?data a s:HttpSource; s:url "https://api.darksky.net/forecast/bdbe3f638eb908c9b94919537dad5945/30.374563,-97.975892" ; ?latitude (xsd:double) ; ?longitude (xsd:double) ; ?timezone (xsd:string) ; ?hourly () .
For example, the results look like this:
latitude | longitude | timezone | hourly -----------+------------+-----------------+---------------------------- 30.374563 | -97.975892 | America/Chicago | {"summary":"\"Humid and partly cloudy throughout the day.\"","icon":"\"partly-cloudy-day\"","data":[{"time":"1595559600", summary":"\"Clear\"","icon":"\"clear-night\"","precipIntensity":"0", "precipProbability":"0","temperature":"88.39","apparentTemperature":"91.72", "dewPoint":"67.42","humidity":"0.5","pressure":"1011.7","windSpeed":"7.48", "windGust":"16.71","windBearing":"109","cloudCover":"0.06","uvIndex":"0", "visibility":"10","ozone":"285.2"},{"time":"1595563200","summary":"\"Clear\"", "icon":"\"clear-night\"","precipIntensity":"2.0E-4","precipProbability":"0.01", "precipType":"\"rain\"","temperature":"86.69","apparentTemperature":"90.1", "dewPoint":"67.84","humidity":"0.54","pressure":"1012","windSpeed":"7.05", "windGust":"17.56","windBearing":"110","cloudCover":"0.12","uvIndex":"0", "visibility":"10","ozone":"284.9"},...
Similar to the example above, you can write a query that specifically captures some of the properties in a hierarchy and then returns the rest of the properties and their values as a JSON string representation. To do so, use @ as the binding path. For example:
?data a s:HttpSource; s:url "https://api.darksky.net/forecast/bdbe3f638eb908c9b94919537dad5945/30.374563,-97.975892" ; ?latitude (xsd:double) ; ?longitude (xsd:double) ; ?timezone (xsd:string) ; ?hourly [ s:selector "hourly.data" ; ?time (xsd:long) ; ?summary (xsd:string) ; ?hourly_data ("@") ; ] .
The results look like this:
latitude | longitude | timezone | time | summary | hourly_data -----------+------------+-----------------+------------+------------------+--------------- 30.374563 | -97.975892 | America/Chicago | 1595559600 | Clear | {"time":"1595559600","summary":"\"Clear\"", "icon":"\"clear-night\"","precipIntensity":"0","precipProbability":"0","temperature":"88.39", "apparentTemperature":"91.72","dewPoint":"67.42","humidity":"0.5","pressure":"1011.7","windSpeed":"7.48", "windGust":"16.71","windBearing":"109","cloudCover":"0.06","uvIndex":"0","visibility":"10","ozone":"285.2"} 30.374563 | -97.975892 | America/Chicago | 1595563200 | Clear | {"time":"1595563200","summary":"\"Clear\"", "icon":"\"clear-night\"","precipIntensity":"2.0E-4","precipProbability":"0.01","precipType":"\"rain\"","temperature":"86.69", "apparentTemperature":"90.1","dewPoint":"67.84","humidity":"0.54","pressure":"1012","windSpeed":"7.05","windGust":"17.56", "windBearing":"110","cloudCover":"0.12","uvIndex":"0","visibility":"10","ozone":"284.9"} 30.374563 | -97.975892 | America/Chicago | 1595566800 | Partly Cloudy | {"time":"1595566800","summary":"\"Partly Cloudy\"", "icon":"\"partly-cloudy-night\"","precipIntensity":"3.0E-4","precipProbability":"0.01", "precipType":"\"rain"","temperature":"85.63","apparentTemperature":"89.21", "dewPoint":"68.33","humidity":"0.56","pressure":"1012.6","windSpeed":"6.48","windGust":"17.92","windBearing":"110", "cloudCover":"0.34","uvIndex":"0","visibility":"10","ozone":"284.5"} ...
Graph Data Interface Query Examples
This section includes several sample queries that demonstrate how to use the Graph Data Interface (GDI) to read or ingest data from different types of data sources.
- Reading Data from Remote Sources
- Ingesting Data from Remote Sources
- Creating Virtual Graphs from Remote Sources
Reading Data from Remote Sources
The examples in this section read data from remote data sources by using the GDI service to run SELECT queries against the sources.
The following query reads Covid statistics for the United States from the publicly available Data API Covid Tracking Project. The query simply retrieves data from the HTTP source without passing values as input to the endpoint:
PREFIX s: <http://cambridgesemantics.com/ontologies/DataToolkit#> PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> PREFIX xsd: <http://www.w3.org/2001/XMLSchema#> PREFIX owl: <http://www.w3.org/2002/07/owl#> PREFIX anzo: <http://openanzo.org/ontologies/2008/07/Anzo#> PREFIX zowl: <http://openanzo.org/ontologies/2009/05/AnzoOwl#> PREFIX dc: <http://purl.org/dc/elements/1.1/> SELECT * WHERE { SERVICE <http://cambridgesemantics.com/services/DataToolkit> { ?data a s:HttpSource ; s:url "https://covidtracking.com/api/v1/us/current.csv" ; ?positive (xsd:int) ; ?negative (xsd:int) ; ?totalTests ("totalTestResults" xsd:int) ; ?hospitalized ("hospitalizedCurrently" xsd:int) ; ?inICU ("inIcuCurrently" xsd:int) ; ?updated ("date" xsd:date "yyyyMMdd") . } }
The query returns the following results:
positive | negative | totalTests | hospitalized | inICU | updated ---------+----------+------------+--------------+-------+------------ 4017735 | 44777235 | 48794970 | 59846 | 10466 | 2020-07-23 1 rows
The SELECT query below also reads data from an HTTP source, the Dark Sky API, which compiles worldwide weather statistics. The API has several models available for retrieving data that is current, daily, historical, etc. To target current data, the query includes s:selector "currently"
as an input parameter. In addition, the query demonstrates the use of the "topdown" functionality, where the query includes values to be input to the source to narrow the results to specific locations. The query includes the TOPDOWN keyword in the GDI service call, and the VALUES clause specifies the latitude and longitude values for the cities to return data for. In addition, since this API service requires parameters to be specified in the connection URL, the s:url value includes ?lat and ?long as parameters for the value.
PREFIX s: <http://cambridgesemantics.com/ontologies/DataToolkit#> PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> PREFIX xsd: <http://www.w3.org/2001/XMLSchema#> PREFIX owl: <http://www.w3.org/2002/07/owl#> PREFIX anzo: <http://openanzo.org/ontologies/2008/07/Anzo#> PREFIX zowl: <http://openanzo.org/ontologies/2009/05/AnzoOwl#> PREFIX dc: <http://purl.org/dc/elements/1.1/> PREFIX ex: <http://example.org/ontologies/City#> SELECT ?city ?state ?summary ?temp ?rainChance ?humidity ?pressure ?windSpeed WHERE { SERVICE TOPDOWN <http://cambridgesemantics.com/services/DataToolkit> { ?data a s:HttpSource ; s:url "https://api.darksky.net/forecast/bdbe3f638eb908c9b94919537dad5945/{{?lat}},{{?long}}" ; s:selector "currently" ; ?lat ("latitude") ; ?long ("longitude") ; ?time () ; ?summary () ; ?temp ("temperature") ; ?rainChance ( "precipProbability" ) ; ?rainIntensity ( "precipIntensity" ) ; ?humidity () ; ?pressure () ; ?windSpeed () ; ?windGust () ; ?windBearing () ; ?nearestStorm ("nearestStormDistance") ; ?stormBearing ("nearestStormBearing") ; ?visibility () . } VALUES( ?city ?state ?lat ?long ) { ( "Lakeway" "TX" 30.374563 -97.975892 ) ( "Boston" "MA" 42.358043 -71.060415 ) ( "Seattle" "WA" 47.590720 -122.307053 ) ( "Chicago" "IL" 41.837741 -87.823296 ) ( "Hilo" "HI" 19.702040 -155.090312 ) } } ORDER BY ?city
The query returns the following results:
city | state | summary | temp | rainChance | humidity | pressure | windSpeed --------+-------+---------------+-------+------------+----------+----------+----------- Boston | MA | Overcast | 79.81 | 0 | 0.6 | 1018.7 | 7.71 Chicago | IL | Clear | 81.7 | 0 | 0.52 | 1021.1 | 5.13 Hilo | HI | Partly Cloudy | 72.6 | 0.13 | 0.79 | 1018.6 | 4.86 Lakeway | TX | Partly Cloudy | 92.43 | 0 | 0.48 | 1013.3 | 10.85 Seattle | WA | Mostly Cloudy | 61.82 | 0 | 0.76 | 1018.2 | 4.57 5 rows
The following SELECT query reads film data from a database source. The connection parameters reference a context to obscure the sensitive information from the request. For information about contexts, see Using Query Contexts.
PREFIX s: <http://cambridgesemantics.com/ontologies/DataToolkit#> PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> PREFIX xsd: <http://www.w3.org/2001/XMLSchema#> PREFIX owl: <http://www.w3.org/2002/07/owl#> PREFIX anzo: <http://openanzo.org/ontologies/2008/07/Anzo#> PREFIX zowl: <http://openanzo.org/ontologies/2009/05/AnzoOwl#> PREFIX dc: <http://purl.org/dc/elements/1.1/> SELECT DISTINCT * WHERE { SERVICE <http://cambridgesemantics.com/services/DataToolkit> { ?data a s:DbSource ; s:url "{{@AdventureWorks2012:db.url}}" ; s:username "{{@AdventureWorks2012:db.username}}" ; s:password "{{@AdventureWorks2012:db.password}}" ; s:selector "[dbo].[FILM]" ; ?year ("[YEAR]" xsd:int); ?length (xsd:int) ; ?title (xsd:string) ; ?subject ("[dbo].[FILM].[SUBJECT]" xsd:string) ; ?actor ("[ACTOR]" xsd:string) ; ?actress (xsd:string) ; ?director (xsd:string) ; ?popularity (xsd:int) ; ?awards (xsd:string) ; ?image (xsd:string) . FILTER(?year >= 1990 && ?year < 2000) FILTER(?subject = "Drama" || ?subject = "Action") FILTER(?length <= 90) } }
The example query below reads data from a CSV file source that contains data about airports. The query uses bindings to map the upper case column names in the file to lowercase variables. The result list includes an expression to concatenate the city and state values, and the WHERE clause includes a filter to narrow the results to two states: CA and MA.
PREFIX s: <http://cambridgesemantics.com/ontologies/DataToolkit#> PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> PREFIX xsd: <http://www.w3.org/2001/XMLSchema#> PREFIX owl: <http://www.w3.org/2002/07/owl#> PREFIX anzo: <http://openanzo.org/ontologies/2008/07/Anzo#> PREFIX zowl: <http://openanzo.org/ontologies/2009/05/AnzoOwl#> PREFIX dc: <http://purl.org/dc/elements/1.1/> SELECT ?code ?name (CONCAT(?city,", ", ?state) as ?location) WHERE { SERVICE <http://cambridgesemantics.com/services/DataToolkit> { ?data a s:FileSource ; s:url "/opt/shared-files/airports.csv" ; ?code ("IATA_CODE") ; ?name ("AIRPORT") ; ?city ("CITY") ; ?state ("STATE") ; ?country ("COUNTRY") ; ?lat ("LATITUDE") ; ?long ("LONGITUDE"). FILTER (?state="CA" || ?state="MA") . } } ORDER BY ?state
The query returns the following results:
code | name | location -----+-----------------------------------------------------------+------------------- ACV | Arcata Airport | Arcata/Eureka, CA BFL | Meadows Field | Bakersfield, CA BUR | Bob Hope Airport (Hollywood Burbank Airport) | Burbank, CA CEC | Del Norte County Airport (Jack McNamara Field) | Crescent City, CA CLD | McClellan-Palomar Airport | San Diego, CA FAT | Fresno Yosemite International Airport | Fresno, CA LAX | Los Angeles International Airport | Los Angeles, CA LGB | Long Beach Airport (Daugherty Field) | Long Beach, CA MMH | Mammoth Yosemite Airport | Mammoth Lakes, CA MRY | Monterey Regional Airport (Monterey Peninsula Airport) | Monterey, CA OAK | Oakland International Airport | Oakland, CA ONT | Ontario International Airport | Ontario, CA PSP | Palm Springs International Airport | Palm Springs, CA RDD | Redding Municipal Airport | Redding, CA SAN | San Diego International Airport (Lindbergh Field) | San Diego, CA SBA | Santa Barbara Municipal Airport (Santa Barbara Airport) | Santa Barbara, CA SBP | San Luis Obispo County Regional Airport (McChesney Field) | San Luis Obispo, CA SFO | San Francisco International Airport | San Francisco, CA SJC | Norman Y. Mineta San José International Airport | San Jose, CA SMF | Sacramento International Airport | Sacramento, CA SMX | Santa Maria Public Airport (Capt G. Allan Hancock Field) | Santa Maria, CA SNA | John Wayne Airport (Orange County Airport) | Santa Ana, CA ACK | Nantucket Memorial Airport | Nantucket, MA BOS | Gen. Edward Lawrence Logan International Airport | Boston, MA HYA | Barnstable Municipal Airport | Hyannis, MA MVY | Martha's Vineyard Airport | Marthas Vineyard, MA ORH | Worcester Regional Airport | Worcester, MA 27 rows
And the example query below reads data from Parquet files.
PREFIX s: <http://cambridgesemantics.com/ontologies/DataToolkit#> PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> PREFIX xsd: <http://www.w3.org/2001/XMLSchema#> PREFIX owl: <http://www.w3.org/2002/07/owl#> PREFIX anzo: <http://openanzo.org/ontologies/2008/07/Anzo#> PREFIX zowl: <http://openanzo.org/ontologies/2009/05/AnzoOwl#> PREFIX dc: <http://purl.org/dc/elements/1.1/> SELECT * WHERE { SERVICE <http://cambridgesemantics.com/services/DataToolkit> { ?data a s:FileSource ; s:url "/opt/shared-files/data/" ; s:pattern "*.parquet" ; ?bammUid {xsd:string) ; ?eventHeader [ ?eventId (xsd:string) ; ?eventName (xsd:string) ; ?eventVersion (xsd:string) ; ?eventCreatedAt (xsd:string) ; ] ; ?Location [ ?LocationId (xsd:string) ; ?PlantId (xsd:string) ; ?LineNo (xsd:long) ; ?StationNo (xsd:long) ; ?StationIdx (xsd:long) ; ?FunctionNo (xsd:long) ; ?WorkPos (xsd:long) ; ?ToolPos (xsd:long) ; ?ProcessNo (xsd:long) ; ] ; ?ProcessedPanel [ ?aeMachineId (xsd:string) ; ?panelId (xsd:string) ; ?productPartNo (xsd:string) ; ?productPartVariant (xsd:string) ; ?workingCode (xsd:long) ; ?Prog1Name (xsd:string) ; ?TestStartedAt (xsd:string) ; ?NumOfFailures (xsd:long) ; ?PanelProcessResult (xsd:long) ; ?PanelProcessResultBit (xsd:long) ; ?PanelTargetIdx (xsd:long) ; ?PanelState (xsd:long) ; ] ; ?ProcessedBoards [ ?boardNo (xsd:long) ; ?boardId (xsd:string) ; ?BoardProcessResult (xsd:long) ; ?BoardState (xsd:long) ; ] . } }
Ingesting Data from Remote Sources
The examples in this section ingest data from remote data sources using the extract, load, and transform (ELT) method of data integration. The GDI service ingests remote data and then transforms it into a graph structure in AnzoGraph based on the provided SPARQL INSERT or CREATE MATERIALIZED VIEW queries.
The following query ingests airport-related data from a CSV file. The INSERT clause creates a graph called <http://anzograph.com/airports> and inserts the specified triple patterns based on the GDI service call in the WHERE clause. The BIND statement creates a unique subject URI for each airport using the ?iata_code from the source file.
PREFIX s: <http://cambridgesemantics.com/ontologies/DataToolkit#> PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> PREFIX xsd: <http://www.w3.org/2001/XMLSchema#> PREFIX owl: <http://www.w3.org/2002/07/owl#> PREFIX anzo: <http://openanzo.org/ontologies/2008/07/Anzo#> PREFIX zowl: <http://openanzo.org/ontologies/2009/05/AnzoOwl#> PREFIX dc: <http://purl.org/dc/elements/1.1/> INSERT { GRAPH <http://anzograph.com/airports> { ?code a <http://anzograph.com/airport> ; <http://anzograph.com/airport/name> ?name ; <http://anzograph.com/airport/city> ?city ; <http://anzograph.com/airport/state> ?state ; <http://anzograph.com/airport/latitude> ?lat; <http://anzograph.com/airport/longitude> ?long. } } WHERE { SERVICE <http://cambridgesemantics.com/services/DataToolkit> { ?data a s:FileSource ; s:url "/opt/shared-files/airports.csv" ; ?iata_code ("IATA_CODE") ; ?name ("AIRPORT") ; ?city ("CITY") ; ?state ("STATE") ; ?lat ("LATITUDE") ; ?long ("LONGITUDE"). BIND(IRI("http://anzograph.com/airport/{{?iata_code}}") as ?code) } }
Another way to ingest data into AnzoGraph is to create a materialized view. When you create a materialized view, you write a CREATE MATERIALIZED VIEW statement with a CONSTRUCT query that specifies the triple patterns for the new graph. The GDI retrieves the data from the remote source and creates a new named graph in the database. When creating views, View needs to be added to the GDI service URI, as shown below:
SERVICE <http://cambridgesemantics.com/services/DataToolkitView>
In addition, the name of the view needs to be added in parenthesis at the end of the URI. The view name in the service call must match the name of the view in the CREATE VIEW statement. For example:
CREATE OR REPLACE MATERIALIZED VIEW <http://example.com/kl_hosp/patients_m> AS CONSTRUCT { ... } WHERE { SERVICE <http://cambridgesemantics.com/services/DataToolkitView>(<http://example.com/kl_hosp/patients_m>) { ... } }
The example below creates a materialized view from data in a database source. Running this query inserts a graph called <http://example.com/kl_hosp/patients_m> in AnzoGraph. When running analytics on the data in the view, the view is treated like any other named graph.
PREFIX s: <http://cambridgesemantics.com/ontologies/DataToolkit#> PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> PREFIX xsd: <http://www.w3.org/2001/XMLSchema#> PREFIX owl: <http://www.w3.org/2002/07/owl#> PREFIX anzo: <http://openanzo.org/ontologies/2008/07/Anzo#> PREFIX zowl: <http://openanzo.org/ontologies/2009/05/AnzoOwl#> PREFIX dc: <http://purl.org/dc/elements/1.1/> PREFIX : <http://example.com/ontologies/kl_hosp#> CREATE OR REPLACE MATERIALIZED VIEW <http://example.com/kl_hosp/patients_m> AS CONSTRUCT { ?patient a :Patient ; :rowid ?row_id ; :subjectid ?subject_id; :gender ?gender; :dob ?dob ; :dod ?dod ; :dod_hosp ?dod_hosp . } WHERE { SERVICE <http://cambridgesemantics.com/services/DataToolkitView>(<http://example.com/kl_hosp/patients_m>) { ?data a s:DbSource ; s:url "jdbc:postgresql://10.10.5.3:5555/kl_hosp_db?user=postgres&password=postgres123" ; s:debug true ; s:selector "kl_hosp_schema.patients" ; ?row_id xsd:int; ?subject_id xsd:int; ?gender xsd:string; ?dob xsd:dateTime; ?dod xsd:dateTime; ?dod_hosp xsd:dateTime ; BIND(IRI("http://example.com/patients/{{?subject_id}}") AS ?patient) } }
For more information about creating views, see Named Views. For guidance on creating non-materialized views, also called virtual graphs or data virtualization, see the Creating Virtual Graphs from Remote Sources section below.
The next example below ingests data from a database source using an INSERT query.
PREFIX s: <http://cambridgesemantics.com/ontologies/DataToolkit#> PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> PREFIX xsd: <http://www.w3.org/2001/XMLSchema#> PREFIX owl: <http://www.w3.org/2002/07/owl#> PREFIX anzo: <http://openanzo.org/ontologies/2008/07/Anzo#> PREFIX zowl: <http://openanzo.org/ontologies/2009/05/AnzoOwl#> PREFIX dc: <http://purl.org/dc/elements/1.1/> PREFIX : <http://example.com/ontologies/kl_hosp#> INSERT { GRAPH <http://example.com/kl_hosp/inputevents_cv_i> { ?InputEvent_cv a :InputEvent_cv ; :row_id ?row_id ; :subject_id ?subject_id ; :hadm_id ?hadm_id ; :icustay_id ?icustay_id ; :charttime ?charttime ; :itemid ?itemid ; :amount ?amount ; :amountuom ?amountuom ; :rate ?rate ; :rateuom ?rateuom ; :storetime ?storetime ; :cgid ?cgid ; :orderid ?orderid ; :linkorderid ?linkorderid ; :stopped ?stopped ; :newbottle ?newbottle ; :originalamount ?originalamount ; :originalamountuom ?originalamountuom ; :originalroute ?originalroute ; :originalrate ?originalrate ; :originalrateuom ?originalrateuom ; :originalsite ?originalsite . } } WHERE { SERVICE <http://cambridgesemantics.com/services/DataToolkit> { ?data a s:DbSource ; s:url "jdbc:postgresql://10.10.5.3:5555/kl_hosp_db?user=postgres&password=postgres123" ; s:selector "kl_hosp_schema.inputevents_cv" ; ?row_id (xsd:int) ; ?subject_id (xsd:int) ; ?hadm_id (xsd:int) ; ?icustay_id (xsd:int) ; ?charttime (xsd:dateTime) ; ?itemid (xsd:int) ; ?amount (xsd:float) ; ?amountuom (xsd:string) ; ?rate (xsd:float) ; ?rateuom (xsd:string) ; ?storetime (xsd:dateTime) ; ?cgid (xsd:int) ; ?orderid (xsd:int) ; ?linkorderid (xsd:int) ; ?stopped (xsd:string) ; ?newbottle (xsd:int) ; ?originalamount (xsd:float) ; ?originalamountuom (xsd:string) ; ?originalroute (xsd:string) ; ?originalrate (xsd:float) ; ?originalrateuom (xsd:string) ; ?originalsite (xsd:string) ; BIND(IRI("http://example.com/inputevent_cv/{{?row_id}}") AS ?InputEvent_cv) BIND(IRI("http://example.com/patients/{{?subject_id}}") AS ?patient) BIND(IRI("http://example.com/admissions/{{?hadm_id}}") AS ?admission) } }
Creating Virtual Graphs from Remote Sources
The examples in this section demonstrate "data virtualization" by using the GDI service to run CREATE VIEW queries that create virtual graphs. Virtual graphs are non-materialized views whose view definition is stored in AnzoGraph but the instance data remains in the remote source. This enables you to analyze data in a remote source without first ingesting all of the data into the database.
When you create a virtual graph, you write a CREATE VIEW statement with a CONSTRUCT query that specifies the triple patterns for the virtual graph. The GDI creates a view definition in the database. When creating views, View needs to be added to the GDI service URI, as shown below:
SERVICE <http://cambridgesemantics.com/services/DataToolkitView>
In addition, the name of the view needs to be added in parenthesis at the end of the URI. The view name in the service call must match the name of the view in the CREATE VIEW statement. For example:
CREATE OR REPLACE VIEW <http://example.com/kl_hosp/transfers> AS CONSTRUCT { ... } WHERE { SERVICE <http://cambridgesemantics.com/services/DataToolkitView>(<http://example.com/kl_hosp/transfers>) { ... } }
By default, to avoid running expensive metrics-gathering queries against the source system more than once, the GDI service caches the virtual graph (view) definition when the view is created. That means if you modify and re-run the view creation query, the original view definition remains cached instead of being updated. You can include the s:debug true parameter to avoid caching the definition. If you need to drop the cache after a virtual graph is created, you can run the following query:
PREFIX s: <http://cambridgesemantics.com/ontologies/DataToolkit#> SELECT * WHERE { SERVICE <http://cambridgesemantics.com/management/DataToolkit> { # to refresh all definitions (clear all caches) # include/uncomment the following line # ( s:refresh ) . # to refresh a specific view, include the view URI as follows ( s:refresh <view_URI> ) . } }
The example below creates a virtual graph from data in a database source. Running this query inserts a virtual graph called <http://example.com/kl_hosp/transfers> in AnzoGraph but the instance data is not inserted into the database. When running analytics on the data in the view, the view is treated like any other named graph, but AnzoGraph retrieves the instance data from the data source for which the view was created.
PREFIX s: <http://cambridgesemantics.com/ontologies/DataToolkit#> PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> PREFIX xsd: <http://www.w3.org/2001/XMLSchema#> PREFIX owl: <http://www.w3.org/2002/07/owl#> PREFIX anzo: <http://openanzo.org/ontologies/2008/07/Anzo#> PREFIX zowl: <http://openanzo.org/ontologies/2009/05/AnzoOwl#> PREFIX dc: <http://purl.org/dc/elements/1.1/> PREFIX : <http://example.com/ontologies/kl_hosp#> CREATE OR REPLACE VIEW <http://example.com/kl_hosp/transfers> AS CONSTRUCT { ?transfer a :Transfer ; :rowid ?row_id ; :subjectid ?patient; :hadmid ?hadm_id ; :eventtype ?eventtype . } WHERE { SERVICE <http://cambridgesemantics.com/services/DataToolkitView>(<http://example.com/kl_hosp/transfers>) { ?data a s:DbSource ; s:url "jdbc:postgresql://10.10.5.3:5555/kl_hosp_db?user=postgres&password=postgres123" ; s:selector "kl_hosp_schema.transfers" ; # selects the transfers table in the kl_hosp_schema s:debug true ; # turns off view query caching ?row_id (xsd:int) ; ?subject_id (xsd:int) ; ?hadm_id (xsd:int) ; ?eventtype (xsd:string) . BIND(IRI("http://example.com/transfers/{{?row_id}}") AS ?transfer) BIND(IRI("http://example.com/patients/{{?subject_id}}") AS ?patient) } }
The following example creates a virtual graph from a file source.
PREFIX s: <http://cambridgesemantics.com/ontologies/DataToolkit#> PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> PREFIX xsd: <http://www.w3.org/2001/XMLSchema#> PREFIX owl: <http://www.w3.org/2002/07/owl#> PREFIX anzo: <http://openanzo.org/ontologies/2008/07/Anzo#> PREFIX zowl: <http://openanzo.org/ontologies/2009/05/AnzoOwl#> PREFIX dc: <http://purl.org/dc/elements/1.1/> CREATE OR REPLACE VIEW <http://anzograph.com/airports> AS CONSTRUCT { ?code a <http://anzograph.com/airport> ; <http://anzograph.com/airport/name> ?name ; <http://anzograph.com/airport/city> ?city ; <http://anzograph.com/airport/state> ?state ; <http://anzograph.com/airport/latitude> ?lat; <http://anzograph.com/airport/longitude> ?long. } } WHERE { SERVICE <http://cambridgesemantics.com/services/DataToolkitView>(<http://anzograph.com/airports>) { ?data a s:FileSource ; s:url "/opt/shared-files/airports.csv" ; s:debug true ; ?iata_code ("IATA_CODE") ; ?name ("AIRPORT") ; ?city ("CITY") ; ?state ("STATE") ; ?lat ("LATITUDE") ; ?long ("LONGITUDE"). BIND(IRI("http://anzograph.com/airport/{{?iata_code}}") as ?code) } }
For more information about creating views, see Named Views. For guidance on creating materialized views, see the Ingesting Data from Remote Sources section above.