Creating and Querying Labeled Property Graphs (RDF*)
AnzoGraph supports the Labeled Property Graph (LPG) model for adding metadata about the relationships in your graphs. Properties can describe any relationship in the data further. Examples include start and end dates, data provenance, and the weight, score, or veracity of the data.
AnzoGraph's LPG implementation follows the proposed RDF* and SPARQL* extensions to the W3C SPARQL query language and RDF data model specifications. The proposal, called Foundations of an Alternative Approach to Reification in RDF, is a work in progress, and Cambridge Semantics is a contributor to the working group. The syntax described in the proposal may not be included in the final specification. AnzoGraph does not currently support all of the examples included in the proposal.
This topic provides information about loading and inserting properties and querying property graphs.
- Defining Properties in Turtle Load Files
- Defining Properties in INSERT Queries
- Querying Property Graphs
Defining Properties in Turtle Load Files
This section provides information about how to create a property graph by defining relationship properties in a Turtle load file. For instructions on creating properties in INSERT queries, see Defining Properties in INSERT Queries below.
There is a limit of 255 total property values per edge. AnzoGraph returns an error if you attempt to load or insert more than 255 property values for the same relationship.
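Large files can be checked against the 255-value limit before loading. The following Python sketch is one hypothetical way to do it and is not an AnzoGraph tool. It counts property values per quoted triple and reports any edge over the limit. It assumes the annotations are already parsed into `(edge, property, value)` tuples, and it collapses identical statements on the assumption that they are stored once.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# Limit stated in the documentation: at most 255 property values per edge.
MAX_PROPERTY_VALUES_PER_EDGE = 255

Triple = Tuple[str, str, str]          # (subject, predicate, object)
Annotation = Tuple[Triple, str, str]   # (quoted triple, property URI, property value)


def edges_over_limit(annotations: Iterable[Annotation],
                     limit: int = MAX_PROPERTY_VALUES_PER_EDGE) -> Dict[Triple, int]:
    """Return {edge: count} for each edge that carries more than `limit` property values.

    Identical annotations are collapsed first (assumption: RDF set semantics,
    so a repeated statement is stored only once).
    """
    counts = Counter(edge for edge, _prop, _value in set(annotations))
    return {edge: n for edge, n in counts.items() if n > limit}


edge = ("<person3>", "<friend>", "<person11979>")
dates = [(edge, "<startDate>", '"2004-11-09"^^xsd:date'),
         (edge, "<endDate>", '"2012-07-17"^^xsd:date')]
print(edges_over_limit(dates))  # {}

# <tag> is a hypothetical property used only to exceed the limit.
too_many = [(edge, "<tag>", f'"{i}"') for i in range(256)]
print(edges_over_limit(too_many))  # {('<person3>', '<friend>', '<person11979>'): 256}
```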
To define a relationship property in a Turtle file, wrap the triple in double angle brackets (<< >>), and then specify the property URI and value after the closing brackets:
<< <subject> <predicate> <object> >> <property_URI> <property_value> .
For example, the TTL file contents below include properties that further define the like, dislike, and friend relationships in the triples. The file adds a weight property to define how much <person3> likes or dislikes certain types of events, and the file adds startDate and endDate properties to <friend> predicates to define the start and end dates of friendships.
<person3> rdf:type <person> ;
    <card> "4984932249480735"^^xsd:long ;
    <birthday> "1963-07-02"^^xsd:date ;
    <ssn> 503703220 ;
    <firstname> "Lars" ;
    <lastname> "Ratliff" ;
    <city> "High Point" ;
    <state> "NY" ;
    <email> "amet.faucibus.ut@condimentumegetvolutpat.ca" ;
    <phone> "(624) 767-2465" .
<< <person3> <like> "sports" >> <weight> 8 .
<< <person3> <like> "rock" >> <weight> 9 .
<< <person3> <like> "musicals" >> <weight> 4 .
<< <person3> <dislike> "theatre" >> <weight> 5 .
<< <person3> <dislike> "jazz" >> <weight> 9 .
<< <person3> <dislike> "opera" >> <weight> 10 .
<< <person3> <friend> <person8563> >> <startDate> "1990-01-04"^^xsd:date .
<< <person3> <friend> <person38436> >> <startDate> "2000-04-27"^^xsd:date .
<< <person3> <friend> <person11979> >> <startDate> "2004-11-09"^^xsd:date .
<< <person3> <friend> <person11979> >> <endDate> "2012-07-17"^^xsd:date .
<person3> <friend> <person8639>, <person18536>, <person42975>, <person47376>,
    <person1692>, <person2556>, <person11979>, <person20860>, <person21259>, <person26586>,
    <person27529>, <person31735>, <person36264>, <person38436>, <person42306>, <person42975> .
The example above contains both compact and long Turtle notation. When you define properties in a file, each statement that carries a property must include the complete quoted triple (subject, predicate, and object). Properties cannot be added to triples written in compact notation. In addition, specify one property per quoted triple. To define multiple properties for the same triple, list the triple multiple times. For example, the following lines from the example above define two properties (startDate and endDate) for the <person3> <friend> <person11979> triple:
<< <person3> <friend> <person11979> >> <startDate> "2004-11-09"^^xsd:date .
<< <person3> <friend> <person11979> >> <endDate> "2012-07-17"^^xsd:date .
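Generating load files by hand makes the "one property per quoted triple" rule easy to break. The following Python sketch is a hypothetical helper, not part of AnzoGraph. It takes one triple and a dictionary of properties and emits one RDF* line per property, repeating the full quoted triple each time. The output matches the two lines shown above.

```python
from typing import Dict, List, Tuple


def annotate(triple: Tuple[str, str, str], properties: Dict[str, str]) -> List[str]:
    """Return one RDF* statement per property, repeating the full quoted triple each time."""
    s, p, o = triple
    # Dicts keep insertion order (Python 3.7+), so lines come out in the order given.
    return [f"<< {s} {p} {o} >> {prop} {value} ." for prop, value in properties.items()]


lines = annotate(("<person3>", "<friend>", "<person11979>"),
                 {"<startDate>": '"2004-11-09"^^xsd:date',
                  "<endDate>": '"2012-07-17"^^xsd:date'})
print("\n".join(lines))
# << <person3> <friend> <person11979> >> <startDate> "2004-11-09"^^xsd:date .
# << <person3> <friend> <person11979> >> <endDate> "2012-07-17"^^xsd:date .
```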
By default, the sample Tickit data set already includes startDate and endDate properties for the friend predicates. The example above defines start and end date properties only for illustrative purposes.
Defining Properties in INSERT Queries
Users can create property graphs using INSERT and INSERT DATA syntax to insert triples and properties or add properties to existing triples.
To define properties in INSERT statements, use the same syntax as in Turtle files. Wrap the triple in double angle brackets (<< >>), and then specify the property URI and value for that triple after the closing brackets.
<< <subject> <predicate> <object> >> <property_URI> <property_value> .
There is a limit of 255 total property values per edge. AnzoGraph returns an error if you attempt to load or insert more than 255 property values for the same relationship.
For example, the INSERT DATA statement below adds weight properties to the like and dislike triples for person3. This example specifies literal values for the weight property.
INSERT DATA {
  GRAPH <tickit> {
    << <person3> <dislike> "jazz" >> <weight> 9 .
    << <person3> <dislike> "theatre" >> <weight> 5 .
    << <person3> <dislike> "opera" >> <weight> 10 .
    << <person3> <like> "sports" >> <weight> 8 .
    << <person3> <like> "rock" >> <weight> 9 .
    << <person3> <like> "musicals" >> <weight> 4 .
  }
}
The following example INSERT statement queries the Tickit graph to find the sellers whose total sales amount is greater than or equal to $20,000. For each seller who meets the requirement, the INSERT clause inserts an <earned> triple whose object is the seller's total sales. It then annotates that triple with a property named score whose value is 10:
INSERT {
  GRAPH <tickit> {
    << ?person <earned> ?earned >> <score> 10
  }
}
WHERE {
  GRAPH <tickit> {
    {
      SELECT ?person (SUM(?dollars) AS ?earned)
      WHERE {
        ?person <firstname> ?first .
        ?person <lastname> ?last .
        ?sale <sellerid> ?person .
        ?sale <pricepaid> ?dollars .
      }
      GROUP BY ?person
    }
    FILTER(?earned >= 20000)
  }
}
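The following Python sketch mirrors the logic of the query above in memory, which may help when checking expected results. It sums the amount paid per seller (the GROUP BY subquery) and keeps totals of at least 20000 (the FILTER). It then emits one annotated <earned> statement per qualifying seller with a score of 10. The sales data is hypothetical, not taken from Tickit. The sketch also skips the <firstname> and <lastname> join, which in the query limits results to sellers that have names.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def score_sellers(sales: Iterable[Tuple[str, float]],
                  threshold: float = 20000, score: int = 10) -> List[str]:
    """Sum pricepaid per seller (GROUP BY), keep totals >= threshold (FILTER),
    and annotate each <earned> triple with a score property."""
    totals: Dict[str, float] = defaultdict(float)
    for seller, price_paid in sales:  # one (sellerid, pricepaid) pair per sale
        totals[seller] += price_paid
    return [f"<< <{seller}> <earned> {total} >> <score> {score} ."
            for seller, total in sorted(totals.items())
            if total >= threshold]


# Hypothetical sales, not taken from the Tickit data set.
sales = [("personA", 15000.0), ("personA", 6000.0), ("personB", 19999.0)]
print(score_sellers(sales))
# ['<< <personA> <earned> 21000.0 >> <score> 10 .']
```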
Selecting the newly created triples shows that 52 people met the requirement and were assigned a <score> property with a value of 10:
SELECT ?person ?earned ?score
FROM <tickit>
WHERE {
  << ?person <earned> ?earned >> <score> ?score
}
ORDER BY ?person
person      | earned       | score
------------+--------------+-------
person19231 | 22636.000000 | 10
person30007 | 20521.000000 | 10
person16335 | 20160.000000 | 10
person15976 | 20929.000000 | 10
person49919 | 21218.000000 | 10
person30764 | 21014.000000 | 10
person24980 | 24857.000000 | 10
person8038  | 20015.000000 | 10
person36217 | 24269.000000 | 10
person26198 | 21243.000000 | 10
person1140  | 32399.000000 | 10
person35284 | 20131.000000 | 10
person34730 | 20448.000000 | 10
person19814 | 20465.000000 | 10
person34982 | 22262.000000 | 10
...
52 rows
The following example shows how to create properties and assign values based on data that exists in a source file. The data for the example is a CSV file with the following columns and data:
Airline,FlightNumber,TailNumber,OriginAirport,DestinationAirport,Distance
AS,98,N407AS,ANC,SEA,1448
AA,2336,N3KUAA,LAX,PBI,2330
US,840,N171US,SFO,CLT,2296
AA,258,N3HYAA,LAX,MIA,2342
AS,135,N527AS,SEA,ANC,1448
DL,806,N3730B,SFO,MSP,1589
NK,612,N635NK,LAS,MSP,1299
US,2013,N584UW,LAX,CLT,2125
The example INSERT query for the file above defines the Distance column as a property and adds the Distance value as the value for the property:
INSERT {
  GRAPH <flights> {
    ?OriginIRI a <Airport> .
    ?DestinationIRI a <Airport> .
    << ?OriginIRI <Destination> ?DestinationIRI >> <Distance> ?Distance .
    ?FlightIRI a <Flight> ;
      <Airline> ?Airline ;
      <FlightNumber> ?FlightNumber ;
      <TailNumber> ?TailNumber .
  }
}
WHERE {
  TABLE <file:/home/user/flights.csv> ('csv','global',',',true,'Airline:char,FlightNumber:char,TailNumber:char,OriginAirport:char,DestinationAirport:char,Distance:int')
  BIND(IRI(CONCAT("Flight", str(?FlightNumber), str(?TailNumber))) AS ?FlightIRI)
  BIND(IRI(str(?OriginAirport)) AS ?OriginIRI)
  BIND(IRI(str(?DestinationAirport)) AS ?DestinationIRI)
}
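The row-by-row mapping in the query above can be previewed outside the database. The following Python sketch reads the same sample CSV and builds, for each row, the statements the INSERT template would produce. The Distance column becomes a property on the origin-to-destination edge. The sketch assumes, as the BIND clauses suggest, that airport codes become IRIs as written and that flight IRIs are "Flight" + FlightNumber + TailNumber. It writes those as relative IRIs in angle brackets. Airport type statements repeat across rows, which is harmless if duplicates are stored once.

```python
import csv
import io
from typing import List

# The sample rows from the flights CSV shown above.
FLIGHTS_CSV = """\
Airline,FlightNumber,TailNumber,OriginAirport,DestinationAirport,Distance
AS,98,N407AS,ANC,SEA,1448
AA,2336,N3KUAA,LAX,PBI,2330
US,840,N171US,SFO,CLT,2296
AA,258,N3HYAA,LAX,MIA,2342
AS,135,N527AS,SEA,ANC,1448
DL,806,N3730B,SFO,MSP,1589
NK,612,N635NK,LAS,MSP,1299
US,2013,N584UW,LAX,CLT,2125
"""


def flight_statements(csv_text: str) -> List[str]:
    """Turn each CSV row into the statements the INSERT template produces."""
    out: List[str] = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        origin = f"<{row['OriginAirport']}>"       # IRI(str(?OriginAirport))
        dest = f"<{row['DestinationAirport']}>"    # IRI(str(?DestinationAirport))
        flight = f"<Flight{row['FlightNumber']}{row['TailNumber']}>"  # IRI(CONCAT(...))
        out.append(f"{origin} a <Airport> .")
        out.append(f"{dest} a <Airport> .")
        # The Distance column becomes a property on the origin -> destination edge.
        out.append(f"<< {origin} <Destination> {dest} >> <Distance> {int(row['Distance'])} .")
        out.append(f'{flight} a <Flight> ; <Airline> "{row["Airline"]}" ; '
                   f'<FlightNumber> "{row["FlightNumber"]}" ; '
                   f'<TailNumber> "{row["TailNumber"]}" .')
    return out


statements = flight_statements(FLIGHTS_CSV)
for line in statements[:4]:
    print(line)
```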
The following query returns the origin and destination airports for the flights as well as the distance property value:
SELECT ?from ?to ?distance
FROM <flights>
WHERE {
  << ?from ?p ?to >> ?property ?distance
}
ORDER BY ?distance
from | to  | distance
-----+-----+----------
LAS  | MSP | 1299
SEA  | ANC | 1448
ANC  | SEA | 1448
SFO  | MSP | 1589
LAX  | CLT | 2125
SFO  | CLT | 2296
LAX  | PBI | 2330
LAX  | MIA | 2342
8 rows
Querying Property Graphs
To return properties and their values when analyzing data sets, include the following property graph syntax in graph and triple patterns:
<< <subject> <predicate> <object> >> <property_URI> <property_value> .
The following example query returns the properties that were defined in the INSERT DATA query above.
SELECT *
FROM <tickit>
WHERE {
  << ?person ?p ?likes_or_dislikes >> ?property ?value .
  FILTER(?p = <like> || ?p = <dislike>)
}
ORDER BY ?p
person  | p       | likes_or_dislikes | property | value
--------+---------+-------------------+----------+-------
person3 | dislike | jazz              | weight   | 9
person3 | dislike | opera             | weight   | 10
person3 | dislike | theatre           | weight   | 5
person3 | like    | rock              | weight   | 9
person3 | like    | musicals          | weight   | 4
person3 | like    | sports            | weight   | 8
6 rows
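To show what the << ?s ?p ?o >> ?property ?value pattern matches, the following Python sketch uses a hypothetical in-memory model, not an AnzoGraph API. Each quoted triple maps to the (property, value) pairs attached to it, and a `None` argument acts as an unbound variable. The data mirrors the INSERT DATA example. Filtering and sorting on the predicate reproduce the query above, although ORDER BY ?p does not fix the order of rows that share the same predicate.

```python
from typing import Dict, Iterator, List, Optional, Tuple

Triple = Tuple[str, str, str]
Row = Tuple[str, str, str, str, object]

# Hypothetical in-memory model of an RDF* graph: quoted triple -> its annotations.
GRAPH: Dict[Triple, List[Tuple[str, object]]] = {
    ("person3", "dislike", "jazz"): [("weight", 9)],
    ("person3", "dislike", "theatre"): [("weight", 5)],
    ("person3", "dislike", "opera"): [("weight", 10)],
    ("person3", "like", "sports"): [("weight", 8)],
    ("person3", "like", "rock"): [("weight", 9)],
    ("person3", "like", "musicals"): [("weight", 4)],
}


def _binds(pattern: Optional[str], term: str) -> bool:
    """An unbound position (None) matches anything; a bound one must be equal."""
    return pattern is None or pattern == term


def match(graph: Dict[Triple, List[Tuple[str, object]]],
          s: Optional[str] = None, p: Optional[str] = None,
          o: Optional[str] = None) -> Iterator[Row]:
    """Evaluate << s p o >> ?property ?value against the in-memory graph."""
    for (ts, tp, to), annotations in graph.items():
        if _binds(s, ts) and _binds(p, tp) and _binds(o, to):
            for prop, value in annotations:
                yield ts, tp, to, prop, value


# FILTER(?p = <like> || ?p = <dislike>) ... ORDER BY ?p
rows = sorted((r for r in match(GRAPH) if r[1] in ("like", "dislike")),
              key=lambda r: r[1])
for row in rows:
    print(row)
```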
This example returns a list of the properties in the Tickit graph and lists the number of times each property is referenced in the graph:
SELECT ?property (COUNT(?property) AS ?times_used)
FROM <tickit>
WHERE {
  << ?s ?p ?o >> ?property ?value
}
GROUP BY ?property
ORDER BY DESC(?times_used)
property  | times_used
----------+------------
startDate | 1729764
endDate   | 173036
score     | 52
weight    | 6
4 rows
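The aggregation above counts one solution per annotation statement and groups by property name. The following Python sketch shows the same computation on a small hypothetical list of `(edge, property, value)` annotations. `Counter.most_common()` plays the role of ORDER BY DESC(?times_used), and properties with equal counts keep the order in which they first appear.

```python
from collections import Counter
from typing import Iterable, List, Tuple

Annotation = Tuple[Tuple[str, str, str], str, object]  # (edge, property, value)


def property_usage(annotations: Iterable[Annotation]) -> List[Tuple[str, int]]:
    """Count how often each property is used, most frequent first
    (GROUP BY ?property ... ORDER BY DESC(?times_used))."""
    return Counter(prop for _edge, prop, _value in annotations).most_common()


# Hypothetical annotations in the style of the Tickit examples.
friend = ("person3", "friend", "person11979")
sample = [
    (friend, "startDate", "2004-11-09"),
    (("person3", "friend", "person8563"), "startDate", "1990-01-04"),
    (friend, "endDate", "2012-07-17"),
    (("person3", "like", "rock"), "weight", 9),
]
print(property_usage(sample))
# [('startDate', 2), ('endDate', 1), ('weight', 1)]
```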