Data Linking Options

When a data source does not define keys (for example, a CSV or JSON source), the GDI provides properties for defining relationships between sources, so that data loaded from multiple sources forms a connected knowledge graph. These properties let you define resource templates (primary keys) and object properties (foreign keys). The available properties are described below.

Data Linking Syntax

s:key ("column_name") ;
s:reference [ 
    s:model "table_to_reference" ; 
    s:using ("foreign_key_column") 
]

Option | Data Type | Description
key | string | Include this property to define the primary key column for the source file or table. The GDI uses this column in the resource template for the instances that are created from the source. For example, s:key ("EMPLOYEE_ID").
reference | RDF list | Include this property to specify a foreign key column. The reference property is an RDF list that includes a model property, which names the target table, and a using property, which names the foreign key column in the source table. For example:
s:reference [ s:model "table_to_reference" ; s:using ("foreign_key_column") ]

You can also include an optional key property within the s:reference list to identify the key column in the target table. This additional metadata helps inform how the GDI names the object property. For example:

s:reference [ s:model "Employees" ; s:using ("EMPLOYEE_ID") ; s:key ("EMPLOYEE_ID") ]

Data Linking Examples

The query snippet below defines two data sources. The s:model property defines the table/class for each source, and the s:key property defines the primary key for each table/class. The s:reference property for the "venue" table defines a foreign key relationship from venue.EVENT_ID to event.EVENT_ID.

?event a s:FileSource ;
   s:model "event" ;
   s:url  "/opt/shared-files/csv/events.csv" ;
   s:key ("EVENT_ID") .

?venue a s:FileSource ;
   s:model "venue" ;
   s:url "/opt/shared-files/csv/venues.csv" ;
   s:key ("VENUE_ID") ;
   s:reference [ s:model "event" ; s:using ("EVENT_ID") ] .

The following query for multiple file sources generates RDF and an ontology with resource templates and object properties. The query also includes global normalization rules for normalizing the data across all sources (see Normalization Options for information about normalization).

PREFIX s: <http://cambridgesemantics.com/ontologies/DataToolkit#>

INSERT {
   GRAPH <http://anzograph.com/tickets> {
      ?s ?p ?o .
   }
}
WHERE { 
   SERVICE <http://cambridgesemantics.com/services/DataToolkit> {

      ?event a s:FileSource ;
         s:model "event" ;
         s:url  "/opt/shared-files/csv/events.csv" ;
         s:key ("EVENT_ID") .

      ?listing a s:FileSource ;
         s:model "listing" ;
         s:url "/opt/shared-files/csv/listings.csv" ;
         s:key ("LIST_ID") ;
         s:reference [ s:model "event" ; s:using ("EVENT_ID") ; s:key ("EVENT_ID") ] .

      ?date a s:FileSource ;
         s:model "date" ;
         s:url  "/opt/shared-files/csv/event_dates.csv" ;
         s:key ("DATE_ID") ;
         s:reference [ s:model "event" ; s:using ("EVENT_ID") ; s:key ("EVENT_ID") ] .

      ?venue a s:FileSource ;
         s:model "venue" ;
         s:url "/opt/shared-files/csv/venues.csv" ;
         s:key ("VENUE_ID") ;
         s:reference [ s:model "event" ; s:using ("EVENT_ID") ; s:key ("EVENT_ID") ] .
     
      ?sale a s:FileSource ;
         s:model "sale" ;
         s:url "/opt/shared-files/csv/sales.csv" ;
         s:key ("SALE_ID") ;
         s:reference [ s:model "event" ; s:using ("EVENT_ID") ; s:key ("EVENT_ID") ] ;
         s:reference [ s:model "listing" ; s:using ("LIST_ID") ; s:key ("LIST_ID") ] .

      ?rdf a s:RdfGenerator, s:OntologyGenerator ;
         s:as (?s ?p ?o) ;
         s:ontology <http://anzograph.com/tickets> ;
         s:base <http://anzograph.com/data> ;
         s:normalize [ 
            s:all [
               s:casing s:UPPER ;
               s:localNameSeparator "_" ;
            ] ;
         ] .
   }
}
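
After the load completes, one way to check that the s:reference definitions produced links between instances is to count, per predicate, the triples whose object is itself a typed resource in the target graph. The query below is a minimal sketch and not part of the GDI. It assumes the data was inserted into <http://anzograph.com/tickets> as in the query above, and it makes no assumption about how the GDI names the generated object properties.

```sparql
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

# Count triples that point from one resource to another typed resource.
# The graph IRI is taken from the load query above; adjust it if yours differs.
SELECT ?p (COUNT(*) AS ?links)
FROM <http://anzograph.com/tickets>
WHERE {
   ?s ?p ?o .
   # Keep only IRI-valued objects and skip the class membership triples.
   FILTER (isIRI(?o) && ?p != rdf:type)
   # The object must itself be declared as an instance of some class.
   FILTER EXISTS { ?o a ?anyClass }
}
GROUP BY ?p
ORDER BY DESC(?links)
```

Because the load query writes the ontology to the same graph, predicates from the ontology itself may also appear in the results. If a source that declares an s:reference contributes no predicate to the list, the s:using column name and the target s:model value are the first things to compare against the source files.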