Selectively Loading Data from RDF Files

If you do not want to load all of the data in your files, AnzoGraph enables you to hand-pick the data to load by writing an INSERT query that defines the data to include. This topic provides instructions for selectively loading data from RDF files.

For information about analyzing load files to get to know the data, see Analyzing Load Files Without Loading Data.

INSERT Data from Files Syntax

Use the following syntax to load data from files in triple or quad format. The table below describes each of the USING clause options.

[ PREFIX ... ]
INSERT { graph_or_triple_patterns }
USING [ NAMED ] EXTERNAL <URL_to_file>
[ ... ]
WHERE { graph_or_triple_patterns }
USING Option             | Description
-------------------------+-----------------------------------------------------------------
USING [ NAMED ] EXTERNAL | Required clause that specifies that the INSERT query runs against
                         | a file rather than a graph. You can include any number of
                         | USING EXTERNAL and/or USING NAMED EXTERNAL clauses.
<URL_to_file>            | The path to the file or files to load.

For example, the following URL targets one file in a data.ttl directory on a mounted file system:

<file:/global/nfs/vpc_nfs_server/data.ttl/sales-data.ttl>

This URL targets a file in a data.ttl.gz directory on S3:

<s3://mybucket/data.ttl.gz/sales-data.ttl.gz>

This URL targets a directory of files on the local file system:

<dir:/home/user/data/rdf.ttl>
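Because a query can include more than one USING EXTERNAL clause, a single INSERT can select data from several sources at once. The following is a sketch only: the two URLs are the placeholder examples above, the target graph <sales> is hypothetical, and it assumes that, as with multiple USING clauses in standard SPARQL Update, the WHERE clause matches against the combined data. The isLiteral filter is just an illustration of a selection rule; it keeps triples whose object is a literal value.

```sparql
# Hypothetical example: select data from two external sources in one INSERT
INSERT { GRAPH <sales> { ?s ?p ?o . } }
USING EXTERNAL <file:/global/nfs/vpc_nfs_server/data.ttl/sales-data.ttl>
USING EXTERNAL <dir:/home/user/data/rdf.ttl>
WHERE {
  ?s ?p ?o .
  # Keep only triples whose object is a literal (string, number, date, ...)
  FILTER(isLiteral(?o))
}
```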

For example, the following query loads only those triples from the sample Tickit data set whose subject is person1. The INSERT clause writes the matching ?person ?p ?o triples into a graph named person. The USING EXTERNAL clause provides the URL of the Tickit load files on Amazon S3, and the WHERE clause limits the load to triples in which ?person matches the person1 URI.

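# Write the selected triples into the graph <person>
INSERT { GRAPH <person> { ?person ?p ?o . } }
# Read from the Tickit load files on S3 rather than from a stored graph
USING EXTERNAL <s3://csi-notebook-datasets/MovieTicketAnalysis/20190217/tickit.ttl.gz>
WHERE {
  ?person ?p ?o .
  # Keep only triples whose subject is <person1>
  FILTER(?person = <person1>)
}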
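The WHERE clause can select on any part of a triple, not only the subject. As a further sketch, the following variant loads the first name and email address of every subject that has them. The graph name <contacts> is hypothetical, and the predicate IRIs <firstname> and <email> are assumed from how those predicates are displayed in the query results below; confirm the actual IRIs in your data before running it.

```sparql
# Hypothetical variant: load two predicates for every subject
INSERT { GRAPH <contacts> { ?person ?p ?o . } }
USING EXTERNAL <s3://csi-notebook-datasets/MovieTicketAnalysis/20190217/tickit.ttl.gz>
WHERE {
  ?person ?p ?o .
  # Assumed predicate IRIs, based on the "firstname" and "email" values in the results
  FILTER(?p IN (<firstname>, <email>))
}
```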

Querying for all of the triples in the new person graph shows that it only includes data for person1:

SELECT *
FROM <person>
WHERE {
  ?s ?p ?o .
}
ORDER BY ?p
s       | p                           | o
--------+-----------------------------+-----------------------------------------
person1 | birthday                    | 1939-11-19
person1 | card                        | 3876972207981477
person1 | city                        | Kent
person1 | dislike                     | broadway
person1 | dislike                     | jazz
person1 | email                       | Etiam.laoreet.libero@sodalesMaurisblandit.edu
person1 | firstname                   | Rafael
person1 | friend                      | person47127
person1 | friend                      | person43803
person1 | friend                      | person7877
person1 | friend                      | person2240
...
59 rows
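Instead of scanning every row, an aggregate query can confirm the result in one line of output. This standard SPARQL 1.1 sketch counts the triples and the distinct subjects in the person graph; for the load above, you would expect 59 triples and a single subject.

```sparql
# Count all triples in <person> and how many distinct subjects they describe
SELECT (COUNT(*) AS ?triples) (COUNT(DISTINCT ?s) AS ?subjects)
FROM <person>
WHERE {
  ?s ?p ?o .
}
```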