Selectively Loading Data from RDF Files
If you do not want to load all of the data in your files, AnzoGraph enables you to hand-pick the data to load by writing an INSERT query that defines the data to include. This topic provides instructions for selectively loading data from RDF files.
For information about analyzing load files to get to know the data, see Analyzing Load Files Without Loading Data.
INSERT Data from Files Syntax
Use the following syntax to load data from files in triple or quad format. The table below describes each of the USING clause options.
[ PREFIX ... ]
INSERT { graph_or_triple_patterns }
USING [ NAMED ] EXTERNAL <URL_to_file>
[ ... ]
WHERE { graph_or_triple_patterns }
USING Option | Description |
---|---|
USING [ NAMED ] EXTERNAL | Required clause to specify that the INSERT query runs against a file rather than a graph. You can include any number of USING EXTERNAL and/or USING NAMED EXTERNAL options. |
<URL_to_file> | The URL of the file or directory of files to load. For example, the following URL targets one file in a data.ttl directory on a mounted file system: <file:/global/nfs/vpc_nfs_server/data.ttl/sales-data.ttl> This URL targets a file in a data.ttl.gz directory on S3: <s3://mybucket/data.ttl.gz/sales-data.ttl.gz> This URL targets a directory of files on the local file system: <dir:/home/user/data/rdf.ttl> |
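As a sketch of how more than one source can be named in a single query, the example below repeats the USING EXTERNAL clause, reusing two of the example URLs from the table above. The graph name <sales> is hypothetical, and the query assumes that the files at both locations contain triples you want to copy unchanged; adjust the URLs and patterns to match your own files.

```sparql
# Hypothetical target graph <sales>; the two URLs are the example
# locations from the table above, not real data sets.
INSERT {
  GRAPH <sales> { ?s ?p ?o . }
}
USING EXTERNAL <file:/global/nfs/vpc_nfs_server/data.ttl/sales-data.ttl>
USING EXTERNAL <dir:/home/user/data/rdf.ttl>
WHERE {
  # Matches every triple in the listed sources.
  ?s ?p ?o .
}
```

The WHERE pattern here matches every triple in the listed sources. Adding a FILTER to the WHERE clause narrows the load to a subset of the data.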
For example, the following query loads only the triples whose subject is person1 from the sample Tickit data set. The query inserts the triple pattern ?person ?p ?o into a graph named person. The USING EXTERNAL clause provides the URL of the Tickit load file on Amazon S3, and the WHERE clause limits the load to triples where ?person matches the person1 URI.
INSERT {
  GRAPH <person> { ?person ?p ?o . }
}
USING EXTERNAL <s3://csi-notebook-datasets/MovieTicketAnalysis/20190217/tickit.ttl.gz>
WHERE {
  ?person ?p ?o .
  FILTER(?person = <person1>).
}
Querying for all of the triples in the new person graph shows that it only includes data for person1:
SELECT * FROM <person> WHERE { ?s ?p ?o . } ORDER BY ?p
s       | p                           | o
--------+-----------------------------+-----------------------------------------
person1 | birthday                    | 1939-11-19
person1 | card                        | 3876972207981477
person1 | city                        | Kent
person1 | dislike                     | broadway
person1 | dislike                     | jazz
person1 | email                       | Etiam.laoreet.libero@sodalesMaurisblandit.edu
person1 | firstname                   | Rafael
person1 | friend                      | person47127
person1 | friend                      | person43803
person1 | friend                      | person7877
person1 | friend                      | person2240
...
59 rows
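The same approach can filter on the predicate instead of the subject. The following is a minimal sketch that loads only the friend relationships from the same Tickit file. It assumes that the friend predicate is identified by the relative IRI <friend>, in the same way that the query above writes <person1>. The graph name <friends> is hypothetical.

```sparql
# Assumption: the predicate shown as "friend" in the results above
# is addressed as <friend>. The graph name <friends> is hypothetical.
INSERT {
  GRAPH <friends> { ?person ?p ?friend . }
}
USING EXTERNAL <s3://csi-notebook-datasets/MovieTicketAnalysis/20190217/tickit.ttl.gz>
WHERE {
  ?person ?p ?friend .
  # Keep only the triples whose predicate is the friend relationship.
  FILTER(?p = <friend>)
}
```

If the predicate in your data uses a full IRI or a prefixed name, replace <friend> in the FILTER with that IRI.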