Selectively Loading Data from Files
If you do not want to load all of the data in your files, AnzoGraph enables you to hand-pick the data to load by writing an INSERT query that defines the data to include. This topic provides instructions for selectively loading data.
For information about analyzing load files to get to know the data, see Analyzing Load Files Without Loading Data.
INSERT Data from Files Syntax
The syntax that you use to selectively load data from files depends on the file format:
- For files in triple or quad format, such as Turtle (.ttl), N-Triple (.n3 and .nt), N-Quad (.nq and .quads), and TriG (.trig), use the syntax in Triple and Quad Format below.
- For files in tabular format, such as CSV files, follow the syntax in Loading Tabular Files.
Triple and Quad Format
Use the following syntax to load data from files in triple or quad format. The table below describes each of the USING clause options. For information about loading data from CSV files, see Loading Tabular Files.
[ PREFIX ... ]
INSERT { graph_or_triple_patterns }
USING [ NAMED ] EXTERNAL <URL_to_file>
[ ... ]
WHERE { graph_or_triple_patterns }
USING Option | Description |
---|---|
USING [ NAMED ] EXTERNAL | Required clause to specify that the INSERT query runs against a file rather than the database. You can include any number of USING EXTERNAL and/or USING NAMED EXTERNAL options. |
<URL_to_file> | Provide the URL of the file or directory of files to load. For example, the following URL targets one file in a data.ttl directory on a mounted file system: <file:/global/nfs/vpc_nfs_server/data.ttl/sales-data.ttl>. This URL targets a file in a data.ttl.gz directory on S3: <s3://mybucket/data.ttl.gz/sales-data.ttl.gz>. This URL targets a directory of files on the local file system: <dir:/home/user/data/tickit.ttl>. |
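Because the USING clause can be repeated, a single INSERT query can read from more than one file. The following is a minimal sketch of that form, not a tested query: the graph name <sales> is hypothetical, the two file URLs are the placeholder paths from the table above, and the isIRI filter is only one illustration of how a WHERE clause might narrow the load (here, to triples whose object is an IRI rather than a literal).

```sparql
# Sketch only: <sales> is a hypothetical graph name, and both URLs are
# the placeholder paths from the table above, not real data locations.
INSERT {
  GRAPH <sales> { ?s ?p ?o . }
}
USING EXTERNAL <file:/global/nfs/vpc_nfs_server/data.ttl/sales-data.ttl>
USING EXTERNAL <s3://mybucket/data.ttl.gz/sales-data.ttl.gz>
WHERE {
  ?s ?p ?o .
  # Keep only triples whose object is an IRI; literal values are skipped.
  FILTER(isIRI(?o))
}
```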
For example, the following query loads from the sample Tickit data set only the triples for which person1 is the subject. The query inserts the triple pattern ?person ?p ?o into a graph named person. The USING EXTERNAL clause provides the URL of the Tickit load file on Amazon S3, and the WHERE clause limits the load to triples where ?person matches the person1 URI.
INSERT {
  GRAPH <person> { ?person ?p ?o . }
}
USING EXTERNAL <s3://csi-notebook-datasets/MovieTicketAnalysis/20190217/tickit.ttl.gz>
WHERE {
  ?person ?p ?o .
  FILTER(?person = <person1>) .
}
Querying for all of the triples in the new person graph shows that it only includes data for person1:
SELECT *
FROM <person>
WHERE { ?s ?p ?o . }
ORDER BY ?p
s       | p         | o
--------+-----------+-----------------------------------------------
person1 | birthday  | 1939-11-19
person1 | card      | 3876972207981477
person1 | city      | Kent
person1 | dislike   | broadway
person1 | dislike   | jazz
person1 | email     | Etiam.laoreet.libero@sodalesMaurisblandit.edu
person1 | firstname | Rafael
person1 | friend    | person47127
person1 | friend    | person43803
person1 | friend    | person7877
person1 | friend    | person2240
...
59 rows
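The same approach can narrow the load further by fixing the predicate as well as the subject. The sketch below would load only the friend triples for person1 into a separate graph. It assumes that the subject and predicate can be written as <person1> and <friend>, matching how they appear in the query and results above; the graph name <person1_friends> is hypothetical.

```sparql
# Sketch only: <person1_friends> is a hypothetical graph name, and the
# <person1> and <friend> IRIs are assumed to match the data as displayed above.
INSERT {
  GRAPH <person1_friends> { <person1> <friend> ?friend . }
}
USING EXTERNAL <s3://csi-notebook-datasets/MovieTicketAnalysis/20190217/tickit.ttl.gz>
WHERE {
  # Match only the friend triples whose subject is person1.
  <person1> <friend> ?friend .
}
```

Putting the constants directly in the triple pattern, rather than in a FILTER, keeps the query short; either form expresses the same restriction.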