SPARQL Best Practices

Compared with SQL, SPARQL's syntax and grammar are less strictly enforced. In a graph database, the data itself defines the schema, so there is no separate schema against which the data can be validated. In addition, because RDF graphs typically contain semi-structured data, the database can include data that is incomplete or unknown. This topic provides tips to help you avoid unexpected results when running SPARQL queries.

Look for Typos

Mistyping a predicate, for example, does not produce an error such as "predicate does not exist." Instead, the query simply matches nothing and might not produce any results.

Example

The following query counts the number of distinct likes in the sample Tickit data. As shown in the WHERE clause, the predicate in the tickit graph is <like>. The results show that there are 10 distinct likes, that is, 10 distinct objects for the <like> predicate:

SELECT (count(?o) as ?numberOfLikes)
FROM <http://anzograph.com/tickit>
WHERE {
  {
    SELECT DISTINCT ?o
    WHERE { ?s <http://anzograph.com/tickit/like> ?o }
  }
}

numberOfLikes
--------------
10
1 rows

Misspelling "like" as "likes" does not produce an error. The pattern simply matches no triples, so the query returns a count of 0:

SELECT (count(?o) as ?numberOfLikes)
FROM <http://anzograph.com/tickit>
WHERE {
  {
    SELECT DISTINCT ?o
    WHERE { ?s <http://anzograph.com/tickit/likes> ?o }
  }
}

numberOfLikes
--------------
0
1 rows
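
One way to catch this kind of typo is to list the predicates that actually exist in the graph before writing the query. The following is a minimal sketch that assumes the same tickit graph as above. The FILTER on the substring "like" is only an illustration and can be removed to list every predicate:

```sparql
# List the distinct predicates in the graph whose IRI contains "like",
# so that the exact spelling can be copied into the real query.
SELECT DISTINCT ?p
FROM <http://anzograph.com/tickit>
WHERE {
  ?s ?p ?o .
  FILTER(CONTAINS(STR(?p), "like"))
}
ORDER BY ?p
```

If a predicate that you expect is missing from this list, a query that uses it will match nothing.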

Make Some Triple Patterns Optional

Some queries might need to account for missing or incomplete data. By default, a solution is excluded from the results if the data matches some, but not all, of the query's triple patterns. To keep those solutions, you can use the OPTIONAL keyword to make certain triple patterns optional.

For example, the sample Tickit dataset includes person graphs. These graphs contain triples with a person subject and predicates such as first name, last name, birthday, credit card number, like, and dislike. Some person graphs are missing like or dislike predicates, so querying for person data using like or dislike in the pattern may produce unexpected results.

Example

The following example queries the Tickit dataset to find the first and last name and likes and dislikes for all of the people who have bought tickets:

PREFIX tickit: <http://anzograph.com/tickit/>
SELECT ?fname ?lname ?like ?dislike
FROM <http://anzograph.com/tickit>
WHERE {
  ?sale tickit:buyerid ?person .
  ?person tickit:firstname ?fname .
  ?person tickit:lastname ?lname .
  ?person tickit:like ?like .
  ?person tickit:dislike ?dislike .
}
GROUP BY ?fname ?lname ?like ?dislike

The patterns in the WHERE clause ask for person data that includes firstname, lastname, like, and dislike triples. Any person who is missing a match for any one of these patterns is excluded from the results. This query returns 188536 rows.

Using OPTIONAL clauses in the query changes the criteria so that all of the first and last names are returned and like or dislike data is returned if it exists. This query makes like and dislike optional:

PREFIX tickit: <http://anzograph.com/tickit/>
SELECT ?fname ?lname ?like ?dislike
FROM <http://anzograph.com/tickit>
WHERE {
  ?sale tickit:buyerid ?person .
  ?person tickit:firstname ?fname .
  ?person tickit:lastname ?lname .
  OPTIONAL { ?person tickit:like ?like }
  OPTIONAL { ?person tickit:dislike ?dislike }
}
GROUP BY ?fname ?lname ?like ?dislike

This query returns 202862 rows because it includes every person who has a first and last name and no longer excludes people who are missing like or dislike predicates.
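
To see how much data the non-optional patterns would exclude, it can help to count the buyers who have no like triple at all. The following is a sketch only, using the same tickit predicates as the queries above and the standard SPARQL 1.1 FILTER NOT EXISTS construct:

```sparql
PREFIX tickit: <http://anzograph.com/tickit/>

# Count the distinct buyers who have no tickit:like triple.
# These are the people that a required like pattern would drop.
SELECT (COUNT(DISTINCT ?person) AS ?buyersWithoutLikes)
FROM <http://anzograph.com/tickit>
WHERE {
  ?sale tickit:buyerid ?person .
  FILTER NOT EXISTS { ?person tickit:like ?like }
}
```

A nonzero count suggests that the like pattern should be wrapped in OPTIONAL if those buyers are expected in the results.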

Avoid Unexpected Results When Constructing Data

CONSTRUCT queries return a single RDF graph specified by the template that you supply. For each query solution, the variables in the template are replaced with the values from that solution, and the resulting triples are combined into one graph. If the template produces a triple that contains an unbound variable or an illegal RDF construct, such as a literal value in the subject or predicate position, that triple is excluded from the output graph without an error, which can lead to unexpected results.
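
As an illustration of the unbound-variable case, the following sketch builds a graph of first names and likes from the tickit data. It assumes the same predicates as the earlier examples. Because ?like comes from an OPTIONAL pattern, it is unbound for any person who has no like triple:

```sparql
PREFIX tickit: <http://anzograph.com/tickit/>

CONSTRUCT {
  ?person tickit:firstname ?fname .
  # When ?like is unbound, this template triple is left out of the
  # output graph for that solution. No error is reported.
  ?person tickit:like ?like .
}
FROM <http://anzograph.com/tickit>
WHERE {
  ?person tickit:firstname ?fname .
  OPTIONAL { ?person tickit:like ?like }
}
```

For a person without likes, the output graph still contains the firstname triple but no like triple, so a smaller-than-expected result graph may indicate unbound variables rather than a faulty template.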
