Exploring Your Data

This topic answers common questions that arise when you are getting to know your data.

How do I find out which predicates (keys) a data set uses?

When you receive a new data set, one of the first things to understand is which predicates it uses. The following query lists the predicates used in the sample Tickit data set.

SELECT DISTINCT ?predicates
FROM <tickit>
WHERE {
  ?subject ?predicates ?object .
}
ORDER BY ?predicates

predicates
---------------
birthday
buyerid
caldate
card
catdesc
catgroup
catid
catname
city
commission
dateid
day
dislike
email
eventid
eventname
firstname
friend
holiday
http://www.w3.org/1999/02/22-rdf-syntax-ns#type
lastname
like
listid
listtime
month
numtickets
phone
pricepaid
priceperticket
qtr
qtysold
saletime
sellerid
ssn
starttime
state
totalprice
venuecity
venuecitypop
venueid
venuename
venueseats
venueseatspct
venuestate
week
year
46 rows
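
To make the logic of the query concrete, the following Python sketch applies the same idea to a small in-memory list of triples. It is only an illustration: the triples are made-up sample values rather than actual Tickit data, and the function name distinct_predicates is hypothetical.

```python
# Hypothetical toy triples (subject, predicate, object), not actual Tickit data.
triples = [
    ("person1", "firstname", "Ted"),
    ("person1", "friend", "person2"),
    ("person2", "firstname", "Bob"),
    ("person2", "friend", "person1"),
    ("event1", "eventname", "Concert"),
]

def distinct_predicates(triples):
    """Return each predicate once, sorted, like SELECT DISTINCT with ORDER BY."""
    return sorted({predicate for _, predicate, _ in triples})

for predicate in distinct_predicates(triples):
    print(predicate)
```

The set comprehension plays the role of DISTINCT, and sorted() plays the role of ORDER BY.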

How do I determine the frequency of a predicate's use?

Once you know which predicates a new data set uses, you might want to see how frequently each one is used. Frequency counts can indicate which predicates the data set uses together, which helps to identify objects in the graph. The following query lists the predicates in the sample Tickit data set, ordered by how frequently they appear.

SELECT ?predicate (COUNT (?predicate) AS ?count)
FROM <tickit>
WHERE {
  ?s ?predicate ?o .
}
GROUP BY ?predicate
ORDER BY DESC(?count)

predicate                                       | count
------------------------------------------------+---------
friend                                          | 1462476
http://www.w3.org/1999/02/22-rdf-syntax-ns#type |  424319
dateid                                          |  373751
eventid                                         |  364953
sellerid                                        |  364953
priceperticket                                  |  192497
totalprice                                      |  192497
numtickets                                      |  192497
listtime                                        |  192497
commission                                      |  172456
qtysold                                         |  172456
saletime                                        |  172456
listid                                          |  172456
pricepaid                                       |  172456
buyerid                                         |  172456
dislike                                         |  121038
like                                            |  120911
birthday                                        |   49990
email                                           |   49990
lastname                                        |   49990
phone                                           |   49990
firstname                                       |   49990
state                                           |   49990
card                                            |   49990
ssn                                             |   49990
city                                            |   49990
starttime                                       |    8798
catid                                           |    8798
venueid                                         |    8798
eventname                                       |    8798
day                                             |     365
qtr                                             |     365
caldate                                         |     365
month                                           |     365
week                                            |     365
year                                            |     365
holiday                                         |     365
venuename                                       |     202
venuecitypop                                    |     202
venuecity                                       |     202
venuestate                                      |     202
venueseatspct                                   |      58
venueseats                                      |      58
catdesc                                         |      11
catname                                         |      11
catgroup                                        |      11
46 rows
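
As a rough illustration of how frequency counts hint at predicates that are used together, the following Python sketch counts predicates in a toy triple list and then groups the predicates that share the same count. The triples and the function names predicate_counts and group_by_count are assumptions made for this example, not part of the Tickit data set or of any product API.

```python
from collections import Counter, defaultdict

# Hypothetical toy triples (subject, predicate, object), not actual Tickit data.
triples = [
    ("user1", "firstname", "Ted"),
    ("user1", "lastname", "Smith"),
    ("user2", "firstname", "Bob"),
    ("user2", "lastname", "Jones"),
    ("user1", "friend", "user2"),
    ("user2", "friend", "user1"),
    ("user1", "friend", "user3"),
]

def predicate_counts(triples):
    """Count triples per predicate, like GROUP BY ?predicate with COUNT."""
    return Counter(predicate for _, predicate, _ in triples)

def group_by_count(counts):
    """Map each count to the sorted list of predicates that share it."""
    groups = defaultdict(list)
    for predicate, count in counts.items():
        groups[count].append(predicate)
    return {count: sorted(names) for count, names in groups.items()}

counts = predicate_counts(triples)
# most_common() lists the highest counts first, like ORDER BY DESC(?count).
for predicate, count in counts.most_common():
    print(f"{predicate:<10} | {count}")
print(group_by_count(counts))
```

Predicates that land in the same group are candidates for describing the same kind of object. In the Tickit results above, for example, catdesc, catname, and catgroup each appear 11 times.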

How do I find symmetric predicates?

Another part of analyzing a new data set is understanding how the predicates are used. Predicates can be used symmetrically; for example, Ted knows Bob and Bob knows Ted. The following query finds the predicates in the sample Tickit data set that are used symmetrically and returns the number of times each one is used that way:

SELECT ?symmetricPredicate (COUNT (?symmetricPredicate) AS ?count)
FROM <tickit>
WHERE {
  ?s ?symmetricPredicate ?o .
  ?o ?symmetricPredicate ?s .
}
GROUP BY ?symmetricPredicate

symmetricPredicate | count
-------------------+---------
friend             | 1293901
1 rows
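
The two triple patterns in the query match a triple only when its reverse also exists. The following Python sketch mirrors that check on a toy triple list; the sample triples and the function name symmetric_counts are hypothetical. Note that each reciprocated pair is matched once per direction, so it contributes 2 to the count.

```python
from collections import Counter

# Hypothetical toy triples (subject, predicate, object), not actual Tickit data.
triples = [
    ("Ted", "knows", "Bob"),
    ("Bob", "knows", "Ted"),
    ("Ted", "knows", "Ann"),  # not reciprocated, so it is not counted
    ("Ann", "likes", "Bob"),  # not reciprocated, so it is not counted
]

def symmetric_counts(triples):
    """Count, per predicate, the triples whose reverse triple also exists."""
    triple_set = set(triples)
    return Counter(
        predicate
        for subject, predicate, obj in triple_set
        if (obj, predicate, subject) in triple_set
    )

for predicate, count in symmetric_counts(triples).items():
    print(f"{predicate} | {count}")
```

Storing the triples in a set keeps the reverse lookup cheap, which matters once the list grows beyond a few entries.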