Exploring Your Data
This topic answers common questions that come up when you are getting to know your data.
- How do I find out which predicates (keys) a data set uses?
- How do I determine the frequency of a predicate's use?
- How do I find symmetric predicates?
How do I find out which predicates (keys) a data set uses?
When you receive a new data set, one of the first things to find out is which predicates it uses. The following query lists the predicates used in the sample Tickit data set.
SELECT DISTINCT ?predicates
FROM <tickit>
WHERE {
  ?subject ?predicates ?object .
}
ORDER BY ?predicates
predicates
-------------------------------------------------
birthday
buyerid
caldate
card
catdesc
catgroup
catid
catname
city
commission
dateid
day
dislike
email
eventid
eventname
firstname
friend
holiday
http://www.w3.org/1999/02/22-rdf-syntax-ns#type
lastname
like
listid
listtime
month
numtickets
phone
pricepaid
priceperticket
qtr
qtysold
saletime
sellerid
ssn
starttime
state
totalprice
venuecity
venuecitypop
venueid
venuename
venueseats
venueseatspct
venuestate
week
year
46 rows
How do I determine the frequency of a predicate's use?
Once you know which predicates a new data set uses, you might want to see how frequently each one is used. Predicates with identical or similar counts are often used together on the same subjects, which helps to identify the kinds of objects in the graph. The following query lists the predicates in the sample Tickit data set, ordered from most to least frequent.
SELECT ?predicate (COUNT(?predicate) AS ?count)
FROM <tickit>
WHERE {
  ?s ?predicate ?o .
}
GROUP BY ?predicate
ORDER BY DESC(?count)
predicate                                       | count
------------------------------------------------+---------
friend                                          | 1462476
http://www.w3.org/1999/02/22-rdf-syntax-ns#type | 424319
dateid                                          | 373751
eventid                                         | 364953
sellerid                                        | 364953
priceperticket                                  | 192497
totalprice                                      | 192497
numtickets                                      | 192497
listtime                                        | 192497
commission                                      | 172456
qtysold                                         | 172456
saletime                                        | 172456
listid                                          | 172456
pricepaid                                       | 172456
buyerid                                         | 172456
dislike                                         | 121038
like                                            | 120911
birthday                                        | 49990
email                                           | 49990
lastname                                        | 49990
phone                                           | 49990
firstname                                       | 49990
state                                           | 49990
card                                            | 49990
ssn                                             | 49990
city                                            | 49990
starttime                                       | 8798
catid                                           | 8798
venueid                                         | 8798
eventname                                       | 8798
day                                             | 365
qtr                                             | 365
caldate                                         | 365
month                                           | 365
week                                            | 365
year                                            | 365
holiday                                         | 365
venuename                                       | 202
venuecitypop                                    | 202
venuecity                                       | 202
venuestate                                      | 202
venueseatspct                                   | 58
venueseats                                      | 58
catdesc                                         | 11
catname                                         | 11
catgroup                                        | 11
46 rows
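To make the logic of the two queries so far concrete, the following is a minimal Python sketch that applies the same steps to a small in-memory list of triples. The triples and the variable names are hypothetical and are not taken from the Tickit data set; the sketch only illustrates the distinct-and-count logic, not how a graph database evaluates the query.

```python
from collections import Counter

# Hypothetical (subject, predicate, object) triples, for illustration only.
triples = [
    ("person1", "friend", "person2"),
    ("person2", "friend", "person1"),
    ("person1", "firstname", "Ted"),
    ("person2", "firstname", "Bob"),
    ("person1", "city", "Austin"),
]

# Equivalent of: SELECT DISTINCT ?predicates ... ORDER BY ?predicates
distinct_predicates = sorted({p for _, p, _ in triples})

# Equivalent of: SELECT ?predicate (COUNT(?predicate) AS ?count) ...
#                GROUP BY ?predicate ORDER BY DESC(?count)
predicate_counts = Counter(p for _, p, _ in triples).most_common()

print(distinct_predicates)
print(predicate_counts)
```

In this sample, friend and firstname each occur twice and city once. As in the query, the relative order of predicates with equal counts is not meaningful.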
How do I find symmetric predicates?
Another part of analyzing a new data set is understanding how the predicates are used. Some predicates are used symmetrically: for example, Ted knows Bob and Bob knows Ted. The following query finds the predicates in the sample Tickit data set that are used symmetrically and returns the number of times each one is used that way:
SELECT ?symmetricPredicate (COUNT(?symmetricPredicate) AS ?count)
FROM <tickit>
WHERE {
  ?s ?symmetricPredicate ?o .
  ?o ?symmetricPredicate ?s .
}
GROUP BY ?symmetricPredicate
symmetricPredicate | count
-------------------+---------
friend             | 1293901
1 rows
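When reading the count, it can help to see what the pattern in the symmetry query matches. The following Python sketch applies the same two-pattern match to a handful of hypothetical triples (the names are illustrative and not from the Tickit data set). It assumes standard SPARQL matching, where every triple whose reverse is also present produces one result row.

```python
from collections import Counter

# Hypothetical (subject, predicate, object) triples, for illustration only.
triples = [
    ("ted", "friend", "bob"),
    ("bob", "friend", "ted"),   # reverse of the triple above
    ("ted", "friend", "ann"),   # no reverse triple, so not counted
    ("ann", "like", "ann"),     # self-reference: it is its own reverse
    ("ted", "like", "jazz"),
]

triple_set = set(triples)

# Equivalent of: ?s ?p ?o . ?o ?p ?s . GROUP BY ?p
# One match per triple whose reverse (o, p, s) also exists.
symmetric_counts = Counter(
    p for s, p, o in triples if (o, p, s) in triple_set
)

print(dict(symmetric_counts))
```

In this sample the reciprocal pair between ted and bob contributes two matches to friend, one for each direction, and the self-referencing triple contributes one match to like. The count therefore reflects matching triples rather than distinct pairs.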