Understanding Your Data as a Graph
This topic walks through common questions to ask when you start exploring your data as a graph:
- How do I find the most connected people?
- What is the size of a person's network?
- What is the density of the social network?
- Who has the most friends who know each other?
How do I find the most connected people?
Graphs are often used to represent social behavior and social relationships. For example, the following triple represents that person12595 is a friend of person2:
<person2> <friend> <person12595>
A common task in social network analytics is to find out how connected people are in the social graph and who is the most connected. These questions are answered by computing each person's network degree, which is the number of relationships that person has. People who have a high network degree are the hubs of the social network. The following query counts the number of friend relationships each person has in the sample Tickit data set and lists the ten most connected people in the tickit graph:
SELECT ?person (COUNT(?friend) AS ?friendDegree)
FROM <tickit>
WHERE {
  ?person <friend> ?friend
}
GROUP BY ?person
ORDER BY DESC(?friendDegree)
LIMIT 10
person     | friendDegree
-----------+--------------
person4165 |          166
person2943 |          160
person178  |          153
person202  |          153
person1304 |          149
person2054 |          147
person3083 |          132
person6811 |          125
person4166 |          125
person1435 |          121
10 rows
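To make the grouping and ranking concrete, the following Python sketch performs the same degree count on a small in-memory list of triples. The triples, the friend_degree function name, and the limit parameter are hypothetical and chosen for illustration only; none of them come from the Tickit data set.

```python
from collections import Counter

# Hypothetical toy triples (subject, predicate, object); not Tickit data.
triples = [
    ("person1", "friend", "person2"),
    ("person1", "friend", "person3"),
    ("person1", "friend", "person4"),
    ("person2", "friend", "person3"),
    ("person2", "friend", "person1"),
    ("person3", "friend", "person1"),
]

def friend_degree(triples, limit=10):
    """Count friend triples per subject and return the highest counts first.

    Mirrors GROUP BY ?person, ORDER BY DESC(?friendDegree), and LIMIT.
    """
    counts = Counter(s for s, p, o in triples if p == "friend")
    return counts.most_common(limit)

top = friend_degree(triples)
for person, degree in top:
    print(person, degree)
```

As in the query, only outgoing friend triples are counted, so a person who appears only as the object of a triple has no entry in the result.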
What is the size of a person's network?
The size of a person's network is usually computed out to two generations: the people a person knows and the people those people know. The following query computes the size of person2's network in the sample Tickit data set. The COUNT expression subtracts 1 to remove person2 from the count (person2 is included in the result whenever one of their friends lists person2 as a friend in return):
SELECT (COUNT(?friend)-1 AS ?networkSize)
FROM <tickit>
WHERE {
  {
    SELECT DISTINCT ?friend
    WHERE {
      { <person2> <friend> ?friend . }
      UNION
      { <person2> <friend> ?friend1 . ?friend1 <friend> ?friend . }
    }
  }
}
networkSize
-------------
        907
1 rows
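The two-generation expansion can be sketched in Python on a toy adjacency map. The friends and one_way dictionaries and the network_size function are hypothetical. One deliberate difference from the query: instead of subtracting 1, the sketch removes the starting person explicitly, so the result stays correct even when no friend links back to that person.

```python
def network_size(friends, start):
    """Count distinct people within one or two friend hops of start."""
    first = friends.get(start, set())
    second = set()
    for friend in first:
        # Friends of each direct friend (second generation).
        second |= friends.get(friend, set())
    reached = first | second
    # Assumption: drop the starting person explicitly rather than subtracting 1.
    reached.discard(start)
    return len(reached)

# Hypothetical toy data; not Tickit data.
friends = {
    "person2": {"person5", "person7"},
    "person5": {"person2", "person9"},
    "person7": {"person9", "person11"},
}
one_way = {"a": {"b"}, "b": {"c"}}

print(network_size(friends, "person2"))
print(network_size(one_way, "a"))
```

In the one_way data, nobody lists "a" as a friend, so a fixed subtraction of 1 would undercount the network by one person.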
What is the density of the social network?
Network density measures how well connected a network is. A network density of 1 indicates a clique: everyone is connected to everyone else. Compute network density as the ratio of the number of edges in a graph to the number of possible edges. You can use density to compare different social networks or different regions within a social network.
The following example uses the friend relationships in the sample Tickit data set to determine the social graph density of the tickit graph. The number of edges is the number of triples that contain friend as a predicate. The number of possible friend relationships is n(n-1), the number of ordered pairs of distinct people, where n is the number of people in Tickit. The query counts n as the number of distinct people who appear as the subject of a friend triple.
SELECT (?nrEdges / (?nrNodes * (?nrNodes - 1.0)) AS ?graphDensity)
FROM <tickit>
WHERE {
  {
    SELECT (COUNT(*) AS ?nrEdges) (COUNT(DISTINCT ?person) AS ?nrNodes)
    WHERE {
      ?person <friend> ?anotherPerson .
    }
  }
}
graphDensity
--------------
    0.000585
1 rows
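The density arithmetic can be checked by hand on a small graph. The following Python sketch applies the same ratio, edges / (n(n-1)), to a hypothetical set of directed friend edges. Like the query, it counts n as the number of distinct edge subjects. The edge data, the graph_density function, and the guard for graphs with fewer than two people are illustrative assumptions.

```python
def graph_density(edges):
    """Return edges / (n * (n - 1)), where n is the number of distinct subjects."""
    nr_edges = len(edges)
    nr_nodes = len({subject for subject, _ in edges})
    if nr_nodes < 2:
        # Assumption: treat a graph with fewer than two people as density 0.
        return 0.0
    return nr_edges / (nr_nodes * (nr_nodes - 1.0))

# Hypothetical toy edges (person, friend); not Tickit data.
edges = {
    ("a", "b"), ("b", "a"),
    ("a", "c"), ("c", "a"),
    ("b", "c"), ("d", "a"),
}

# A clique on the same four people: every ordered pair is an edge.
people = ["a", "b", "c", "d"]
clique = {(s, o) for s in people for o in people if s != o}

print(graph_density(edges))   # 6 edges / (4 * 3) possible
print(graph_density(clique))  # 12 edges / (4 * 3) possible
```

With four people there are 12 possible directed friend relationships, so 6 edges give a density of 0.5 and the full clique gives 1.0.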
Who has the most friends who know each other?
To find the most important people in a network, you can analyze how well connected each person is. People are well connected when their friends know each other. This is called a clique-to-triad. Here, a triad is a person together with two of that person's friends who are also friends with each other. The following example identifies and counts triads to find the people in the sample Tickit data set who have the most friends who know each other. This query uses the friend relationship to rank the top ten people by the number of triads. The FILTER keeps only one ordering of each pair of friends so that the same pair is not counted twice:
SELECT ?person (COUNT(*) AS ?triads)
FROM <tickit>
WHERE {
  {
    SELECT DISTINCT ?person ?anotherPerson ?yetAnotherPerson
    WHERE {
      ?person <friend> ?anotherPerson .
      ?person <friend> ?yetAnotherPerson .
      ?anotherPerson <friend> ?yetAnotherPerson .
      FILTER (?yetAnotherPerson > ?anotherPerson)
    }
  }
}
GROUP BY ?person
ORDER BY DESC(?triads)
LIMIT 10
person      | triads
------------+--------
person16955 |    130
person16956 |     41
person8876  |     17
person4165  |      9
person539   |      9
person1304  |      9
person6811  |      8
person5763  |      8
person2350  |      7
person10147 |      7
10 rows
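The triad pattern in the query can be traced on a toy graph with the following Python sketch. The friends dictionary and the triad_counts function are hypothetical. The sketch mirrors the query as written: it counts a pair of friends only when the friend edge runs from the smaller identifier to the larger one, and it compares identifiers as plain strings.

```python
from collections import Counter

def triad_counts(friends):
    """For each person, count pairs of their friends (a, b) where a lists b as a friend."""
    counts = Counter()
    for person, direct in friends.items():
        for a in direct:
            for b in direct:
                # b > a mirrors the FILTER: each pair is considered in one order only.
                if b > a and b in friends.get(a, set()):
                    counts[person] += 1
    return counts

# Hypothetical toy data; not Tickit data.
friends = {
    "p1": {"p2", "p3", "p4"},
    "p2": {"p3"},
    "p3": {"p4"},
    "p4": set(),
}

ranking = triad_counts(friends).most_common(10)
print(ranking)
```

Here p1 has two triads (p2 with p3, and p3 with p4). The pair p2 and p4 is not counted because p2 does not list p4 as a friend. People with no triads do not appear in the ranking, which matches the behavior of the grouped query.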