Understanding Your Data as a Graph

This topic covers common questions to ask when you explore your data as a graph.

How do I find the most connected people?

Graphs are often used to represent social behavior and social relationships. For example, the following triple states that person2 has person12595 as a friend:

<person2> <friend> <person12595>

A common task in social network analytics is to find out how connected people are in the social graph and who is the most connected. Both questions are answered by computing each person's network degree, which here is the number of friend relationships the person has. People with a high network degree are the hubs of the social network. The following query counts the friend relationships of each person in the sample Tickit data set and lists the ten most connected people in the tickit graph:

SELECT ?person (COUNT(?friend) AS ?friendDegree)
FROM <tickit>
WHERE {
  ?person <friend> ?friend
}
GROUP BY ?person
ORDER BY DESC(?friendDegree)
LIMIT 10

person     | friendDegree
-----------+--------------
person4165 |          166
person2943 |          160
person178  |          153
person202  |          153
person1304 |          149
person2054 |          147
person3083 |          132
person6811 |          125
person4166 |          125
person1435 |          121
10 rows
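
The grouping and ranking that the query performs can also be sketched outside SPARQL. The following Python example is an illustration only: the triples are a small, hypothetical data set (not taken from Tickit), and friend_degree is a helper name invented for this sketch. It counts the friend triples per subject and returns the most connected people first, which corresponds to the GROUP BY, ORDER BY DESC, and LIMIT clauses above.

```python
from collections import Counter

# Hypothetical toy triples (subject, predicate, object); not Tickit data.
triples = [
    ("alice", "friend", "bob"),
    ("alice", "friend", "carol"),
    ("alice", "friend", "dave"),
    ("bob", "friend", "alice"),
    ("bob", "friend", "carol"),
    ("carol", "friend", "alice"),
    ("dave", "likes", "alice"),  # ignored: the predicate is not "friend"
]


def friend_degree(triples, limit=10):
    """Count friend triples per subject and return the highest counts first."""
    counts = Counter(s for s, p, o in triples if p == "friend")
    return counts.most_common(limit)


top = friend_degree(triples)
print(top)  # [('alice', 3), ('bob', 2), ('carol', 1)]
```

As in the query, only outgoing friend relationships are counted, so a person who appears only as the object of friend triples does not show up in the ranking.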

What is the size of a person's network?

The size of a person's network is usually computed out to two generations: the people a person knows and the people whom those people know. The following query computes the size of person2's network in the sample Tickit data set. Because person2 is returned as a friend of a friend whenever one of person2's friends lists person2 as a friend, the COUNT expression subtracts 1 to remove person2 from the count:

SELECT (COUNT(?friend) - 1 AS ?networkSize)
FROM <tickit>
WHERE {
  { SELECT DISTINCT ?friend
    WHERE {
      { <person2> <friend> ?friend . }
      UNION
      { <person2> <friend> ?friend1 .
        ?friend1 <friend> ?friend . }
    }
  }
}

networkSize
-------------
907
1 rows
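
The two-generation logic can be checked by hand on a small graph. The following Python sketch assumes a hypothetical set of friendships (the names, the both_ways helper, and the network_size function are all invented for this illustration). It collects the direct friends, then the friends of those friends, and removes the starting person. Unlike the query, which always subtracts 1, this sketch removes the starting person only if that person actually appears in the result.

```python
def both_ways(pairs):
    """Record each friendship in both directions, as (subject, object) edges."""
    return [(a, b) for a, b in pairs] + [(b, a) for a, b in pairs]


# Hypothetical toy friendships; not Tickit data.
edges = both_ways([
    ("alice", "bob"),
    ("alice", "carol"),
    ("bob", "dave"),
    ("carol", "erin"),
    ("erin", "frank"),  # frank is three hops from alice
])


def network_size(edges, person):
    """Count distinct people within one or two friend hops of person."""
    direct = {o for s, o in edges if s == person}
    second = {o for s, o in edges if s in direct}
    return len((direct | second) - {person})


size = network_size(edges, "alice")
print(size)  # 4: bob, carol, dave, erin
```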

What is the density of the social network?

Network density measures how well connected a network is. A network density of 1 indicates a clique: everyone is connected to everyone else. Compute network density as the ratio of the number of edges to the number of possible edges in the graph. You can use density to compare different social networks or different regions within a social network.

The following example uses the friend relationships in the sample Tickit data set to determine the social graph density of the tickit graph. The number of edges is the number of triples that contain friend as a predicate. The number of possible friend relationships is n(n-1), where n is the number of people in Tickit, because each of the n people can list each of the other n-1 people as a friend. The query counts as people the distinct subjects of friend triples.

SELECT (?nrEdges / (?nrNodes * (?nrNodes - 1.0)) AS ?graphDensity)
FROM <tickit>
WHERE {
  { SELECT (COUNT(*) AS ?nrEdges) (COUNT(DISTINCT ?person) AS ?nrNodes)
    WHERE { ?person <friend> ?anotherPerson . }
  }
}

graphDensity
--------------
0.000585
1 rows
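
To make the ratio concrete, the following Python sketch applies the same formula, edges divided by n(n-1), to two hypothetical toy graphs. The data, the both_ways helper, and the graph_density function are assumptions made for this illustration. As in the query, the people are counted as the distinct subjects of the friend edges.

```python
def both_ways(pairs):
    """Record each friendship in both directions, as (subject, object) edges."""
    return [(a, b) for a, b in pairs] + [(b, a) for a, b in pairs]


def graph_density(edges):
    """Return edges / (n * (n - 1)), where n is the number of distinct subjects."""
    nr_edges = len(edges)
    nr_nodes = len({s for s, _ in edges})
    if nr_nodes < 2:
        raise ValueError("density needs at least two people")
    return nr_edges / (nr_nodes * (nr_nodes - 1))


# Hypothetical clique: all three people are friends with each other.
clique = both_ways([("alice", "bob"), ("alice", "carol"), ("bob", "carol")])

# Hypothetical sparse graph: two separate pairs of friends.
sparse = both_ways([("alice", "bob"), ("carol", "dave")])

print(graph_density(clique))            # 1.0
print(round(graph_density(sparse), 3))  # 0.333
```

The clique reaches the maximum density of 1, while the sparse graph uses only 4 of its 12 possible friend relationships.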

Who has the most friends who know each other?

To find the most important people in a network, you can analyze how well connected each person is. People are well connected when their friends know each other. This is called a clique-to-triad. In the following example, a triad is a person together with two of that person's friends, where the first friend lists the second as a friend. The example identifies and counts triads to find the people in the sample Tickit data set who have the most friends who know each other. The FILTER clause orders the two friends so that each pair is counted only once. This query uses the friend relationship to rank the top ten people by the number of triads:

SELECT ?person (COUNT(*) AS ?triads)
FROM <tickit>
WHERE {
  { SELECT DISTINCT ?person ?anotherPerson ?yetAnotherPerson
    WHERE { ?person <friend> ?anotherPerson .
            ?person <friend> ?yetAnotherPerson .
            ?anotherPerson <friend> ?yetAnotherPerson .
            FILTER (?yetAnotherPerson > ?anotherPerson) }
  }
}
GROUP BY ?person
ORDER BY DESC(?triads)
LIMIT 10

person      | triads
------------+--------
person16955 |    130
person16956 |     41
person8876  |     17
person4165  |      9
person539   |      9
person1304  |      9
person6811  |      8
person5763  |      8
person2350  |      7
person10147 |      7
10 rows
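
The triad pattern can be traced on a small graph. The following Python sketch uses hypothetical friendships (the names, the both_ways helper, and the count_triads function are invented for this illustration). For each person, it looks at every pair of that person's friends, keeps the pair only once by requiring the second name to sort after the first (the role of the FILTER clause), and counts the pair when the first friend lists the second as a friend.

```python
from collections import Counter, defaultdict


def both_ways(pairs):
    """Record each friendship in both directions, as (subject, object) edges."""
    return [(a, b) for a, b in pairs] + [(b, a) for a, b in pairs]


def count_triads(edges):
    """Count, per person, the pairs of that person's friends who are connected."""
    friends = defaultdict(set)
    for s, o in edges:
        friends[s].add(o)

    triads = Counter()
    for person, own_friends in friends.items():
        for a in own_friends:
            for b in own_friends:
                # b > a keeps each pair once; .get avoids adding keys while iterating.
                if b > a and b in friends.get(a, ()):
                    triads[person] += 1
    return triads


# Hypothetical toy friendships; not Tickit data.
edges = both_ways([
    ("alice", "bob"),
    ("alice", "carol"),
    ("bob", "carol"),
    ("alice", "dave"),
    ("carol", "dave"),
])

triads = count_triads(edges)
print(sorted(triads.items()))
# [('alice', 2), ('bob', 1), ('carol', 2), ('dave', 1)]
```

Here alice and carol each take part in two triads because each has two pairs of friends who know each other.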