Centrality Algorithms

Centrality algorithms identify important nodes in a graph:

  • PageRank: Ranks the nodes in a graph by their relative importance or influence. Google uses PageRank to rank websites in their search engine results.
  • Betweenness Centrality: Detects the amount of influence a vertex has over the flow of information in a graph.

PageRank

The PageRank algorithm ranks the nodes in a graph by their relative importance or influence. PageRank determines each node's ranking by identifying the number of links to the node, the outbound links from the node, and the quality of the links. The quality of a link is determined by the importance (PageRank) of the connected nodes. For labeled property graphs, the PageRank algorithm accepts an edge property that can be considered a relationship weight to factor into the PageRank calculation.

Syntax

The PageRank algorithm is available in the graphalgo extension library (http://cambridgesemantics.com/anzograph/graphalgo#page_rank) and is implemented as a procedure. To incorporate the PageRank algorithm in a query, use the following syntax in the FROM clause. The arguments that are links are described below.

SELECT triple_patterns_and_expressions
# The FROM clause lists the URI for the page_rank alogrithm and
# includes in parentheses the input parameters for the algorithm.
FROM <http://cambridgesemantics.com/anzograph/graphalgo#page_rank>
(
  <graph_URI>,
  <edge_URI>, 
  [ <weighted_property>, ]
  [ damping_factor, ]
  [ max_iterations, ]
  [ error_tolerance, ]
  [ normalized ]
)
WHERE  {
 ...
}
Argument Range Description
graph_URI URI The URI of the target graph.
edge_URI URI The URI of the edge that connects the nodes to rank.
weighted_property URI Optional argument that defines the URI of the relationship property on the edge_URI that contains values to be factored in as a weight when calculating the importance of the nodes. When this property is omitted, page_rank is unweighted.
damping_factor 0.0 - 1.0 Optional argument that represents the edge traversal probability. The damping factor is an estimate of the probability that a user will stay on the page rather than follow the link (edge). The damping_factor value is subtracted from 1.0 in the calculation. The default value is 0.85.
max_iterations 1 - 100 Optional argument that specifies the maximum number of times to iterate through the graph to adjust approximate PageRank values. The default value is 40.
error_tolerance 0.0 - 0.1 Optional argument that specifies the error tolerance to use. If the sum of the error values for all nodes is below this tolerance value, PageRank iterations are stopped. The default value is 1e-8.
normalized boolean Optional argument that controls whether to produce PageRank values that are between 0 and 1. The default value is false.

Examples

The example below uses the unweighted PageRank algorithm to find the 10 most connected airports. The edge to operate on is defined as the hasRouteTo URI, which links the airport nodes.

SELECT ?airport ?rank
FROM <http://cambridgesemantics.com/anzograph/graphalgo#page_rank>(
  <http://anzograph.com/airline_flight_network>,
  <http://anzograph.com/flights/hasRouteTo>
)
WHERE
{
  ?airport ?p ?rank .
}
ORDER BY desc(?rank)
LIMIT 10

The results show that ORD (Chicago O'Hare) has the highest PageRank. It has the highest number of links to other airports.

airport                                  | rank
-----------------------------------------+---------
http://anzograph.com/flights/Airport/ORD | 13.4122
http://anzograph.com/flights/Airport/DFW | 13.1632
http://anzograph.com/flights/Airport/ATL | 12.8073
http://anzograph.com/flights/Airport/DEN | 10.3813
http://anzograph.com/flights/Airport/IAH | 8.38149
http://anzograph.com/flights/Airport/SLC | 6.90182
http://anzograph.com/flights/Airport/MSP | 6.24396
http://anzograph.com/flights/Airport/SFO | 5.50728
http://anzograph.com/flights/Airport/PHX | 5.27483
http://anzograph.com/flights/Airport/LAX | 5.24546
10 rows

By including the edge property, <http://anzograph.com/flights/distanceMiles>, the example below uses the weighted PageRank algorithm to find the 10 most connected airports where distance between the airports is factored into the calculation.

SELECT ?airport ?rank
FROM <http://cambridgesemantics.com/anzograph/graphalgo#page_rank>(
  <http://anzograph.com/airline_flight_network>,
  <http://anzograph.com/flights/hasRouteTo>,
  <http://anzograph.com/flights/distanceMiles>
)
WHERE
{
  ?airport ?p ?rank .
}
ORDER BY desc(?rank)
LIMIT 10
airport                                  | rank
-----------------------------------------+--------
http://anzograph.com/flights/Airport/ORD |  2.445
http://anzograph.com/flights/Airport/ATL |  1.935
http://anzograph.com/flights/Airport/DFW |   1.68
http://anzograph.com/flights/Airport/SLC |   1.17
http://anzograph.com/flights/Airport/MSP |  0.915
http://anzograph.com/flights/Airport/DTW |   0.66
http://anzograph.com/flights/Airport/SFO |   0.66
http://anzograph.com/flights/Airport/FLL | 0.5325
http://anzograph.com/flights/Airport/ANC | 0.5325
http://anzograph.com/flights/Airport/DEN | 0.5325
10 rows

Betweenness Centrality

Betweenness centrality is a measure of the amount of influence a vertex has over the flow of information in a graph. The amount of influence is determined by the number of shortest paths that pass through a vertex. The Betweenness Centrality algorithm computes the shortest path between each pair of vertices in a graph and assigns a score to each vertex based on the number of shortest paths that intersect it. Vertices that frequently lie on the shortest paths between other nodes have higher betweenness centrality scores.

Syntax

To incorporate the Betweenness Centrality algorithm in a query, include the following SERVICE call in the WHERE clause.

SERVICE <csi:betweenness> 
  {
    [] <csi:binding-vertex> ?binding_vertex_variable ;
       <csi:binding-centrality> ?centrality_variable ;
       <csi:edge-label> <edge_uri> .
}
Argument Range Description
binding-vertex variable Required argument that defines the name to use for the result column that lists the source nodes (vertices).
binding-centrality variable Required argument that defines the name to use for the result column that lists the computed centrality score.
edge-label URI Required argument that lists the edge URI that defines the graph to operate on. The graph is the set of vertices that are connected by this URI.

Example

The example below uses the Betweenness Centrality algorithm to find the 10 airports with the highest centrality. The edge to operate on is defined as the hasRouteTo URI, which links the airport vertices.

prefix : <http://anzograph.com/data#>
prefix fl: <http://anzograph.com/flights/>
prefix owl: <http://www.w3.org/2002/07/owl#>
prefix skos: <http://www.w3.org/2004/02/skos/core#>

SELECT *
FROM <http://anzograph.com/airline_flight_network>
WHERE
  { 
    SERVICE <csi:betweenness> 
    {
        [] <csi:binding-vertex> ?airport ;
           <csi:binding-centrality> ?centrality ;
           <csi:edge-label> <http://anzograph.com/flights/hasRouteTo> .
    }
}
ORDER BY desc(?centrality)
LIMIT 10
airport                                  | centrality
-----------------------------------------+--------------
http://anzograph.com/flights/Airport/ORD | 18985.371039
http://anzograph.com/flights/Airport/DFW | 17718.273941
http://anzograph.com/flights/Airport/ATL | 16286.345255
http://anzograph.com/flights/Airport/DEN | 12421.967745
http://anzograph.com/flights/Airport/SLC |  8304.111011
http://anzograph.com/flights/Airport/MSP |  6835.823598
http://anzograph.com/flights/Airport/IAH |  6022.854234
http://anzograph.com/flights/Airport/SEA |  4895.147059
http://anzograph.com/flights/Airport/SFO |  4696.686488
http://anzograph.com/flights/Airport/DTW |  3978.338284
10 rows