Centrality Algorithms
Centrality algorithms identify important nodes in a graph:
- PageRank: Ranks the nodes in a graph by their relative importance or influence. Google uses PageRank to rank websites in their search engine results.
- Betweenness Centrality: Detects the amount of influence a vertex has over the flow of information in a graph.
PageRank
The PageRank algorithm ranks the nodes in a graph by their relative importance or influence. PageRank determines each node's ranking by identifying the number of links to the node, the outbound links from the node, and the quality of the links. The quality of a link is determined by the importance (PageRank) of the connected nodes. For labeled property graphs, the PageRank algorithm accepts an edge property that can be considered a relationship weight to factor into the PageRank calculation.
Syntax
The PageRank algorithm is available in the graphalgo
extension library (http://cambridgesemantics.com/anzograph/graphalgo#page_rank
) and is implemented as a procedure. To incorporate the PageRank algorithm in a query, use the following syntax in the FROM clause. The arguments that are links are described below.
SELECT triple_patterns_and_expressions # The FROM clause lists the URI for the page_rank alogrithm and # includes in parentheses the input parameters for the algorithm. FROM <http://cambridgesemantics.com/anzograph/graphalgo#page_rank> ( <graph_URI>, <edge_URI>, [ <weighted_property>, ] [ damping_factor, ] [ max_iterations, ] [ error_tolerance, ] [ normalized ] ) WHERE { ... }
Argument | Range | Description |
---|---|---|
graph_URI | URI | The URI of the target graph. |
edge_URI | URI | The URI of the edge that connects the nodes to rank. |
weighted_property | URI | Optional argument that defines the URI of the relationship property on the edge_URI that contains values to be factored in as a weight when calculating the importance of the nodes. When this property is omitted, page_rank is unweighted. |
damping_factor | 0.0 - 1.0 | Optional argument that represents the edge traversal probability. The damping factor is an estimate of the probability that a user will stay on the page rather than follow the link (edge). The damping_factor value is subtracted from 1.0 in the calculation. The default value is 0.85 . |
max_iterations | 1 - 100 | Optional argument that specifies the maximum number of times to iterate through the graph to adjust approximate PageRank values. The default value is 40 . |
error_tolerance | 0.0 - 0.1 | Optional argument that specifies the error tolerance to use. If the sum of the error values for all nodes is below this tolerance value, PageRank iterations are stopped. The default value is 1e-8 . |
normalized | boolean | Optional argument that controls whether to produce PageRank values that are between 0 and 1. The default value is false . |
Examples
The example below uses the unweighted PageRank algorithm to find the 10 most connected airports. The edge to operate on is defined as the hasRouteTo URI, which links the airport nodes.
SELECT ?airport ?rank FROM <http://cambridgesemantics.com/anzograph/graphalgo#page_rank>( <http://anzograph.com/airline_flight_network>, <http://anzograph.com/flights/hasRouteTo> ) WHERE { ?airport ?p ?rank . } ORDER BY desc(?rank) LIMIT 10
The results show that ORD (Chicago O'Hare) has the highest PageRank. It has the highest number of links to other airports.
airport | rank -----------------------------------------+--------- http://anzograph.com/flights/Airport/ORD | 13.4122 http://anzograph.com/flights/Airport/DFW | 13.1632 http://anzograph.com/flights/Airport/ATL | 12.8073 http://anzograph.com/flights/Airport/DEN | 10.3813 http://anzograph.com/flights/Airport/IAH | 8.38149 http://anzograph.com/flights/Airport/SLC | 6.90182 http://anzograph.com/flights/Airport/MSP | 6.24396 http://anzograph.com/flights/Airport/SFO | 5.50728 http://anzograph.com/flights/Airport/PHX | 5.27483 http://anzograph.com/flights/Airport/LAX | 5.24546 10 rows
By including the edge property, <http://anzograph.com/flights/distanceMiles>
, the example below uses the weighted PageRank algorithm to find the 10 most connected airports where distance between the airports is factored into the calculation.
SELECT ?airport ?rank FROM <http://cambridgesemantics.com/anzograph/graphalgo#page_rank>( <http://anzograph.com/airline_flight_network>, <http://anzograph.com/flights/hasRouteTo>, <http://anzograph.com/flights/distanceMiles> ) WHERE { ?airport ?p ?rank . } ORDER BY desc(?rank) LIMIT 10
airport | rank -----------------------------------------+-------- http://anzograph.com/flights/Airport/ORD | 2.445 http://anzograph.com/flights/Airport/ATL | 1.935 http://anzograph.com/flights/Airport/DFW | 1.68 http://anzograph.com/flights/Airport/SLC | 1.17 http://anzograph.com/flights/Airport/MSP | 0.915 http://anzograph.com/flights/Airport/DTW | 0.66 http://anzograph.com/flights/Airport/SFO | 0.66 http://anzograph.com/flights/Airport/FLL | 0.5325 http://anzograph.com/flights/Airport/ANC | 0.5325 http://anzograph.com/flights/Airport/DEN | 0.5325 10 rows
Betweenness Centrality
Betweenness centrality is a measure of the amount of influence a vertex has over the flow of information in a graph. The amount of influence is determined by the number of shortest paths that pass through a vertex. The Betweenness Centrality algorithm computes the shortest path between each pair of vertices in a graph and assigns a score to each vertex based on the number of shortest paths that intersect it. Vertices that frequently lie on the shortest paths between other nodes have higher betweenness centrality scores.
Syntax
To incorporate the Betweenness Centrality algorithm in a query, include the following SERVICE call in the WHERE clause.
SERVICE <csi:betweenness> { [] <csi:binding-vertex> ?binding_vertex_variable ; <csi:binding-centrality> ?centrality_variable ; <csi:edge-label> <edge_uri> . }
Example
The example below uses the Betweenness Centrality algorithm to find the 10 airports with the highest centrality. The edge to operate on is defined as the hasRouteTo
URI, which links the airport vertices.
prefix : <http://anzograph.com/data#> prefix fl: <http://anzograph.com/flights/> prefix owl: <http://www.w3.org/2002/07/owl#> prefix skos: <http://www.w3.org/2004/02/skos/core#> SELECT * FROM <http://anzograph.com/airline_flight_network> WHERE { SERVICE <csi:betweenness> { [] <csi:binding-vertex> ?airport ; <csi:binding-centrality> ?centrality ; <csi:edge-label> <http://anzograph.com/flights/hasRouteTo> . } } ORDER BY desc(?centrality) LIMIT 10
airport | centrality -----------------------------------------+-------------- http://anzograph.com/flights/Airport/ORD | 18985.371039 http://anzograph.com/flights/Airport/DFW | 17718.273941 http://anzograph.com/flights/Airport/ATL | 16286.345255 http://anzograph.com/flights/Airport/DEN | 12421.967745 http://anzograph.com/flights/Airport/SLC | 8304.111011 http://anzograph.com/flights/Airport/MSP | 6835.823598 http://anzograph.com/flights/Airport/IAH | 6022.854234 http://anzograph.com/flights/Airport/SEA | 4895.147059 http://anzograph.com/flights/Airport/SFO | 4696.686488 http://anzograph.com/flights/Airport/DTW | 3978.338284 10 rows