PageRank

The PageRank algorithm ranks the nodes in a graph by their relative importance or influence. PageRank determines each node's ranking by identifying the number of links to the node and the quality of the links. The quality of a link is determined by the importance (PageRank) of the node that presents the outbound link.

PageRank Syntax

Graph algorithms are accessed from an internal SPARQL service endpoint. To incorporate the PageRank algorithm in a query, include a SERVICE statement in the WHERE clause. The service call specifies the name of the algorithm and defines the required and optional property values for that algorithm.

Use the following syntax in the SERVICE clause to compute the PageRank for nodes in a graph. The table below describes each property.

SERVICE <csi:page_rank>
  {
    [] <csi:binding-vertex>  ?vertex_variable_name ;
       <csi:binding-rank>    ?rank_variable_name ;
       <csi:edge-label>      <edge_uri> ;
     [ <csi:graph>           <graph_uri> ; ]
     [ <csi:max-iterations>  number_of_iterations ; ]
     [ <csi:err-tolerance>   tolerance_number ; ]
     [ <csi:damping-factor>  double_value ; ]
     [ <csi:normalized>      boolean_value ]
}
Property Range Description
<csi:binding-vertex> variable Required property that defines the name to use for the column in the results that lists the source nodes or vertices.
<csi:binding-rank> variable Required property that defines the name to use for the result column that lists the computed PageRank values.
<csi:edge-label> URI Required property that lists the edge URI that defines the graph to operate on. The graph is the set of vertices that are connected by this URI.
<csi:graph> URI Optional property that specifies the graph to query.
<csi:max-iterations> 1 - 100 Optional property that specifies the maximum number of times to iterate through the graph to adjust approximate PageRank values. The default value is 50.
<csi:err-tolerance> 0.0 - 0.1 Optional property that specifies the error tolerance to use. If the sum of the error values for all nodes is below this tolerance value, AnzoGraph DB stops PageRank iterations. The default value is 1e-8.
<csi:damping-factor> 0.0 - 1.0 Optional property that specifies the edge traversal probability. When used for analyzing website data, such as when Google ranks search results, the damping factor represents click-through probability. The damping-factor value is subtracted from 1.0 in the calculation. The default value is 0.85.
<csi:normalized> boolean Optional property that specifies whether to produce PageRank values that are between 0 and 1. The default value is false.

PageRank Examples

The examples below are run against a sample flight network data set, which contains data about flights, airlines, and airport locations. For instructions on downloading the sample data and loading it to AnzoGraph DB so that you can run the example queries, see Sample Data for Graph Algorithm Examples.

The example below uses the PageRank algorithm to find the 10 most connected airports. The edge to operate on is defined as the hasRouteTo URI, which links the airport nodes.

prefix : <http://anzograph.com/data#>
prefix fl: <http://anzograph.com/flights/>
prefix owl: <http://www.w3.org/2002/07/owl#>
prefix skos:    <http://www.w3.org/2004/02/skos/core#>

SELECT ?airport ?rank
FROM <http://anzograph.com/airline_flight_network>
WHERE
{
  ?node a fl:Airport .
  ?node skos:prefLabel ?airport .
  SERVICE <csi:page_rank> 
    {
      [] <csi:binding-vertex>  ?node ;
         <csi:binding-rank>    ?rank ;
         <csi:edge-label> <http://anzograph.com/flights/hasRouteTo> .
    }
}
ORDER BY desc(?rank)
LIMIT 10

The results show that Chicago O'Hare has the highest PageRank. It has the highest number of links to other airports.

airport                                            | rank
---------------------------------------------------+---------
Chicago O'Hare International Airport               | 13.4122
Dallas Fort Worth International Airport            | 13.1632
Hartsfield Jackson Atlanta International Airport   | 12.8073
Denver International Airport                       | 10.3813
George Bush Intercontinental Houston Airport       | 8.38146
Salt Lake City International Airport               |  6.9018
Minneapolis-St Paul International/Wold-Chamberlain | 6.24394
San Francisco International Airport                | 5.50726
Phoenix Sky Harbor International Airport           | 5.27481
Los Angeles International Airport                  | 5.24544
10 rows