Window Aggregate and Ranking Functions

Window aggregate functions enable you to compute aggregate values on a particular partition or window of the result set. Unlike grouped aggregate functions that group the results and return a single value, window aggregates return a value for each row in the specified window. For example, using the grouped aggregate SUM function to add up the total number of tickets sold in a year returns one value: the total number of tickets sold for the year. By using the SUM window aggregate instead, the results could be partitioned by month so that the query returns 12 values: the sum of the number of tickets sold in each month of the year.

This topic describes the window aggregate and ranking functions in AnzoGraph DB:

  • AVG: Calculates the average value of each group of values.
  • COUNT: Counts the number of values in each group of values.
  • MAX: Calculates the maximum value of each group of values.
  • MIN: Calculates the minimum value of each group of values.
  • NTILE: Divides the rows in the partition into the specified number of ranked groups and returns the group that each value belongs to.
  • PERCENTILE: Divides the rows in the partition into 100 ranked groups and returns the group that each value belongs to.
  • PRODUCT: Calculates the product of each group of values.
  • QUARTILE: Divides the rows in the partition into four ranked groups and returns the group that each value belongs to.
  • ROW_NUMBER: Assigns unique numbers to each row in the partition.
  • SUM: Calculates the sum of each group of values.

Typographical Conventions

The following list describes the conventions used to document function syntax:

  • CAPS: Although SPARQL is case-insensitive, SPARQL keywords in this section are written in uppercase for readability.
  • [ argument ]: Brackets indicate an optional argument or keyword.
  • |: Means OR. Indicates that you can use one or more of the specified options.

AVG

This function calculates the average value of each group of values.

Syntax

(AVG(value) OVER ( 
  [ PARTITION BY partition_value ]
  [ ORDER BY order_value ]
# frame clause
  [ [ ROWS ] frame_start |
    [ ROWS ] BETWEEN frame_start AND frame_end
  ]
)
AS ?variable )
Argument Description
value Required argument that defines the group of values to operate on.
partition_value The optional PARTITION BY clause forms the groups of rows, dividing the result set into the partitions defined by the given partition_value. If you do not include PARTITION BY, the partition becomes the entire result set. When PARTITION BY is included, the function is applied to the group of rows in each partition.
order_value The optional ORDER BY clause defines the order or sequence of rows within each partition.
frame clause The reference point for all window frames is the current row. The optional frame clause further defines the frame by specifying the rows in a partition to combine with the current row. There are two types of window frames:
  • A fixed frame with two moving endpoints: Each row becomes the current row as the window frame slides forward in the partition. This type of frame is ideal for computing aggregations over moving time frames.
  • A resizing frame with one anchored endpoint: One row is a fixed endpoint and the frame resizes up (PRECEDING) or down (FOLLOWING). This type of frame is ideal for computing running totals.

The frame clause can be one of the following options:

[ ROWS ] frame_start
[ ROWS ] BETWEEN frame_start AND frame_end

When a frame clause is not included, the window frame is unbounded: ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING.

frame_start The starting point of the frame. This argument can be one of the following values:
UNBOUNDED PRECEDING
positive_int PRECEDING
CURRENT ROW
positive_int FOLLOWING
frame_end The end of the frame. This argument can be one of the following values:
positive_int PRECEDING
CURRENT ROW
positive_int FOLLOWING
UNBOUNDED FOLLOWING

COUNT

This function counts the number of values in each group of values.

Syntax

(COUNT(value) OVER ( 
  [ PARTITION BY partition_value ]
  [ ORDER BY order_value ] 
# frame clause
  [ [ ROWS ] frame_start |
    [ ROWS ] BETWEEN frame_start AND frame_end
  ]
)
AS ?variable )
Argument Description
value Required argument that defines the group of values to operate on.
partition_value The optional PARTITION BY clause forms the groups of rows, dividing the result set into the partitions defined by the given partition_value. If you do not include PARTITION BY, the partition becomes the entire result set. When PARTITION BY is included, the function is applied to the group of rows in each partition.
order_value The optional ORDER BY clause defines the order or sequence of rows within each partition.
frame clause The reference point for all window frames is the current row. The optional frame clause further defines the frame by specifying the rows in a partition to combine with the current row. There are two types of window frames:
  • A fixed frame with two moving endpoints: Each row becomes the current row as the window frame slides forward in the partition. This type of frame is ideal for computing aggregations over moving time frames.
  • A resizing frame with one anchored endpoint: One row is a fixed endpoint and the frame resizes up (PRECEDING) or down (FOLLOWING). This type of frame is ideal for computing running totals.

The frame clause can be one of the following options:

[ ROWS ] frame_start
[ ROWS ] BETWEEN frame_start AND frame_end

When a frame clause is not included, the window frame is unbounded: ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING.

frame_start The starting point of the frame. This argument can be one of the following values:
UNBOUNDED PRECEDING
positive_int PRECEDING
CURRENT ROW
positive_int FOLLOWING
frame_end The end of the frame. This argument can be one of the following values:
positive_int PRECEDING
CURRENT ROW
positive_int FOLLOWING
UNBOUNDED FOLLOWING

MAX

This function calculates the maximum value of each group of values.

Syntax

(MAX(value) OVER ( 
  [ PARTITION BY partition_value ]
  [ ORDER BY order_value ]
# frame clause
  [ [ ROWS ] frame_start |
    [ ROWS ] BETWEEN frame_start AND frame_end
  ]
)
AS ?variable )
Argument Description
value Required argument that defines the group of values to operate on.
partition_value The optional PARTITION BY clause forms the groups of rows, dividing the result set into the partitions defined by the given partition_value. If you do not include PARTITION BY, the partition becomes the entire result set. When PARTITION BY is included, the function is applied to the group of rows in each partition.
order_value The optional ORDER BY clause defines the order or sequence of rows within each partition.
frame clause The reference point for all window frames is the current row. The optional frame clause further defines the frame by specifying the rows in a partition to combine with the current row. There are two types of window frames:
  • A fixed frame with two moving endpoints: Each row becomes the current row as the window frame slides forward in the partition. This type of frame is ideal for computing aggregations over moving time frames.
  • A resizing frame with one anchored endpoint: One row is a fixed endpoint and the frame resizes up (PRECEDING) or down (FOLLOWING). This type of frame is ideal for computing running totals.

The frame clause can be one of the following options:

[ ROWS ] frame_start
[ ROWS ] BETWEEN frame_start AND frame_end

When a frame clause is not included, the window frame is unbounded: ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING.

frame_start The starting point of the frame. This argument can be one of the following values:
UNBOUNDED PRECEDING
positive_int PRECEDING
CURRENT ROW
positive_int FOLLOWING
frame_end The end of the frame. This argument can be one of the following values:
positive_int PRECEDING
CURRENT ROW
positive_int FOLLOWING
UNBOUNDED FOLLOWING

MIN

This function calculates the minimum value of each group of values.

Syntax

(MIN(value) OVER ( 
  [ PARTITION BY partition_value ]
  [ ORDER BY order_value ]
# frame clause
  [ [ ROWS ] frame_start |
    [ ROWS ] BETWEEN frame_start AND frame_end
  ]
)
AS ?variable )
Argument Description
value Required argument that defines the group of values to operate on.
partition_value The optional PARTITION BY clause forms the groups of rows, dividing the result set into the partitions defined by the given partition_value. If you do not include PARTITION BY, the partition becomes the entire result set. When PARTITION BY is included, the function is applied to the group of rows in each partition.
order_value The optional ORDER BY clause defines the order or sequence of rows within each partition.
frame clause The reference point for all window frames is the current row. The optional frame clause further defines the frame by specifying the rows in a partition to combine with the current row. There are two types of window frames:
  • A fixed frame with two moving endpoints: Each row becomes the current row as the window frame slides forward in the partition. This type of frame is ideal for computing aggregations over moving time frames.
  • A resizing frame with one anchored endpoint: One row is a fixed endpoint and the frame resizes up (PRECEDING) or down (FOLLOWING). This type of frame is ideal for computing running totals.

The frame clause can be one of the following options:

[ ROWS ] frame_start
[ ROWS ] BETWEEN frame_start AND frame_end

When a frame clause is not included, the window frame is unbounded: ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING.

frame_start The starting point of the frame. This argument can be one of the following values:
UNBOUNDED PRECEDING
positive_int PRECEDING
CURRENT ROW
positive_int FOLLOWING
frame_end The end of the frame. This argument can be one of the following values:
positive_int PRECEDING
CURRENT ROW
positive_int FOLLOWING
UNBOUNDED FOLLOWING

NTILE

This function divides the rows in the partition into the specified number of ranked groups and returns the group that each value belongs to.

Syntax

(NTILE(number_of_groups) OVER ( 
  [ PARTITION BY partition_value ]
  [ ORDER BY order_value ]
)
AS ?variable )
Argument Description
number_of_groups Required argument that defines the number of ranking groups.
partition_value The optional PARTITION BY clause forms the groups of rows, dividing the result set into the partitions defined by the given partition_value. If you do not include PARTITION BY, the partition becomes the entire result set. When PARTITION BY is included, the function is applied to the group of rows in each partition.
order_value The optional ORDER BY clause defines the order or sequence of rows within each partition.

PERCENTILE

This function divides the rows in the partition into 100 ranked groups and returns the group that each value belongs to.

Syntax

(PERCENTILE(value) OVER ( 
  [ PARTITION BY partition_value ]
  [ ORDER BY order_value ]
)
AS ?variable )
Argument Description
value Required argument that defines the group of values to operate on.
partition_value The optional PARTITION BY clause forms the groups of rows, dividing the result set into the partitions defined by the given partition_value. If you do not include PARTITION BY, the partition becomes the entire result set. When PARTITION BY is included, the function is applied to the group of rows in each partition.
order_value The optional ORDER BY clause defines the order or sequence of rows within each partition.

PRODUCT

This function calculates the product of each group of values.

Syntax

(PRODUCT(value) OVER ( 
  [ PARTITION BY partition_value ]
  [ ORDER BY order_value ] 
# frame clause
  [ [ ROWS ] frame_start |
    [ ROWS ] BETWEEN frame_start AND frame_end
  ]
)
AS ?variable )
Argument Description
value Required argument that defines the group of values to operate on.
partition_value The optional PARTITION BY clause forms the groups of rows, dividing the result set into the partitions defined by the given partition_value. If you do not include PARTITION BY, the partition becomes the entire result set. When PARTITION BY is included, the function is applied to the group of rows in each partition.
order_value The optional ORDER BY clause defines the order or sequence of rows within each partition.
frame clause The reference point for all window frames is the current row. The optional frame clause further defines the frame by specifying the rows in a partition to combine with the current row. There are two types of window frames:
  • A fixed frame with two moving endpoints: Each row becomes the current row as the window frame slides forward in the partition. This type of frame is ideal for computing aggregations over moving time frames.
  • A resizing frame with one anchored endpoint: One row is a fixed endpoint and the frame resizes up (PRECEDING) or down (FOLLOWING). This type of frame is ideal for computing running totals.

The frame clause can be one of the following options:

[ ROWS ] frame_start
[ ROWS ] BETWEEN frame_start AND frame_end

When a frame clause is not included, the window frame is unbounded: ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING.

frame_start The starting point of the frame. This argument can be one of the following values:
UNBOUNDED PRECEDING
positive_int PRECEDING
CURRENT ROW
positive_int FOLLOWING
frame_end The end of the frame. This argument can be one of the following values:
positive_int PRECEDING
CURRENT ROW
positive_int FOLLOWING
UNBOUNDED FOLLOWING

QUARTILE

This function divides the rows in the partition into four ranked groups and returns the group that each value belongs to.

Syntax

(QUARTILE(value) OVER ( 
  [ PARTITION BY partition_value ]
  [ ORDER BY order_value ]
)
AS ?variable )
Argument Description
value Required argument that defines the group of values to operate on.
partition_value The optional PARTITION BY clause forms the groups of rows, dividing the result set into the partitions defined by the given partition_value. If you do not include PARTITION BY, the partition becomes the entire result set. When PARTITION BY is included, the function is applied to the group of rows in each partition.
order_value The optional ORDER BY clause defines the order or sequence of rows within each partition.

ROW_NUMBER

This function assigns unique numbers to each row in the partition.

Syntax

(ROW_NUMBER() OVER ( 
  [ PARTITION BY partition_value ]
  [ ORDER BY order_value ]
)
AS ?variable )
Argument Description
partition_value The optional PARTITION BY clause forms the groups of rows, dividing the result set into the partitions defined by the given partition_value. If you do not include PARTITION BY, the partition becomes the entire result set. When PARTITION BY is included, the function is applied to the group of rows in each partition.
order_value The optional ORDER BY clause defines the order or sequence of rows within each partition.

SUM

This function calculates the sum of each group of values.

Syntax

(SUM(value) OVER ( 
  [ PARTITION BY partition_value ]
  [ ORDER BY order_value ]
# frame clause
  [ [ ROWS ] frame_start |
    [ ROWS ] BETWEEN frame_start AND frame_end
  ]
)
AS ?variable )
Argument Description
value Required argument that defines the group of values to operate on.
partition_value The optional PARTITION BY clause forms the groups of rows, dividing the result set into the partitions defined by the given partition_value. If you do not include PARTITION BY, the partition becomes the entire result set. When PARTITION BY is included, the function is applied to the group of rows in each partition.
order_value The optional ORDER BY clause defines the order or sequence of rows within each partition.
frame clause The reference point for all window frames is the current row. The optional frame clause further defines the frame by specifying the rows in a partition to combine with the current row. There are two types of window frames:
  • A fixed frame with two moving endpoints: Each row becomes the current row as the window frame slides forward in the partition. This type of frame is ideal for computing aggregations over moving time frames.
  • A resizing frame with one anchored endpoint: One row is a fixed endpoint and the frame resizes up (PRECEDING) or down (FOLLOWING). This type of frame is ideal for computing running totals.

The frame clause can be one of the following options:

[ ROWS ] frame_start
[ ROWS ] BETWEEN frame_start AND frame_end

When a frame clause is not included, the window frame is unbounded: ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING.

frame_start The starting point of the frame. This argument can be one of the following values:
UNBOUNDED PRECEDING
positive_int PRECEDING
CURRENT ROW
positive_int FOLLOWING
frame_end The end of the frame. This argument can be one of the following values:
positive_int PRECEDING
CURRENT ROW
positive_int FOLLOWING
UNBOUNDED FOLLOWING

Examples

This example queries the sample Tickit data set to return the percentage of a salesperson's total sales that came from the "Gypsy" event:

PREFIX tickit: <http://anzograph.com/tickit/>
SELECT ?event_name ?fname ?lname
((?dollars * 100.0/(SUM(?dollars) OVER(PARTITION BY ?event))) as ?percent_of_sales)
FROM <http://anzograph.com/tickit>
WHERE {
  ?sale tickit:eventid ?event .
  ?event tickit:eventname ?event_name .
  ?sale tickit:sellerid ?salesperson .
  ?sale tickit:pricepaid ?dollars .
  ?salesperson tickit:firstname ?fname .
  ?salesperson tickit:lastname ?lname .
  FILTER(?event_name = "Gypsy").
}
ORDER BY ?event_name desc(?percent_of_sales)
event_name | fname       | lname       | percent_of_sales
-----------+-------------+-------------+------------------
Gypsy      | Zoe         | Sosa        |              100
Gypsy      | Xaviera     | Jacobson    |          50.9415
Gypsy      | Brianna     | Mcfarland   |          50.5076
Gypsy      | Alexa       | Baird       |          45.7926
Gypsy      | Roanna      | Wood        |          42.0408
Gypsy      | Colette     | Clay        |          36.9388
Gypsy      | Amela       | Holman      |          35.7277
Gypsy      | Aubrey      | Terrell     |          32.2457
Gypsy      | Bruno       | Griffin     |          31.8139
Gypsy      | Damian      | Berger      |          31.2459
Gypsy      | Zelenia     | Woods       |          31.1616
Gypsy      | Imogene     | Mclean      |          31.0005
...
857 rows

This example queries the sample Tickit data set to return a running total of the number of tickets sold for the event "Mamma Mia!":

PREFIX tickit: <http://anzograph.com/tickit/>
SELECT ?event ?month (SUM(?qty) OVER (PARTITION BY ?month ORDER BY ?event
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS ?tickets)
FROM <http://anzograph.com/tickit>
WHERE {
  ?s tickit:qtysold ?qty .
  ?s tickit:eventid ?eventid .
  ?eventid tickit:eventname ?event .
  ?s tickit:dateid ?date .
  ?date tickit:month ?month .
  filter(?event="Mamma Mia!")
}
ORDER BY ?tickets
LIMIT 100
event      | month | tickets
-----------+-------+---------
Mamma Mia! | FEB   |       1
Mamma Mia! | AUG   |       1
Mamma Mia! | MAR   |       1
Mamma Mia! | MAY   |       1
Mamma Mia! | JUN   |       2
Mamma Mia! | SEP   |       2
Mamma Mia! | AUG   |       2
Mamma Mia! | DEC   |       2
Mamma Mia! | NOV   |       2
Mamma Mia! | OCT   |       2
Mamma Mia! | JAN   |       2
Mamma Mia! | MAR   |       2
Mamma Mia! | NOV   |       3
Mamma Mia! | JAN   |       3
Mamma Mia! | JUN   |       4
Mamma Mia! | JUL   |       4
Mamma Mia! | SEP   |       4
...
100 rows