Window Aggregates

AnzoGraph includes support for window aggregates. These analytic functions enable you to compute aggregate values on a particular partition or window of the result set. Unlike grouped aggregate functions that group the results and return a single value, window aggregates return a value for each row in the specified window. For example, using the grouped aggregate SUM function to add up the total number of tickets sold in a year returns one value: the total number of tickets sold for the year. By using the SUM window aggregate instead, the results could be partitioned by month so that the query returns 12 values: the sum of the number of tickets sold in each month of the year.

AnzoGraph provides the following built-in window aggregate and ranking functions:

  • AVG: Calculates the average of the input expression values.
  • COUNT: Counts the number of results defined by the expression.
  • MAX: Returns the maximum of the input expression values.
  • MIN: Returns the minimum of the input expression values.
  • NTILE: Divides the rows in the partition into the specified number of ranked groups and returns the group that each value belongs to.
  • PERCENTILE: Like using NTILE(100), this function divides the rows in the partition into 100 ranked groups and returns the group that each value belongs to.
  • PRODUCT: Multiplies the input expression values and returns the product.
  • QUARTILE: Like using NTILE(4), this function divides the rows in the partition into 4 ranked groups and returns the group that each value belongs to.
  • ROW_NUMBER: Assigns unique numbers to each row in the partition.
  • SUM: Returns the sum of the input expression values.

Window Aggregate Syntax

Use the following syntax when incorporating window aggregates in queries. Each keyword or option is described below the syntax.

(FUNCTION (expression1) OVER (
[ PARTITION BY expression2 ] |
ORDER BY expression3 [ frame_clause ]
)
AS ?variable )

FUNCTION (expression1)

The window aggregate name and expression or expressions that produce the values to operate on.

OVER()

The required OVER() keyword differentiates window aggregates from other aggregate functions.

PARTITION BY expression2

This optional clause forms the groups of rows, dividing the result set into partitions. If you do not include a PARTITION BY clause, the partition becomes the entire result set identified by the WHERE clause. When PARTITION BY is included, the system calculates the specified function for the rows in each partition.

ORDER BY expression3

This required clause defines the order or sequence of rows within each partition.

frame_clause

The optional frame clause defines the window frame, which specifies the rows in a partition to combine with the current row. There are two types of window frames:

  • A fixed frame with two moving endpoints, where each row becomes the current row as the window frame slides forward in the partition. This type of frame is ideal for computing aggregations over moving time frames.
  • A resizing frame with one anchored endpoint, where one row is a fixed endpoint and the frame resizes up (preceding) or down (following). This type of frame is ideal for computing running totals.

The reference point for all window frames is the current row. The frame clause can be one of the following options:

[ ROWS ] frame_start
[ ROWS ] BETWEEN frame_start AND frame_end

where frame_start is one of the following options:

UNBOUNDED PRECEDING
positive_value PRECEDING
CURRENT ROW
positive_value FOLLOWING

and frame_end is one of the following options:

positive_value PRECEDING
CURRENT ROW
positive_value FOLLOWING
UNBOUNDED FOLLOWING

When a frame clause is not specified, the system treats the window frame as unbounded: ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING.

Window Aggregate Examples

The following example queries the sample Tickit data set to return the number of events per month:

SELECT ?month (COUNT(?event) OVER (PARTITION BY ?month) AS ?events)
FROM <tickit>
WHERE {
 { SELECT ?event (MONTH(?date) AS ?month)
   WHERE {
      ?sale <dateid> ?dateid .
      ?dateid <caldate> ?date .
      ?sale <eventid> ?eventid .
      ?eventid <eventname> ?event .
  }
 }
}
ORDER BY ?month
month | events
------+--------
1     | 26117
1     | 26117
...
2     | 30911
2     | 30911
...
3     | 33932
3     | 33932
...
4     | 32196
4     | 32196
...
8     | 34315
8     | 34315
...
12    | 11648
12    | 11648
...
364953 rows

This example queries the sample Tickit data set to return the percentage of a salesperson's total sales that came from the "Gypsy" event:

SELECT ?event_name ?fname ?lname
((?dollars * 100.0/(SUM(?dollars) OVER(PARTITION BY ?event))) as ?percent_of_sales)
FROM <tickit>
WHERE {
  ?sale <eventid> ?event .
  ?event <eventname> ?event_name .
  ?sale <sellerid> ?salesperson .
  ?sale <pricepaid> ?dollars .
  ?salesperson <firstname> ?fname .
  ?salesperson <lastname> ?lname .
  FILTER(?event_name = "Gypsy").
}
ORDER BY ?event_name desc(?percent_of_sales)
event_name | fname   | lname     | percent_of_sales
-----------+---------+-----------+------------------
Gypsy      | Zoe     | Sosa      | 100.000000
Gypsy      | Xaviera | Jacobson  | 50.941458
Gypsy      | Brianna | Mcfarland | 50.507614
Gypsy      | Alexa   | Baird     | 45.792564
Gypsy      | Roanna  | Wood      | 42.040816
Gypsy      | Colette | Clay      | 36.938776
Gypsy      | Amela   | Holman    | 35.727660
Gypsy      | Aubrey  | Terrell   | 32.245681
Gypsy      | Bruno   | Griffin   | 31.813858
Gypsy      | Damian  | Berger    | 31.245923
Gypsy      | Zelenia | Woods     | 31.161644
Gypsy      | Imogene | Mclean    | 31.000483
Gypsy      | Helen   | Haney     | 30.295666
Gypsy      | Silas   | Wynn      | 28.730306
Gypsy      | Caryn   | Hester    | 28.714012
Gypsy      | Camden  | Horne     | 27.253097
Gypsy      | Adara   | Burris    | 26.809349
Gypsy      | Raya    | Fleming   | 26.790955
...
857 rows

This example queries the sample Tickit data set to return a running total of the number of tickets sold for the event "Mamma Mia!":

SELECT ?event ?month (SUM(?qty) OVER (PARTITION BY ?month ORDER BY ?event
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS ?tickets)
FROM <tickit>
WHERE {
  ?s <qtysold> ?qty .
  ?s <eventid> ?eventid .
  ?eventid <eventname> ?event .
  ?s <dateid> ?date .
  ?date <month> ?month .
  filter(?event="Mamma Mia!")
}
ORDER BY ?tickets
LIMIT 500
event      | month | tickets
-----------+-------+---------
Mamma Mia! | MAY   | 1
Mamma Mia! | JAN   | 1
Mamma Mia! | SEP   | 1
Mamma Mia! | APR   | 2
Mamma Mia! | JAN   | 2
Mamma Mia! | AUG   | 2
Mamma Mia! | JUN   | 2
Mamma Mia! | MAY   | 2
Mamma Mia! | NOV   | 2
Mamma Mia! | SEP   | 3
Mamma Mia! | NOV   | 3
Mamma Mia! | FEB   | 4
Mamma Mia! | APR   | 4
...
Mamma Mia! | JAN   | 85
Mamma Mia! | JUL   | 85
Mamma Mia! | MAR   | 85
Mamma Mia! | JUN   | 85
Mamma Mia! | OCT   | 85
Mamma Mia! | NOV   | 85
Mamma Mia! | FEB   | 86
Mamma Mia! | APR   | 86
...
500 rows
Related Topics