Aggregate Functions
This topic describes the aggregate functions in AnzoGraph DB. For information about window aggregates, see Window Aggregate and Ranking Functions.
A GROUP BY statement is required for queries that contain aggregate functions if the results clause lists non-aggregate variables. Include all non-aggregated variables in the GROUP BY statement.
- AVG: Calculates the average (arithmetic mean) value for a group of numbers.
- CHOOSE_BY_MAX: Returns the value from a group that corresponds to the maximum value from another group.
- CHOOSE_BY_MIN: Returns the value from a group that corresponds to the minimum value from another group.
- COUNT: Counts the number of values that exist for a group.
- GROUP_CONCAT: Concatenates a group of strings into a single string.
- MAX: Returns the maximum value from a group of values.
- MEDIAN: Returns the median number out of a group of numbers.
- MIN: Returns the minimum value from a group of values.
- MODE: Returns the mode (the value that occurs most frequently) from a group of values.
- MODE_PERCENT: Calculates the percentage of values in a group that belong to the mode.
- SAMPLE: Returns an arbitrary value from the specified group of values.
- SUM: Calculates the sum of the numbers within a group.
- VAR: Calculates the unbiased (sample) variance of a group of numbers.
- VARP: Calculates the biased (population) variance of a group of numbers.
AVG
This function calculates the average (arithmetic mean) value for a group of numbers.
Syntax
AVG(number)
Argument |
Data Type |
Description |
number |
numeric |
The numeric value for which to calculate the average. |
Returns
Data Type |
Description |
number |
The arithmetic mean of the input values. |
Examples
The following example queries the sample Tickit data set to determine the average number of seats in the venues in each state. Since the results clause contains a non-aggregated variable (?state), a GROUP BY clause is required for grouping on ?state.
PREFIX tickit: <http://anzograph.com/tickit/>
SELECT ?state (ROUND(AVG(?seats)) AS ?avg_seats)
FROM <http://anzograph.com/tickit>
WHERE {
?s tickit:venuestate ?state .
?s tickit:venueseats ?seats .
}
GROUP BY ?state
ORDER BY ?state
state | avg_seats
------+-----------
CA | 50309
CO | 63285
DC | 41888
FL | 62603
GA | 60620
IL | 48244
IN | 63000
LA | 72000
MA | 54342
MD | 70229
MI | 53391
MN | 64035
MO | 59217
NC | 73298
NJ | 80242
NY | 48764
OH | 56035
ON | 50516
PA | 53931
TN | 68804
TX | 56915
WA | 57058
WI | 57561
23 rows
The query below calculates the average total price for all of the listings in the sample Tickit data set:
PREFIX tickit: <http://anzograph.com/tickit/>
SELECT (AVG(?numtickets*?priceperticket) AS ?avg_total_price)
FROM <http://anzograph.com/tickit>
WHERE {
?listing tickit:priceperticket ?priceperticket .
?listing tickit:numtickets ?numtickets .
}
avg_total_price
-----------------
3034.42
1 rows
CHOOSE_BY_MAX
This function calculates the maximum value for one group and returns the value from another group that corresponds to the maximum from the first group.
Syntax
CHOOSE_BY_MAX(test, value)
Argument |
Data Type |
Description |
test |
any type |
The group of values from which to find the maximum value. |
value |
any type |
The group of values from which to return the value that corresponds to the maximum value of test . |
Returns
Data Type |
Description |
input type |
The value from the value group that corresponds to the maximum value from the test group. |
CHOOSE_BY_MIN
This function calculates the minimum value for one group and returns the value from another group that corresponds to the minimum from the first group.
Syntax
CHOOSE_BY_MIN(test, value)
Argument |
Data Type |
Description |
test |
any type |
The group of values from which to find the minimum value. |
value |
any type |
The group of values from which to return the value that corresponds to the minimum value of test . |
Returns
Data Type |
Description |
input type |
The value from the value group that corresponds to the minimum value from the test group. |
COUNT
This function counts the number of values that exist for a group.
Syntax
COUNT[ ( DISTINCT ( ] (value)
Argument |
Data Type |
Description |
DISTINCT |
N/A |
Include the optional DISTINCT keyword to limit the results to the unique values. |
value |
any type |
The group of values to count. |
Returns
Data Type |
Description |
long |
The number of values in the group. |
Example
The following example queries the sample Tickit data set to count the number of people who have the same last name:
SELECT ?lastname (COUNT(?person) AS ?count)
FROM <http://anzograph.com/tickit>
WHERE {
?person <http://anzograph.com/tickit/lastname> ?lastname .
}
GROUP BY ?lastname
ORDER BY desc(?count)
LIMIT 10
lastname | count
----------+-------
Harding | 72
Ashley | 70
Stein | 70
Mason | 70
Fuentes | 69
Christian | 69
Murphy | 69
Madden | 69
Clements | 68
Chandler | 68
10 rows
GROUP_CONCAT
This function concatenates a group of strings into a single string.
Syntax
GROUP_CONCAT (group ; [ SEPARATOR = "separator_char" ] ; [ ROW_LIMIT = max_rows ] ;
[ PRE = "prefix" ] ; [ VALUE_SERIALIZE = serialize ] ; [ DELIMIT_BLANKS = separate_blanks ] ;
[ MAX_LENGTH = string_length ] ; [ SUFFIX = "suffix" ])
group
|
string |
The group of strings to concatenate. |
separator_char
|
string |
Optional argument that defines the separator to use between the values in returned strings. When SEPARATOR is omitted, AnzoGraph DB separates values with a space. |
max_rows
|
int |
Optional argument that puts a maximum limit on the number of rows to retrieve for the group. When ROW_LIMIT is omitted, the default is unlimited . Note that AnzoGraph DB performs the GROUP_CONCAT for each slice separately and combines the results from each slice. The ROW_LIMIT is applied to each slice, not the total result. Therefore, the total number of values that are concatenated will be larger than the specified limit, proportional to the number of slices in the cluster. |
prefix
|
string |
Optional string to add as a prefix to the resulting string. |
serialize
|
boolean |
Optional argument that indicates whether returned values should be serialized with the value's data type. When VALUE_SERIALIZE is omitted, the default is false . |
separate_blanks
|
boolean |
Optional argument that indicates whether to delimit blanks with the SEPARATOR value. When DELIMIT_BLANKS is omitted, the default is false . |
string_length
|
int |
Optional argument that limits the resulting strings to a maximum character length. AnzoGraph DB has a 2MB (~2,000,000 characters) limit on the length of strings and displays an error if GROUP_CONCAT returns a string that is longer than 2000000. When MAX_LENGTH is omitted, the default is unlimited . |
suffix
|
string |
Optional argument that defines a suffix to add to the resulting strings. When SUFFIX is omitted, AnzoGraph DB adds an empty string as the suffix. |
Returns
Data Type |
Description |
string |
The concatenated string. |
Example
The query below concatenates the list of friends for 10 people in the sample Tickit data set. Since the GROUP_CONCAT expression includes ROW_LIMIT=2, AnzoGraph DB limits the records to two for each slice (or shard) of data.
SELECT ?person (GROUP_CONCAT(?id;SEPARATOR=",";ROW_LIMIT=2) AS ?friends)
FROM <http://anzograph.com/tickit>
WHERE {
?person <http://anzograph.com/tickit/friend> ?friend .
BIND(STRAFTER(STR(?friend), "http://anzograph.com/tickit/") as ?id)
}
GROUP BY ?person
ORDER BY ?person
LIMIT 10
person | friends
----------------------------------------+-------------------------------------------------
http://anzograph.com/tickit/person1 | person2894,person20624,person33618,person47127
http://anzograph.com/tickit/person10 | person3136,person22714,person2509,person24535
http://anzograph.com/tickit/person100 | person42775,person29725,person27334,person24553
http://anzograph.com/tickit/person1000 | person19040,person39066,person2236,person9089
http://anzograph.com/tickit/person10000 | person43706,person37085,person18874,person31270
http://anzograph.com/tickit/person10001 | person3389,person44830,person4720,person307
http://anzograph.com/tickit/person10002 | person46462,person43989,person46491,person31130
http://anzograph.com/tickit/person10003 | person31544,person19595,person23460,person28465
http://anzograph.com/tickit/person10004 | person11070,person19845,person11172,person24252
http://anzograph.com/tickit/person10005 | person33888,person9467,person35761,person47709
10 rows
MAX
This function returns the maximum value from a group of values.
Syntax
MAX(value)
Argument |
Data Type |
Description |
value |
any type except boolean |
The group of values for which to return the maximum value. |
Returns
Data Type |
Description |
input type |
The maximum value from the group. |
Example
The following example queries the sample Tickit data to list the top 10 events with the highest number of tickets sold in one transaction:
PREFIX tickit: <http://anzograph.com/tickit/>
SELECT ?event (MAX(?tickets) AS ?max_tickets)
FROM <http://anzograph.com/tickit>
WHERE {
?listing tickit:numtickets ?tickets .
?listing tickit:eventid ?id .
?id tickit:eventname ?event .
}
GROUP BY ?event
ORDER BY desc(?max_tickets)
LIMIT 10
event | max_tickets
-----------------+-------------
Akon | 30
Beatles LOVE | 30
The Country Girl | 30
Sarah Brightman | 30
Jesse Lacey | 30
Spring Awakening | 30
Le Reve | 30
Das Rheingold | 30
Macbeth | 30
King Lear | 30
10 rows
MEDIAN
This function returns the median value from a group of numbers. The median is the number in the group where half of the numbers are greater than the number and half are less than the number.
Syntax
MEDIAN(number)
Argument |
Data Type |
Description |
number |
numeric |
The group of numeric values for which to calculate the median. |
Returns
Data Type |
Description |
number |
The median for the group. |
MIN
This function returns the minimum value from a group of values.
Syntax
MIN(value)
Argument |
Data Type |
Description |
value |
any type except boolean |
The group of values for which to return the minimum value. |
Returns
Data Type |
Description |
input type |
The minimum value from the group. |
Example
The following example queries the sample Tickit data to list the 10 events with the lowest price paid for tickets:
PREFIX tickit: <http://anzograph.com/tickit/>
SELECT ?event (MIN(?paid) AS ?min_paid)
FROM <http://anzograph.com/tickit>
WHERE {
?s tickit:pricepaid ?paid .
?s tickit:eventid ?id .
?id tickit:eventname ?event .
}
GROUP BY ?event
ORDER BY ?min_paid
LIMIT 10
event | min_paid
---------------------+----------
Legally Blonde | 20
Spring Awakening | 20
Green Day | 20
Keb Mo | 20
Thurgood | 20
King Lear | 20
Macbeth | 20
The Country Girl | 20
Ringo Starr | 20
August: Osage County | 20
10 rows
MODE
This function returns the mode from a group of values. The mode is the value that occurs most frequently in the group.
Syntax
MODE(value)
Argument |
Data Type |
Description |
value |
any type |
The group of values for which to return the mode. |
Returns
Data Type |
Description |
input type |
The mode from the group. |
MODE_PERCENT
This function calculates the percentage of values in a group that belong to the mode.
Syntax
MODE_PERCENT(value)
Argument |
Data Type |
Description |
value |
numeric |
The group of values for which to calculate the mode percent. |
Returns
Data Type |
Description |
double |
The percentage of values that belong to the mode. |
SAMPLE
This function returns an arbitrary value from the specified group of values.
Syntax
SAMPLE(value)
Argument |
Data Type |
Description |
value |
any type |
The group of values from which to choose a sample value. |
Returns
Data Type |
Description |
input type |
The arbitrary value from the group. |
SUM
This function calculates the sum of the numbers within a group.
Syntax
SUM(number)
Argument |
Data Type |
Description |
number |
numeric |
The group of numbers to sum. |
Returns
Data Type |
Description |
number |
The sum of the values in the group. |
Example
The following example queries the sample Tickit data set to determine the most unpopular events by returning the 10 events with the least number of ticket sales. The query uses the SUM aggregate function to calculate the total tickets for each event.
PREFIX tickit: <http://anzograph.com/tickit/>
SELECT ?event ?category (SUM(?qty) AS ?total_tickets)
FROM <http://anzograph.com/tickit>
WHERE {
?sales tickit:qtysold ?qty .
?sales tickit:eventid ?eventid .
?eventid tickit:eventname ?event .
?eventid tickit:catid ?catid .
?catid tickit:catname ?category .
}
GROUP BY ?event ?category
ORDER BY ?total_tickets
LIMIT 10
event | category | total_tickets
----------------+----------+---------------
White Christmas | Musicals | 35
Joshua Radin | Pop | 75
Martina McBride | Pop | 101
Beach Boys | Pop | 112
Linda Ronstadt | Pop | 116
Teena Marie | Pop | 124
Indigo Girls | Pop | 125
Billy Idol | Pop | 141
Mogwai | Pop | 146
Stephenie Meyer | Pop | 151
10 rows
VAR
This function calculates the unbiased (sample) variance for a group of numbers.
Syntax
VAR(value)
Argument |
Data Type |
Description |
value |
numeric |
The numeric value that defines the set of numbers for which to measure the variance. |
Returns
Data Type |
Description |
number |
The unbiased variance for the group. |
VARP
This function calculates the biased (population) variance for a group of numbers.
Syntax
VARP(value)
Argument |
Data Type |
Description |
value |
number |
The value that defines the set of numbers for which to measure the population variance. |
Returns
Data Type |
Description |
double |
The biased variance for the group. |