Aggregate Functions

This topic describes the aggregate functions in AnzoGraph DB. For information about window aggregates, see Window Aggregate and Ranking Functions.

A GROUP BY statement is required for queries that contain aggregate functions if the results clause lists non-aggregate variables. Include all non-aggregated variables in the GROUP BY statement.

  • AVG: Calculates the average (arithmetic mean) value for a group of numbers.
  • CHOOSE_BY_MAX: Returns the value from a group that corresponds to the maximum value from another group.
  • CHOOSE_BY_MIN: Returns the value from a group that corresponds to the minimum value from another group.
  • COUNT: Counts the number of values that exist for a group.
  • GROUP_CONCAT: Concatenates a group of strings into a single string.
  • MAX: Returns the maximum value from a group of values.
  • MEDIAN: Returns the median number out of a group of numbers.
  • MIN: Returns the minimum value from a group of values.
  • MODE: Returns the mode (the value that occurs most frequently) from a group of values.
  • MODE_PERCENT: Calculates the percentage of values in a group that belong to the mode.
  • SAMPLE: Returns an arbitrary value from the specified group of values.
  • SUM: Calculates the sum of the numbers within a group.
  • VAR: Calculates the unbiased (sample) variance of a group of numbers.
  • VARP: Calculates the biased (population) variance of a group of numbers.

AVG

This function calculates the average (arithmetic mean) value for a group of numbers.

Syntax

AVG(number)
Argument Data Type Description
number numeric The numeric value for which to calculate the average.

Returns

Data Type Description
number The arithmetic mean of the input values.

Examples

The following example queries the sample Tickit data set to determine the average number of seats in the venues in each state. Since the results clause contains a non-aggregated variable (?state), a GROUP BY clause is required for grouping on ?state.

PREFIX tickit: <http://anzograph.com/tickit/>
SELECT ?state (ROUND(AVG(?seats)) AS ?avg_seats)
FROM <http://anzograph.com/tickit>
WHERE {
  ?s tickit:venuestate ?state .
  ?s tickit:venueseats ?seats .
}
GROUP BY ?state
ORDER BY ?state
state | avg_seats
------+-----------
CA    |     50309
CO    |     63285
DC    |     41888
FL    |     62603
GA    |     60620
IL    |     48244
IN    |     63000
LA    |     72000
MA    |     54342
MD    |     70229
MI    |     53391
MN    |     64035
MO    |     59217
NC    |     73298
NJ    |     80242
NY    |     48764
OH    |     56035
ON    |     50516
PA    |     53931
TN    |     68804
TX    |     56915
WA    |     57058
WI    |     57561
23 rows

The query below calculates the average total price for all of the listings in the sample Tickit data set:

PREFIX tickit: <http://anzograph.com/tickit/>
SELECT (AVG(?numtickets*?priceperticket) AS ?avg_total_price)
FROM <http://anzograph.com/tickit>
WHERE {
  ?listing tickit:priceperticket ?priceperticket . 
  ?listing tickit:numtickets ?numtickets .
}
avg_total_price
-----------------
3034.42
1 rows

CHOOSE_BY_MAX

This function calculates the maximum value for one group and returns the value from another group that corresponds to the maximum from the first group.

Syntax

CHOOSE_BY_MAX(test, value)
Argument Data Type Description
test any type The group of values from which to find the maximum value.
value any type The group of values from which to return the value that corresponds to the maximum value of test.

Returns

Data Type Description
input type The value from the value group that corresponds to the maximum value from the test group.

CHOOSE_BY_MIN

This function calculates the minimum value for one group and returns the value from another group that corresponds to the minimum from the first group.

Syntax

CHOOSE_BY_MIN(test, value)
Argument Data Type Description
test any type The group of values from which to find the minimum value.
value any type The group of values from which to return the value that corresponds to the minimum value of test.

Returns

Data Type Description
input type The value from the value group that corresponds to the minimum value from the test group.

COUNT

This function counts the number of values that exist for a group.

Syntax

COUNT[ ( DISTINCT ( ] (value)
Argument Data Type Description
DISTINCT N/A Include the optional DISTINCT keyword to limit the results to the unique values.
value any type The group of values to count.

Returns

Data Type Description
long The number of values in the group.

Example

The following example queries the sample Tickit data set to count the number of people who have the same last name:

SELECT ?lastname (COUNT(?person) AS ?count)
FROM <http://anzograph.com/tickit>
WHERE {
  ?person <http://anzograph.com/tickit/lastname> ?lastname .
}
GROUP BY ?lastname
ORDER BY desc(?count)
LIMIT 10
lastname  | count
----------+-------
Harding   |    72
Ashley    |    70
Stein     |    70
Mason     |    70
Fuentes   |    69
Christian |    69
Murphy    |    69
Madden    |    69
Clements  |    68
Chandler  |    68
10 rows

GROUP_CONCAT

This function concatenates a group of strings into a single string.

Syntax

GROUP_CONCAT (group ; [ SEPARATOR = "separator_char" ] ; [ ROW_LIMIT = max_rows ] ;
[ PRE = "prefix" ] ; [ VALUE_SERIALIZE = serialize ] ; [ DELIMIT_BLANKS = separate_blanks ] ;
[ MAX_LENGTH = string_length ] ; [ SUFFIX = "suffix" ])
Argument Data Type Description
group string The group of strings to concatenate.
separator_char string Optional argument that defines the separator to use between the values in returned strings. When SEPARATOR is omitted, AnzoGraph DB separates values with a space.
max_rows int Optional argument that puts a maximum limit on the number of rows to retrieve for the group. When ROW_LIMIT is omitted, the default is unlimited. Note that AnzoGraph DB performs the GROUP_CONCAT for each slice separately and combines the results from each slice. The ROW_LIMIT is applied to each slice, not the total result. Therefore, the total number of values that are concatenated will be larger than the specified limit, proportional to the number of slices in the cluster.
prefix string Optional string to add as a prefix to the resulting string.
serialize boolean Optional argument that indicates whether returned values should be serialized with the value's data type. When VALUE_SERIALIZE is omitted, the default is false.
separate_blanks boolean Optional argument that indicates whether to delimit blanks with the SEPARATOR value. When DELIMIT_BLANKS is omitted, the default is false.
string_length int Optional argument that limits the resulting strings to a maximum character length. AnzoGraph DB has a 2MB (~2,000,000 characters) limit on the length of strings and displays an error if GROUP_CONCAT returns a string that is longer than 2000000. When MAX_LENGTH is omitted, the default is unlimited.
suffix string Optional argument that defines a suffix to add to the resulting strings. When SUFFIX is omitted, AnzoGraph DB adds an empty string as the suffix.

Returns

Data Type Description
string The concatenated string.

Example

The query below concatenates the list of friends for 10 people in the sample Tickit data set. Since the GROUP_CONCAT expression includes ROW_LIMIT=2, AnzoGraph DB limits the records to two for each slice (or shard) of data.

SELECT ?person (GROUP_CONCAT(?id;SEPARATOR=",";ROW_LIMIT=2) AS ?friends)
FROM <http://anzograph.com/tickit>
WHERE { 
  ?person <http://anzograph.com/tickit/friend> ?friend .
  BIND(STRAFTER(STR(?friend), "http://anzograph.com/tickit/") as ?id)
}
GROUP BY ?person
ORDER BY ?person
LIMIT 10
 person                                 | friends
----------------------------------------+-------------------------------------------------
http://anzograph.com/tickit/person1     | person2894,person20624,person33618,person47127
http://anzograph.com/tickit/person10    | person3136,person22714,person2509,person24535
http://anzograph.com/tickit/person100   | person42775,person29725,person27334,person24553
http://anzograph.com/tickit/person1000  | person19040,person39066,person2236,person9089
http://anzograph.com/tickit/person10000 | person43706,person37085,person18874,person31270
http://anzograph.com/tickit/person10001 | person3389,person44830,person4720,person307
http://anzograph.com/tickit/person10002 | person46462,person43989,person46491,person31130
http://anzograph.com/tickit/person10003 | person31544,person19595,person23460,person28465
http://anzograph.com/tickit/person10004 | person11070,person19845,person11172,person24252
http://anzograph.com/tickit/person10005 | person33888,person9467,person35761,person47709
10 rows

MAX

This function returns the maximum value from a group of values.

Syntax

MAX(value)
Argument Data Type Description
value any type except boolean The group of values for which to return the maximum value.

Returns

Data Type Description
input type The maximum value from the group.

Example

The following example queries the sample Tickit data to list the top 10 events with the highest number of tickets sold in one transaction:

PREFIX tickit: <http://anzograph.com/tickit/>
SELECT ?event (MAX(?tickets) AS ?max_tickets)
FROM <http://anzograph.com/tickit>
WHERE {
  ?listing tickit:numtickets ?tickets .
  ?listing tickit:eventid ?id .
  ?id tickit:eventname ?event .
}
GROUP BY ?event
ORDER BY desc(?max_tickets)
LIMIT 10
event            | max_tickets
-----------------+-------------
Akon             |          30
Beatles LOVE     |          30
The Country Girl |          30
Sarah Brightman  |          30
Jesse Lacey      |          30
Spring Awakening |          30
Le Reve          |          30
Das Rheingold    |          30
Macbeth          |          30
King Lear        |          30
10 rows

MEDIAN

This function returns the median value from a group of numbers. The median is the number in the group where half of the numbers are greater than the number and half are less than the number.

Syntax

MEDIAN(number)
Argument Data Type Description
number numeric The group of numeric values for which to calculate the median.

Returns

Data Type Description
number The median for the group.

MIN

This function returns the minimum value from a group of values.

Syntax

MIN(value)
Argument Data Type Description
value any type except boolean The group of values for which to return the minimum value.

Returns

Data Type Description
input type The minimum value from the group.

Example

The following example queries the sample Tickit data to list the 10 events with the lowest price paid for tickets:

PREFIX tickit: <http://anzograph.com/tickit/>
SELECT ?event (MIN(?paid) AS ?min_paid)
FROM <http://anzograph.com/tickit>
WHERE {
  ?s tickit:pricepaid ?paid .
  ?s tickit:eventid ?id .
  ?id tickit:eventname ?event .
}
GROUP BY ?event
ORDER BY ?min_paid
LIMIT 10
event                | min_paid
---------------------+----------
Legally Blonde       |       20
Spring Awakening     |       20
Green Day            |       20
Keb Mo               |       20
Thurgood             |       20
King Lear            |       20
Macbeth              |       20
The Country Girl     |       20
Ringo Starr          |       20
August: Osage County |       20
10 rows

MODE

This function returns the mode from a group of values. The mode is the value that occurs most frequently in the group.

Syntax

MODE(value)
Argument Data Type Description
value any type The group of values for which to return the mode.

Returns

Data Type Description
input type The mode from the group.

MODE_PERCENT

This function calculates the percentage of values in a group that belong to the mode.

Syntax

MODE_PERCENT(value)
Argument Data Type Description
value numeric The group of values for which to calculate the mode percent.

Returns

Data Type Description
double The percentage of values that belong to the mode.

SAMPLE

This function returns an arbitrary value from the specified group of values.

Syntax

SAMPLE(value)
Argument Data Type Description
value any type The group of values from which to choose a sample value.

Returns

Data Type Description
input type The arbitrary value from the group.

SUM

This function calculates the sum of the numbers within a group.

Syntax

SUM(number)
Argument Data Type Description
number numeric The group of numbers to sum.

Returns

Data Type Description
number The sum of the values in the group.

Example

The following example queries the sample Tickit data set to determine the most unpopular events by returning the 10 events with the least number of ticket sales. The query uses the SUM aggregate function to calculate the total tickets for each event.

PREFIX tickit: <http://anzograph.com/tickit/>
SELECT ?event ?category (SUM(?qty) AS ?total_tickets)
FROM <http://anzograph.com/tickit>
WHERE {
  ?sales tickit:qtysold ?qty .
  ?sales tickit:eventid ?eventid .
  ?eventid tickit:eventname ?event .
  ?eventid tickit:catid ?catid .
  ?catid tickit:catname ?category .
} 
GROUP BY ?event ?category
ORDER BY ?total_tickets
LIMIT 10
event           | category | total_tickets
----------------+----------+---------------
White Christmas | Musicals |            35
Joshua Radin    | Pop      |            75
Martina McBride | Pop      |           101
Beach Boys      | Pop      |           112
Linda Ronstadt  | Pop      |           116
Teena Marie     | Pop      |           124
Indigo Girls    | Pop      |           125
Billy Idol      | Pop      |           141
Mogwai          | Pop      |           146
Stephenie Meyer | Pop      |           151
10 rows

VAR

This function calculates the unbiased (sample) variance for a group of numbers.

Syntax

VAR(value)
Argument Data Type Description
value numeric The numeric value that defines the set of numbers for which to measure the variance.

Returns

Data Type Description
number The unbiased variance for the group.

VARP

This function calculates the biased (population) variance for a group of numbers.

Syntax

VARP(value)
Argument Data Type Description
value number The value that defines the set of numbers for which to measure the population variance.

Returns

Data Type Description
double The biased variance for the group.