Comparability, Equality, Orderability, and Equivalence

This section describes AnzoGraph compatibility with the Cypher Language specification for equality, comparability, and orderability operations.

Cypher provides operations around four distinct concepts related to equality and ordering:

  • Comparability – Comparability is specified by the inequality operators (>, <, >=, <=), and determines how to compare two values.
  • Equality – Equality is specified by the equality operators (=, <>), and the list membership operator (IN) to determine if two values are the same. Equality is also used implicitly by literal maps in node and relationship patterns, since such literal maps provide a shorthand notation for equality predicates.

    Cypher lists and maps are not supported in the current AnzoGraph release.

  • Orderability – Orderability is specified by the ORDER BY clause and determines how to order values.
  • Equivalence – Equivalence is specified by the DISTINCT modifier and by grouping in projection clauses (WITH, RETURN) to determine if two values are the same.

In addition to defining equality for the primitive types (boolean, string, integer, and float) and for maps, Cypher defines comparability and orderability for integer, float, and string values within each of those types.

Comparability and Equality (Partially Supported)

Comparability and equality are consistently aligned, that is,

expr1 = expr2 if and only if expr1 >= expr2 AND expr1 <= expr2.

If a comparison or equality test evaluates to null for specific values, those values are said to be incomparable.

Comparability

List, Map, and path types and their associated comparisons are not supported in the current AnzoGraph release.

Comparability is defined between any pair of values, as specified below.

  • General rules
    • Values are only comparable within their most specific type (except for numbers).
    • Equal values are grouped together.
  • Numbers
    • Integers are compared numerically in ascending order.
    • Floats (excluding NaN and infinity values) are compared numerically in ascending order.
    • Numbers of different types (excluding NaN and infinity values) are compared as if both numbers had been coerced to higher-precision decimal values and then compared numerically in ascending order.
    • Positive infinity is of type FLOAT, equal to itself, and greater than any other number (excluding NaN values).
    • Negative infinity is of type FLOAT, equal to itself, and less than any other number (excluding NaN values).
    • NaN values are incomparable.
    • Numbers are not comparable with any value that is not also a number.
  • Booleans
    • Booleans are compared such that false is less than true.
    • Booleans are not comparable to any value that is not also a boolean.
  • Strings
    • Strings are compared in dictionary order, that is, characters are compared pair-wise, in ascending order, from the start of the string to the end. Characters missing in a shorter string are considered to be less than any other character. For example, 'a' < 'aa'.
    • Strings are not comparable to any value that is not also a string.
  • Implementation-specific types
    • Implementations may choose to define suitable comparability rules for values of additional, non-canonical types.
    • Values of an additional, non-canonical type are expected to be incomparable to values of a canonical type.
  • Null is not comparable with any other value (including other null values).
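
The rules above can be sketched as a three-valued comparison function. This is only an illustrative model in Python, not AnzoGraph or Cypher code: the names type_of, compare, less_than, and equals are hypothetical, Python values stand in for Cypher values, and None stands in for null. Only the supported primitive types are modeled.

```python
import math

def type_of(value):
    """Return a coarse Cypher type name for a Python stand-in value."""
    if value is None:
        return "VOID"
    if isinstance(value, bool):          # check bool first: bool subclasses int
        return "BOOLEAN"
    if isinstance(value, (int, float)):
        return "NUMBER"
    if isinstance(value, str):
        return "STRING"
    raise TypeError(f"unsupported stand-in type: {type(value).__name__}")

def compare(a, b):
    """Return -1, 0, or 1, or None when the two values are incomparable."""
    type_a, type_b = type_of(a), type_of(b)
    if type_a == "VOID" or type_b == "VOID":
        return None                      # null is incomparable, even with null
    if type_a != type_b:
        return None                      # different types are incomparable
    if type_a == "NUMBER":
        if any(isinstance(v, float) and math.isnan(v) for v in (a, b)):
            return None                  # NaN is incomparable
    # Python compares int with float exactly, orders False before True,
    # and compares strings character by character.
    return (a > b) - (a < b)

def less_than(a, b):
    """Model of a < b: True, False, or None (null) if incomparable."""
    result = compare(a, b)
    return None if result is None else result < 0

def equals(a, b):
    """Model of a = b: True, False, or None (null) if incomparable."""
    result = compare(a, b)
    return None if result is None else result == 0
```

In this sketch, compare("a", "aa") reports that 'a' sorts first, while compare(1, "1"), compare(None, None), and any comparison involving NaN return None to signal incomparable values.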

Equality

To align equality with comparability, null values contained in lists and maps are treated for equality in the same way as if they had been compared outside of those lists and maps, that is, as individual, simple values.

List and Map types are not supported in the current AnzoGraph release.

Orderability and Equivalence (Partially Supported)

Based on the Cypher language specification, orderability and equivalence are aligned such that expr1 is equivalent to expr2 if and only if they have the same position under orderability. As a result, expr1 and expr2 are sorted before or after any other non-equivalent value in the same way.

List, Map, and path types and their associated comparisons are not supported in the current AnzoGraph release.

Orderability

Orderability is defined between any pair of values such that the result is always true or false. To accomplish this, Cypher defines a pre-determined order of types and ensures that each value falls under exactly one disjoint type in this order.

List, Map, and path types and their associated comparisons are not supported in the current AnzoGraph release.

The Cypher language specification prescribes using the following ascending global sort order of disjoint types:

  • MAP types
    • Regular map
    • NODE
    • RELATIONSHIP
  • LIST OF
  • PATH
  • STRING
  • BOOLEAN
  • NUMBER

    NaN values are treated as the largest numbers in orderability, placed after any positive infinity values.

  • VOID (the type of null)

Using this global sort order, all nodes come before all strings.

The corresponding descending global sort order is the same order, in reverse. That is, the order runs from VOID to MAP. Between values of the same type in the global sort order, orderability defers to comparability, except that equality is overridden by equivalence.

The current release of AnzoGraph uses a different type order:

  • Void
  • Node/Relationship
  • Number
  • Boolean
  • String
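
The two type orders can be compared with a small sort-key sketch. This is an illustrative Python model, not AnzoGraph code: the names SPEC_ORDER, ANZOGRAPH_ORDER, sort_key, and order_by are hypothetical, only primitive types are modeled (nodes, relationships, lists, maps, and paths are left out), and it assumes that AnzoGraph orders values within a type the same way as the specification, which the text above does not state.

```python
import math

# Ascending type order from the Cypher specification (primitive types only).
SPEC_ORDER = ["STRING", "BOOLEAN", "NUMBER", "VOID"]
# Ascending type order listed for the current AnzoGraph release (primitive types only).
ANZOGRAPH_ORDER = ["VOID", "NUMBER", "BOOLEAN", "STRING"]

def type_of(value):
    """Return a coarse Cypher type name for a Python stand-in value."""
    if value is None:
        return "VOID"
    if isinstance(value, bool):          # check bool first: bool subclasses int
        return "BOOLEAN"
    if isinstance(value, (int, float)):
        return "NUMBER"
    if isinstance(value, str):
        return "STRING"
    raise TypeError(f"unsupported stand-in type: {type(value).__name__}")

def sort_key(value, type_order):
    """Build a key that sorts by type rank first, then by value within the type."""
    rank = type_order.index(type_of(value))
    if value is None:
        return (rank, 0, 0)              # all nulls share one position
    if isinstance(value, float) and math.isnan(value):
        return (rank, 1, 0)              # NaN sorts after every other number
    return (rank, 0, value)

def order_by(values, type_order, descending=False):
    """Model of ORDER BY over mixed values for a given type order."""
    return sorted(values, key=lambda v: sort_key(v, type_order), reverse=descending)
```

Calling order_by with SPEC_ORDER places strings first and null last, while ANZOGRAPH_ORDER places null first and strings last. Passing descending=True reverses either order.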

Equivalence (Partially Supported)

Equivalence can be defined as being identical to equality except for the following:

  • Any two null values are equivalent, both directly and inside nested structures. Similarly, any two NaN values are equivalent, both directly and inside nested structures. However, a null value and a NaN value are not equivalent to each other, whether directly or inside nested structures.
  • Equivalence of lists is identical to equality of lists, but it uses equivalence for comparing the contained list elements.
  • Equivalence of regular maps is identical to equality of regular maps, but it uses equivalence for comparing the contained map entries.
  • Equivalence is reflexive for all values.

List and Map types are not supported in the current AnzoGraph release.
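
The difference between equivalence and equality for simple values can be sketched as follows. This is an illustrative Python model rather than AnzoGraph code: the function names is_nan, equivalent, and distinct are hypothetical, None stands in for null, and nested structures are not modeled because lists and maps are not supported.

```python
import math

def is_nan(value):
    """True only for a float NaN stand-in value."""
    return isinstance(value, float) and math.isnan(value)

def equivalent(a, b):
    """Model of equivalence for simple values (always True or False)."""
    if a is None or b is None:
        return a is None and b is None       # null is equivalent only to null
    if is_nan(a) or is_nan(b):
        return is_nan(a) and is_nan(b)       # NaN is equivalent only to NaN
    if isinstance(a, bool) or isinstance(b, bool):
        # Booleans never match numbers, even though Python treats True == 1.
        return isinstance(a, bool) and isinstance(b, bool) and a == b
    if isinstance(a, str) != isinstance(b, str):
        return False                         # strings never match numbers
    return a == b

def distinct(values):
    """Model of DISTINCT: keep the first value of each equivalence class."""
    kept = []
    for value in values:
        if not any(equivalent(value, k) for k in kept):
            kept.append(value)
    return kept
```

Unlike equality, equivalent never returns null, so every value is equivalent to itself. Keeping the first occurrence in distinct is a choice made for this sketch.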

Aggregation (Partially Supported)

An aggregation (aggr(expr)) processes all matching rows for each aggregation key found in an incoming record (where keys are compared using equivalence). For a fixed aggregation key and each matching record, expr is evaluated to a value. This yields a list of candidate values. Generally, the order of candidate values is unspecified. However, if the aggregation happens in a projection with an associated ORDER BY subclause, the list of candidate values is ordered in the same way as the underlying records and as specified by the associated ORDER BY subclause.

In both a regular aggregation (of the form aggr(expr)) and a distinct aggregation (of the form aggr(DISTINCT expr)), the list of aggregated values is the list of candidate values with all null values removed. In a distinct aggregation, duplicates under equivalence are also removed, so only one of each set of equivalent candidate values is included in the list of aggregated values. This also applies when the distinct aggregation happens in a projection with an associated ORDER BY subclause.
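
The step from candidate values to aggregated values can be sketched in a few lines. This is an illustrative Python model, not AnzoGraph code: the names kind, same, and aggregated_values are hypothetical, None stands in for null, and keeping the first occurrence of each set of equivalent values is an assumption of the sketch.

```python
import math

def kind(value):
    """Coarse type of a non-null stand-in value."""
    if isinstance(value, bool):              # check bool first: bool subclasses int
        return "BOOLEAN"
    if isinstance(value, (int, float)):
        return "NUMBER"
    return "STRING"

def same(a, b):
    """Equivalence for non-null values: NaN matches NaN, types must agree."""
    a_nan = isinstance(a, float) and math.isnan(a)
    b_nan = isinstance(b, float) and math.isnan(b)
    if a_nan or b_nan:
        return a_nan and b_nan
    return kind(a) == kind(b) and a == b

def aggregated_values(candidates, distinct=False):
    """Drop nulls, then optionally drop duplicates under equivalence."""
    values = [v for v in candidates if v is not None]
    if not distinct:
        return values
    kept = []
    for value in values:
        if not any(same(value, k) for k in kept):
            kept.append(value)               # first occurrence wins in this sketch
    return kept
```

For the candidate values 3, null, 1, 3, null, the regular form yields 3, 1, 3 and the distinct form yields 3, 1.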

Finally, the remaining aggregated values are processed by the actual aggregation function. If the list of aggregated values is empty, the aggregation function returns a default value (null unless otherwise specified; AnzoGraph currently returns null). Aggregating values of different types, like summing a number and a string, may lead to runtime errors.

Currently, the SUM of a number and a string will return null in AnzoGraph.

The semantics of some aggregation functions depend on how sameness and sorting are determined:

  • count(expr) returns the number of aggregated values; it returns zero if the list of aggregated values is empty.
  • min/max(expr) returns the smallest and largest, respectively, of the aggregated values under orderability. Note that null values will never be returned as a maximum, as they are never included in the list of aggregated values.
  • sum(expr) returns the sum of aggregated values; it returns zero if the list of aggregated values is empty.
  • avg(expr) returns the arithmetic mean of aggregated values; it returns zero if the list of aggregated values is empty.
  • collect(expr) returns the list of aggregated values.

    The current AnzoGraph release does not support collect(expr).

  • stdev(expr) returns the standard deviation of the aggregated values (assuming they represent a random sample); it returns zero if the list of aggregated values is empty.
  • stdevp(expr) returns the standard deviation of the aggregated values (assuming they form a complete population); it returns zero if the list of aggregated values is empty.
  • percentile_disc(expr) computes the inverse distribution function (assuming a discrete distribution model); it returns zero if the list of aggregated values is empty.
  • percentile_cont(expr) computes the inverse distribution function (assuming a continuous distribution model); it returns zero if the list of aggregated values is empty.
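
The empty-list defaults listed above can be sketched for the numeric functions. This is an illustrative Python model of the specification text, not of AnzoGraph behavior: the function names are hypothetical, the input is a list of aggregated values that already has nulls removed, and None stands in for null. Returning zero from stdev for a single value is a choice of this sketch, since the text above only covers the empty case.

```python
import statistics

def count(values):
    """Number of aggregated values; zero for an empty list."""
    return len(values)

def total(values):
    """Model of sum(expr); zero for an empty list."""
    return sum(values) if values else 0

def avg(values):
    """Arithmetic mean; zero for an empty list."""
    return statistics.mean(values) if values else 0

def smallest(values):
    """Model of min(expr); null (None) for an empty list."""
    return min(values) if values else None

def largest(values):
    """Model of max(expr); null (None) for an empty list."""
    return max(values) if values else None

def stdev(values):
    """Sample standard deviation; zero when fewer than two values exist."""
    return statistics.stdev(values) if len(values) >= 2 else 0

def stdevp(values):
    """Population standard deviation; zero for an empty list."""
    return statistics.pstdev(values) if values else 0
```

The smallest and largest helpers assume that all values share one type. Mixed types would need the orderability rules instead of Python's built-in comparison.
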