UDX Terminology and Concepts

This topic introduces the AnzoGraph DB user-defined extensions (UDX) interface and describes the fundamental terminology and concepts associated with developing custom AnzoGraph DB extensions that implement the UDX interface.

Extension Types

AnzoGraph DB extensions are programs that implement the UDX interface and can be registered and loaded into the AnzoGraph DB system where they can be used within queries or other command statements. AnzoGraph DB currently supports three different kinds of extensions. Each extension has similar but distinct requirements:

  • User-Defined Functions (UDF): A UDF extension maps or processes a single row of input values to return a single row of output values. For example, a developer can design a UDF extension that concatenates values or converts an amount from one currency to another.
  • User-Defined Aggregates (UDA): A UDA extension maps or processes multiple rows of input values to return a single row of output values. For example, a developer can design a UDA extension that computes an arithmetic mean or performs an operation like SUM, STDDEV, or MAX. Unlike a UDF, which returns a distinct value each time it is applied, a UDA aggregates the collection of values to which it is applied into a single summary value.
  • User-Defined Services (UDS): A UDS extension maps or processes multiple rows of input values to return multiple rows of output values. For example, a developer can design and register a UDS extension that defines a SPARQL endpoint.
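
The three shapes described above can be sketched in plain Java. This is only an illustration of the row-in/row-out contract of each kind; it does not use the AnzoGraph DB UDX interface, and the class and method names (ExtensionShapes, concat, mean, keepPositive) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class ExtensionShapes {
    // UDF shape: one row of input values in, one output value out.
    static String concat(String a, String b) {
        return a + b;
    }

    // UDA shape: many input rows in, one summary value out.
    static double mean(List<Double> values) {
        if (values.isEmpty()) {
            throw new IllegalArgumentException("mean of zero rows is undefined");
        }
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.size();
    }

    // UDS shape: many input rows in, many output rows out.
    static List<Integer> keepPositive(List<Integer> rows) {
        List<Integer> out = new ArrayList<>();
        for (int r : rows) {
            if (r > 0) {
                out.add(r);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(concat("foo", "bar"));
        System.out.println(mean(List.of(1.0, 2.0, 3.0)));
        System.out.println(keepPositive(List.of(-1, 2, 0, 5)));
    }
}
```

The only difference between the three methods is cardinality: one row to one value, many rows to one value, and many rows to many rows.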

Extension Libraries

Extension libraries are executable code modules that define and organize a collection of extensions. Libraries can be implemented in either C++ or any JVM-based language such as Java or Scala. Developers can create and register any number of extension libraries.

Extension Metadata

Extension libraries are self-describing; that is, they include the necessary metadata that describe the number, name, type, and calling signature of the various extensions they implement. When a new UDX is implemented, the developer adds the metadata to an extension library that describes each new UDX. When the extension library is loaded into AnzoGraph DB, the system adds the extension library metadata to an internal AnzoGraph DB registry so that the new UDX can be invoked from within subsequent SPARQL queries.
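
As a rough mental model of the load-then-invoke flow described above, the following Java sketch keeps a name-indexed registry of extension metadata. Everything in it is an assumption made for illustration: RegistrySketch, Entry, Kind, and their fields are hypothetical and do not reflect the actual AnzoGraph DB registry or its metadata schema.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class RegistrySketch {
    enum Kind { UDF, UDA, UDS }

    // Hypothetical metadata entry: name, kind, and calling signature.
    static final class Entry {
        final String name;
        final Kind kind;
        final List<String> argumentTypes;
        final String resultType;

        Entry(String name, Kind kind, List<String> argumentTypes, String resultType) {
            this.name = name;
            this.kind = kind;
            this.argumentTypes = argumentTypes;
            this.resultType = resultType;
        }
    }

    private final Map<String, Entry> entries = new LinkedHashMap<>();

    // "Loading" a library copies each self-described entry into the registry.
    void load(List<Entry> libraryMetadata) {
        for (Entry e : libraryMetadata) {
            entries.put(e.name, e);
        }
    }

    // Returns the entry for a name, or null if no loaded library declared it.
    Entry lookup(String name) {
        return entries.get(name);
    }

    public static void main(String[] args) {
        RegistrySketch registry = new RegistrySketch();
        registry.load(List.of(
            new Entry("my_concat", Kind.UDF, List.of("String", "String"), "String"),
            new Entry("my_mean", Kind.UDA, List.of("double"), "double")));
        Entry e = registry.lookup("my_mean");
        System.out.println(e.name + " " + e.kind + " " + e.argumentTypes + " -> " + e.resultType);
    }
}
```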

Extension Data Types

The UDX Data Types table below describes the types of values that can be passed into and returned from a user-defined extension. For each type, the table specifies:

  • Enum Type: A unique number that identifies the data type.
  • RDF Type: The name by which the type is known within the SPARQL query language.
  • C++ Type: The type by which it is known within the C++ language.
  • JVM Type: The type by which it is known within JVM languages.
  • UDX Registry Data Type: The language-independent name by which it is known within the AnzoGraph DB registry.

UDX Data Types

The following table describes the mapping between the various data types that can be specified in an AnzoGraph DB user-defined extension.

The data types listed in the table describe values that can be passed into and out of a user-defined extension. In C++, we do this by placing the values into the elements of a row. In JVM languages, the values are passed on the stack as explicit parameters to the relevant UDX.

Enum Type | RDF Type | Description | C++ Type | JVM Type | UDX Registry Data Type
t_boolean | xsd:boolean | A non-nullable 8-bit boolean value | bool | boolean | boolean
t_byte | xsd:byte | A non-nullable 8-bit signed integer | byte/uint8_t | byte | byte
t_short | xsd:short | A non-nullable 16-bit signed integer | short/int16_t | short | short
t_int | xsd:int | A non-nullable 32-bit signed integer | int/int32_t | int | int
t_long | xsd:long | A non-nullable 64-bit signed integer | long/int64_t | long | long
t_float | xsd:float | A non-nullable 32-bit IEEE single-precision float | float | float | float
t_double | xsd:double | A non-nullable 64-bit IEEE double-precision float | double | double | double
t_Object | N/A | A direct sum of all possible nullable types | -- | java/lang/Object | Object
t_Boolean | xsd:boolean | A nullable 8-bit boolean value | bool | java/lang/Boolean | Boolean
t_Byte | xsd:byte | A nullable 8-bit signed integer | byte/uint8_t | java/lang/Byte | Byte
t_Short | xsd:short | A nullable 16-bit signed integer | short/int16_t | java/lang/Short | Short
t_Integer | xsd:int | A nullable 32-bit signed integer | int/int32_t | java/lang/Integer | Int
t_Long | xsd:long | A nullable 64-bit signed integer | long/int64_t | java/lang/Long | Long
t_Float | xsd:float | A nullable 32-bit IEEE single-precision float | float | java/lang/Float | Float
t_Double | xsd:double | A nullable 64-bit IEEE double-precision float | double | java/lang/Double | Double
t_Date | xsd:date | A nullable 32-bit signed number of days since 1/1/2000 | udx2::Date | java/time/LocalDate | Date
t_Time | xsd:time | A nullable 64-bit signed number of microseconds since 1/1/2000 | udx2::Time | java/time/OffsetTime | Time
t_DateTime | xsd:dateTime | A nullable <microseconds, time zone> pair, since 1/1/2000 | udx2::DateTime | java/time/ZonedDateTime | DateTime
t_Duration | xsd:duration | A nullable <months, microseconds> pair, since 1/1/2000 | udx2::Duration | java/time/Duration | Duration
t_String | xsd:string | A nullable view into a string of UTF-8 characters | udx2::String | java/lang/String | String
t_LString | xsd:string | A nullable pair of string views | udx2::LString | com/cambridgesemantics/anzograph/udx/LString | LString
t_UDT | N/A | A nullable pair of string views | udx2::UDT | com/cambridgesemantics/anzograph/udx/UDT | UDT
t_URI | IRI | A nullable view into a string of UTF-8 characters | udx2::String | com/cambridgesemantics/anzograph/udx/URI | URI
t_Blob | N/A | A nullable block of raw binary bytes | udx2::Blob | com/cambridgesemantics/anzograph/udx/Blob | N/A
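
To make the t_Date description concrete, the sketch below converts between a day count relative to 1/1/2000 and a java.time.LocalDate. It assumes that day 0 corresponds to 2000-01-01 and that negative counts denote earlier dates; the table does not spell out either detail, and the class and method names are hypothetical rather than part of the UDX interface.

```java
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;

final class DateEncodingSketch {
    // Assumed epoch: day 0 is 2000-01-01, following the t_Date description.
    static final LocalDate EPOCH = LocalDate.of(2000, 1, 1);

    // Decodes a signed day count into a calendar date.
    static LocalDate fromDays(int daysSinceEpoch) {
        return EPOCH.plusDays(daysSinceEpoch);
    }

    // Encodes a calendar date as a signed 32-bit day count.
    // Math.toIntExact throws ArithmeticException if the count overflows an int.
    static int toDays(LocalDate date) {
        return Math.toIntExact(ChronoUnit.DAYS.between(EPOCH, date));
    }

    public static void main(String[] args) {
        System.out.println(fromDays(0));                        // 2000-01-01
        System.out.println(fromDays(366));                      // 2001-01-01 (2000 is a leap year)
        System.out.println(toDays(LocalDate.of(1999, 12, 31))); // -1
    }
}
```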

Data Type Handling

The illustration below provides a diagram of AnzoGraph DB's UDX data type handling. The top row in the diagram shows the built-in primitive types, and the bottom plane shows the corresponding reference types. The arrows pointing from primitive types to corresponding reference types represent automatic coercions. Details about data type processing and automatic type coercion follow the diagram.

Primitive Types

The top row in the diagram depicts non-nullable types that are native to both the C++ and JVM languages.

If a UDX registers itself as requiring a primitive type as one of its arguments, but it receives a null value at run time, the system generates an exception and the query is aborted. Similarly, if a UDX registers itself as returning a primitive type as one of its results, but it actually returns a null value, the system also generates an exception and the query is aborted.

Passing and returning values of primitive types is generally faster than using the corresponding reference types, and thus, is preferred whenever possible for best performance.

Reference Types

The reference types shown in the bottom plane of the diagram represent data values that are passed by reference. These types are ultimately derived from "Object," have methods, are instances of classes, and are interrogated at run-time for their type. Reference types are also nullable. Each primitive type (boolean, byte, short, int, long, float, double) has a corresponding reference type that it is mapped to (Boolean, Byte, Short, Integer, Long, Float, Double).

Passing and returning values as reference types is generally slower than using their primitive counterparts, but reference types often provide more flexibility.
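
The difference in null handling between primitive and reference types has a close analogue in plain Java, sketched below. The example shows Java's own behavior (auto-unboxing a null reference throws NullPointerException), not the exception AnzoGraph DB raises; the class and method names are hypothetical.

```java
final class NullHandlingSketch {
    // Primitive parameter: a null can never be represented here.
    static long doublePrimitive(long x) {
        return 2 * x;
    }

    // Reference parameter: null is a legal input and must be handled explicitly.
    static Long doubleNullable(Long x) {
        return x == null ? null : 2 * x;
    }

    // Simulates a caller handing a null value to the primitive version.
    static boolean primitiveRejectsNull() {
        Long missing = null;
        try {
            doublePrimitive(missing); // auto-unboxing null throws NullPointerException
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(doublePrimitive(21L));
        System.out.println(doubleNullable(null));
        System.out.println(primitiveRejectsNull());
    }
}
```

Declaring the primitive form is the better choice when nulls cannot occur in the data; the reference form trades some speed for the ability to accept and return null.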

Data Type Coercion

AnzoGraph DB supports automatic type coercion of certain data types. The supported coercions are represented by the arrows in the previous diagram of AnzoGraph DB data type handling. Where automatic conversion is supported, a value of one type can be supplied to a UDX where a value of another type is prescribed, and AnzoGraph DB converts the data type without a loss of information or precision.

For example, if a UDX expects a Double value as an input argument and the value supplied is an int, AnzoGraph DB coerces the value as follows:

 int→long→float→double→Double 

If a UDX requires a long value but an int is supplied, AnzoGraph DB widens the value from a 32-bit signed integer to a 64-bit signed integer (for example, 3 becomes 3L). For a non-negative value such as this one, the high 32 bits of the result are zero.
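
The coercion chain above mirrors Java's own widening and boxing conversions, so it can be walked through step by step in plain Java. This sketch demonstrates Java semantics only and is not a description of how AnzoGraph DB implements coercion; the names CoercionSketch, widenToDouble, and highBits are hypothetical.

```java
final class CoercionSketch {
    // Applies int -> long -> float -> double -> Double one step at a time.
    static Double widenToDouble(int value) {
        long asLong = value;        // int -> long: widened with sign extension
        float asFloat = asLong;     // long -> float: implicit in Java; exact for small magnitudes
        double asDouble = asFloat;  // float -> double: always exact
        return asDouble;            // double -> Double: boxing
    }

    // Returns the upper 32 bits of the int after widening to long.
    static long highBits(int value) {
        long widened = value;
        return widened >>> 32;
    }

    public static void main(String[] args) {
        System.out.println(widenToDouble(3));                 // 3.0
        System.out.println(highBits(3));                      // 0
        System.out.println(Long.toHexString(highBits(-3)));   // ffffffff
    }
}
```

In Java, the high 32 bits are zero for a non-negative int and all ones for a negative int, which is what preserves the numeric value across the widening.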