Reviewing UDX Interface Files

This topic provides more detailed information on the content of UDX API interface files available for UDX development in C++.

Documentation on UDX development in JVM environments is in progress and not available at this time.

Reviewing the C++ UDX Include File

The following sections describe each part of the udx_api.hpp file. This file is located in the <install_path>/include directory.

Overview

The beginning of the udx_api.hpp file provides versioning information and a description of major changes between versions of the include file. It also contains directives to the C++ compiler for additional system library includes, a namespace declaration that scopes the file's functions and variables, and the metadata structure that provides version compatibility information.

Data Types

This section of the C++ include header (.hpp) file provides several different things:

  • Enumeration of available UDX data types, for example, t_Null = 0, t_boolean = 1, t_byte = 2, and so on.
  • Aliases mapped to the corresponding UDX data types with using declarations, for example, using blob = std::int64_t;, using Date = std::int32_t;, and so on.
  • Declaration of a public DateTime class, along with its data members and several function signatures to store and return date, time, and time zone information.
  • Various public helper functions:
    • Duration() – calculate months or time (in microseconds) since January 1, 2000.
    • LString()
    • UDT()
    • Blob()
  • An Allocated structure that developers can use to allocate space for objects with memory managed by AnzoGraph DB.
  • A Row class that stores and returns information on UDX input arguments and result shapes or schema. AnzoGraph DB uses rows to marshal values in and out of user code.
  • A namespace that provides operations to insert data values and other information, for example, schemas, rows, datetime values, and blobs, into memory.
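
As a rough illustration of the kind of value Duration() deals with, the following sketch converts a Unix timestamp into microseconds since January 1, 2000. It is not the implementation in udx_api.hpp: the name micros_since_2000 and the use of midnight UTC as the reference instant are assumptions made for this example.

```cpp
#include <cstdint>

// Hypothetical helper, not part of udx_api.hpp.
// Seconds between 1970-01-01T00:00:00Z (the Unix epoch) and 2000-01-01T00:00:00Z.
constexpr std::int64_t kSecondsTo2000 = 946684800;

// Convert seconds since the Unix epoch into microseconds since
// January 1, 2000 (UTC is assumed here).
constexpr std::int64_t micros_since_2000(std::int64_t unix_seconds) {
    return (unix_seconds - kSecondsTo2000) * 1000000;
}
```

Timestamps before the year 2000 produce negative values in this sketch.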

Extension Points

This section defines the extension point interfaces that developers can implement to create different types of extensions in a UDX library file:

  • User Defined Function (UDF) looks to the user like a normal function that maps one row of input values to one row of output values.
  • User Defined Aggregate (UDA) looks to the user like a normal function that maps many rows of input values to one row of output values.
  • User Defined Service (UDS) looks to the user like a service that maps many rows of input values to many output values, or like a table that maps zero rows of input values to many output values.

An Extension struct provides a common base from which all user-defined extensions are derived.

Extension Library Meta-Data

C++ extensions are compiled and linked into extension libraries, which are modules of executable code that export a meta-data description of their contents. Formally, a C++ extension library is any Linux shared library that exports an entry point of the form:

//*
//*   extern "C" void register_extensions(MetaData &md) {
//*     md.json_metadata = R"({
//*       "name"        : <name>,
//*       "language"    : "c++",
//*       "version"     : <version>,
//*       "description" : <description>,
//*       "author"      : <author>,
//*       "copyright"   : <copyright>,
//*       "contents"    : [ <extension> * ]
//*     })";
//*   }
//*

The values of each attribute in the JSON metadata have the following meaning:

Field Description
name The name of the library; a string that distinguishes it from all other such libraries currently installed within the system.
language Hard-coded to "c++" for UDX libraries created with C++.
version The version of the library; specified as a string of the form "MAJOR.MINOR.PATCH".
description A brief description of the library and the kind of functionality that it provides. This attribute is optional.
author The author of the library. This attribute is optional.
copyright Any copyright that may pertain to the library. This attribute is optional.
contents An array of the form [ <extension> * ] that holds a meta-data description of each extension defined within the same source file and exported to AnzoGraph DB from the library.

Extension Meta-Data

The contents attribute of the AnzoGraph DB extension library meta-data lists a JSON meta-data description of each extension defined within the same extension library source file. Every type of extension point shares the following attributes:

Field Description
name The name-space qualified name of the extension as it appears to a user of the query language.
type The type of the extension point implemented: "function", "aggregate", "service", or "table". (Use of the "table" type is deprecated, since the same functionality can be achieved using the service extension type.)
signature The name of the exported entry point that returns an "ExtensionFactory" for the given extension.
description A brief description of the extension itself. This attribute is optional.

See Registering a UDX in an Extension Library for more information on specifying entries in a C++ source file for the extension library and all the UDXs defined within the same library source file. For examples of creating C++ UDXs of each extension type, see UDX Examples.
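
To make the two levels of meta-data concrete, the following sketch fills in the entry point template with one library description and one extension entry. The MetaData struct is a stand-in so that the example compiles on its own; the real structure in udx_api.hpp may differ, and every attribute value ("example_lib", "example:double_it", "create_double_it", and so on) is hypothetical.

```cpp
#include <string>

// Stand-in for the MetaData structure declared in udx_api.hpp; the real
// definition (including the type of json_metadata) may differ.
struct MetaData {
    std::string json_metadata;
};

// Entry point exported by the shared library. All attribute values below
// are hypothetical and chosen only for illustration.
extern "C" void register_extensions(MetaData &md) {
    md.json_metadata = R"({
      "name"        : "example_lib",
      "language"    : "c++",
      "version"     : "1.0.0",
      "description" : "Illustrative library containing a single function",
      "contents"    : [
        {
          "name"        : "example:double_it",
          "type"        : "function",
          "signature"   : "create_double_it",
          "description" : "Doubles its argument"
        }
      ]
    })";
}
```

The optional author and copyright attributes are omitted here; the extension entry carries only the attributes common to every extension type.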

UDX Type Structures and Meta-Data

The remaining portions of the Extension Point section of the udx_api.hpp include file provide template class structures for the construction of every supported UDX type: Function, Aggregate, Service, and Table (deprecated). Each of the extension type classes also provides methods for processing input arguments passed to extensions and values returned from those same extensions.

Class Structure Description
struct Function : Extension The Function class represents, perhaps, the simplest of the extension points, and allows developers to extend the set of analytic functions already built into the query language seen by the end user. A function extension applies some operation to the given arguments and returns an output row result based on the function's implementation.
struct Aggregate : Extension The Aggregate class enables developers to extend the set of aggregate functions that are already built into AnzoGraph DB for use in queries. Aggregate functions process a given row of values across ...
struct Service : Extension The Service class represents perhaps the most powerful of the extension points and ...
struct Table : Extension Deprecated.

Each of the different extension type classes has additional attributes specified as part of UDX library meta-data.

Additional Attributes for the Function Extension

In addition to those attributes common to every extension, the following attributes are specified for extension functions.

Attribute Description
arguments An array of zero or more types that specifies the number and type of arguments required by any application of the UDF. Determines the shape of the input row passed as the "a" parameter of the apply method.
results An array of zero or more types that specifies the number and type of results that are returned by any application of the UDF. Determines the shape of the output row passed as the "r" parameter of the apply method.
variadic A boolean value which indicates, if true, that the final type listed in the arguments array may be repeated one or more times in an application of the UDF. This attribute is optional.
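
The arguments and variadic attributes together define which argument lists a UDF accepts. The following sketch shows one way such a check could work. The function accepts is hypothetical and not part of udx_api.hpp, types are represented as plain strings for simplicity, and the sketch assumes that a variadic final type must appear at least once.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper, not part of udx_api.hpp: returns true when the actual
// argument types fit the declared "arguments" array, optionally treating the
// final declared type as repeatable ("variadic").
bool accepts(const std::vector<std::string> &declared, bool variadic,
             const std::vector<std::string> &actual) {
    if (!variadic || declared.empty()) {
        return actual == declared;  // exact match on count and types
    }
    if (actual.size() < declared.size()) {
        return false;  // the final type must appear at least once
    }
    for (std::size_t i = 0; i < actual.size(); ++i) {
        // Positions past the end of the declared list reuse its final type.
        const std::string &expected =
            i < declared.size() ? declared[i] : declared.back();
        if (actual[i] != expected) return false;
    }
    return true;
}
```

With declared types of int and string and variadic set to true, this sketch accepts (int, string) and (int, string, string) but rejects (int) and (int, string, int).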

When compiling a query including a UDF, for example, the leader verifies that the number and type of the arguments passed to the UDF are consistent with its domain, as specified by the arguments and variadic attributes of the meta-data description. When executing a query, each slice creates its own distinct instance of the UDF for every occurrence in the query by invoking the create method of the associated ExtensionFactory.

As each row of values streams through a slice, it is passed to the instance by calling its apply method, and the results are then passed up to the consumer of the stream. The factory, and the instances that it creates, are destroyed only when the query has eventually finished executing.
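
The create and apply sequence described above can be sketched as follows. Row, Function, and ExtensionFactory are simplified stand-ins defined only so that the example compiles on its own; the real types in udx_api.hpp are richer and may have different signatures. DoubleIt and run_on_slice are hypothetical names.

```cpp
#include <memory>
#include <vector>

// Stand-ins for illustration only; not the types declared in udx_api.hpp.
using Row = std::vector<double>;

struct Function {
    virtual ~Function() = default;
    // a: input row shaped by "arguments"; r: output row shaped by "results".
    virtual void apply(const Row &a, Row &r) = 0;
};

// Hypothetical UDF that doubles its single argument.
struct DoubleIt : Function {
    void apply(const Row &a, Row &r) override { r[0] = 2.0 * a[0]; }
};

struct ExtensionFactory {
    std::unique_ptr<Function> create() const {
        return std::unique_ptr<Function>(new DoubleIt());
    }
};

// Mimics one slice: create an instance, then stream every row through apply.
std::vector<Row> run_on_slice(const ExtensionFactory &factory,
                              const std::vector<Row> &input) {
    std::unique_ptr<Function> udf = factory.create();  // one instance per slice
    std::vector<Row> output;
    for (const Row &a : input) {
        Row r(1);  // output row with one result column
        udf->apply(a, r);
        output.push_back(r);
    }
    return output;  // the instance is destroyed once all rows are processed
}
```

In this sketch the end of run_on_slice plays the role of the query finishing on that slice.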

Additional Attributes for the Aggregate Extension

In addition to those attributes common to every extension, the Aggregate extension has the following additional meta-data attributes.

Attribute Description
arguments An array of zero or more types that specifies the number and type of arguments required by any application of the UDA. Determines the shape of the input row passed as the "a" parameter of the accumulate method.
results An array of zero or more types that specifies the number and type of results that are returned by any application of the UDA. Determines the shape of the output row passed as the "r" parameter of the result method.
variadic A boolean value which, if true, indicates that the final type listed in the arguments array may be repeated one or more times in an application of the UDA. This attribute is optional.
states An array of zero or more types that specifies the number and type of states that are marshaled across the cluster when merging the intermediate results that accumulated on the slices during the accumulation phase. Determines the shape of the output row passed as the "s" parameter of the save method and the shape of the input row passed as the "s" parameter of the merge method. This attribute is optional.

When compiling a query, the leader verifies that the number and type of the arguments passed to the UDA are consistent with its domain, as specified by the arguments and variadic attributes of the meta-data description. A slice of the cluster is designated as the receiver of the final aggregate result.

When executing a query, each slice creates its own distinct instance of the UDA for every occurrence in the query by invoking the create method of the associated ExtensionFactory.

  • As each row of values streams through a slice, it is passed to the instance by calling its accumulate method, which responds by updating its internal state as necessary to record having processed the row in some appropriate way.
  • When all rows on the slice have been accumulated, the instance is given a mutable row (of shape states) into which it serializes any intermediate results it has accumulated, and the instance is then destroyed.
  • The slice receiving the result now creates an instance of the UDA and a row (of shape states), and the system arranges for all intermediate states to be transmitted across the cluster and 'merged' into the instance by passing each in turn to its merge method.

The factory, and the instances that it creates, are destroyed only when the query has eventually finished executing.
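
The accumulate, save, merge, and result sequence can be sketched with a small aggregate that computes a mean. Row is a simplified stand-in, and Mean and run_across_slices are hypothetical; the method signatures of the real Aggregate class in udx_api.hpp may differ. The states of this aggregate are assumed to be a sum and a count.

```cpp
#include <vector>

// Stand-in for illustration only; not the Row type declared in udx_api.hpp.
using Row = std::vector<double>;

// Hypothetical aggregate computing a mean; its "states" are {sum, count}.
struct Mean {
    double sum = 0.0;
    double count = 0.0;

    // Called once per input row on a slice.
    void accumulate(const Row &a) { sum += a[0]; count += 1.0; }
    // Serializes the intermediate result into a row of shape "states".
    void save(Row &s) const { s[0] = sum; s[1] = count; }
    // Folds one saved intermediate state into this instance.
    void merge(const Row &s) { sum += s[0]; count += s[1]; }
    // Writes the final value into a row of shape "results".
    void result(Row &r) const { r[0] = count > 0.0 ? sum / count : 0.0; }
};

// Mimics the cluster: each inner vector holds the rows seen by one slice.
double run_across_slices(const std::vector<std::vector<Row>> &slices) {
    std::vector<Row> saved;
    for (const std::vector<Row> &rows : slices) {
        Mean local;  // one instance per slice
        for (const Row &a : rows) local.accumulate(a);
        Row s(2);    // row of shape "states"
        local.save(s);
        saved.push_back(s);  // local goes out of scope after saving
    }
    Mean receiver;  // instance on the slice that receives the result
    for (const Row &s : saved) receiver.merge(s);
    Row r(1);
    receiver.result(r);
    return r[0];
}
```

Shipping the sum and count, rather than each slice's local mean, is what lets the receiving slice combine the intermediate states correctly when slices hold different numbers of rows.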

System Information

AnzoGraph DB also provides a number of utility functions that can be invoked at any time by any UDX. These functions let you query basic system information useful in providing more exact control of a UDX's execution.

The following table provides a brief description of these functions.

Function Description
amPlanning() Boolean; returns true if the caller is being invoked during the planning of a query.
amExecuting() Boolean; returns true if the caller is being invoked during the execution of a query on the nodes of the cluster.
getNode() Returns the identifier of the worker node on which the caller is executing as an integer in the range 0 to getNodes().
getNodes() Returns the total number of worker nodes in the cluster.
getSlice() Returns the identifier of the slice on which the caller is executing as an integer in the range 0 to getSlices().
getSlices() Returns the total number of slices in the cluster.
getHostVersion() Returns a string of the form "<MAJOR>.<MINOR>.<PATCH>" that describes the AnzoGraph API version supported by this server.
getQueryText() Returns the source text of the currently executing query.
getClientContext() Returns any additional data that may have been supplied by the client along with the text of the query that is currently executing. The client context string that is returned may take any form whatsoever; the server does not parse it in any way, but merely makes it available to extensions unmodified via this access function.
logText(...) Logs a null-terminated string and an associated integer to the sth_udx system table. The logged entry includes the namespace-qualified name of the extension that is logging the message, an integer indicating the severity level of the event, and the raw text of the message to be logged. The level ostensibly describes the severity of the logging event, but can in fact represent anything; the system does not interpret its value in any particular way.
vlog(...) Logs a null-terminated string and an associated integer to the sth_udx system table. This variant of the log function formats any additional details into the given message template printf style before forwarding the arguments on to the logText function.
log(...) Provides another variant of the logging functions.

Refer to the udx_api.hpp include file for additional comments and descriptions of parameters for each of the listed functions.
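
As one example of how slice information might be used to control execution, the following sketch divides a fixed number of work items among slices. The function slice_share is hypothetical and not part of udx_api.hpp. It takes the slice identifier and slice count as plain parameters, which in a real UDX would presumably come from getSlice() and getSlices(), and it assumes that slice identifiers run from 0 to one less than the slice count.

```cpp
#include <cstddef>
#include <utility>

// Hypothetical helper, not part of udx_api.hpp. Returns the half-open range
// [first, last) of item indexes that the given slice should process when
// `total` items are divided as evenly as possible among `slices` slices.
// Assumes slices > 0 and that slice identifiers run from 0 to slices - 1.
std::pair<std::size_t, std::size_t> slice_share(std::size_t slice,
                                                std::size_t slices,
                                                std::size_t total) {
    const std::size_t base = total / slices;   // minimum items per slice
    const std::size_t extra = total % slices;  // first `extra` slices get one more
    const std::size_t first = slice * base + (slice < extra ? slice : extra);
    const std::size_t count = base + (slice < extra ? 1 : 0);
    return std::make_pair(first, first + count);
}
```

Each slice computes its own range independently, so no coordination between slices is needed to avoid overlapping work.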

Memory Allocation

Developers are encouraged to call the following functions to acquire memory for extensions from the system's free memory, thus allowing the server to monitor an extension's usage of memory, warn of possible leaks, and generally ensure the smooth running of the system as a whole.

void *malloc(size_t);

void *calloc(size_t, size_t);

void *realloc(void *, size_t);

void free(void *);

When allocating memory for large objects, or allocating memory for objects that will outlive the current stack frame (for example, assigning to a data member of an extension), it is recommended that you use one of the following options:

  • Use a standard library container that is parameterized on the new udx2::allocator:
    struct agg : Aggregate {
        std::vector<int, udx2::allocator<int>> m_vec = { ... };
        ...
    };
  • Use a smart pointer:
    struct agg : Aggregate {
        udx2::unique_ptr<Object> m_obj = udx2::make_unique<Object>( ... );
        ...
    };
  • Call udx2::malloc and the related allocation functions.

Objects with indeterminate lifetime should use one of these three techniques, if at all possible. For other situations, you can continue to use small, short-lived objects as before, for example:

bool foo(const std::string& x) { ... }
bool bar() { std::vector<int> v = { ... }; ... }
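
To show how parameterizing a container on an allocator lets memory usage be monitored, the following sketch defines a minimal tracking allocator and uses it with std::vector. TrackingAllocator, TrackedVector, and g_bytes_in_use are hypothetical stand-ins written for this example; they are not udx2::allocator, whose behavior may differ.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>
#include <vector>

// Running total of bytes currently held through TrackingAllocator.
static std::size_t g_bytes_in_use = 0;

// Hypothetical stand-in for udx2::allocator: a minimal C++11 allocator that
// records how much memory its containers hold.
template <typename T>
struct TrackingAllocator {
    using value_type = T;

    TrackingAllocator() = default;
    template <typename U>
    TrackingAllocator(const TrackingAllocator<U> &) {}

    T *allocate(std::size_t n) {
        void *p = std::malloc(n * sizeof(T));
        if (p == nullptr) throw std::bad_alloc();
        g_bytes_in_use += n * sizeof(T);  // record the new allocation
        return static_cast<T *>(p);
    }
    void deallocate(T *p, std::size_t n) {
        g_bytes_in_use -= n * sizeof(T);  // record the release
        std::free(p);
    }
};

// All instances are interchangeable because they share one global counter.
template <typename T, typename U>
bool operator==(const TrackingAllocator<T> &, const TrackingAllocator<U> &) {
    return true;
}
template <typename T, typename U>
bool operator!=(const TrackingAllocator<T> &, const TrackingAllocator<U> &) {
    return false;
}

// A vector whose storage is visible to the tracker.
using TrackedVector = std::vector<int, TrackingAllocator<int>>;
```

Because every allocation and release passes through the allocator, a counter that is still nonzero after all containers have been destroyed would point to a leak.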

Exceptions

Exceptions thrown by extension code are caught by the server, which halts execution of the query containing the offending UDX and reports the error to the user. This section of the include file provides macros that developers can use to throw exceptions. Developers are encouraged to use these macros, because exceptions raised by other means may produce error messages that are not as meaningful.

  • azg_throw(extension, message, ...) – Formats the given arguments as a user visible message and throws this as an exception.
  • azg_check(expression, extension, message, ...) – Evaluates the given expression. If false, formats the given arguments as a user visible message and throws this as an exception.

Refer to the udx_api.hpp include file for a description of the parameters for each of the listed macros.
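
The throw and check pattern can be sketched with stand-in macros. SKETCH_THROW, SKETCH_CHECK, sketch_throw_impl, and safe_divide are hypothetical names that only imitate the argument order of azg_throw and azg_check; the real macros may format and report errors differently. The sketch also assumes that the extension argument is the name of the extension as a string.

```cpp
#include <cstdarg>
#include <cstdio>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the formatting and throwing done by the real
// macros: builds a printf-style message and throws it as an exception.
[[noreturn]] inline void sketch_throw_impl(const char *extension,
                                           const char *format, ...) {
    char text[256];
    va_list args;
    va_start(args, format);
    std::vsnprintf(text, sizeof text, format, args);  // printf-style formatting
    va_end(args);
    throw std::runtime_error(std::string(extension) + ": " + text);
}

// Throw unconditionally with a formatted, user-visible message.
#define SKETCH_THROW(extension, ...) sketch_throw_impl(extension, __VA_ARGS__)

// Throw only when the expression evaluates to false.
#define SKETCH_CHECK(expression, extension, ...) \
    do { \
        if (!(expression)) sketch_throw_impl(extension, __VA_ARGS__); \
    } while (0)

// Example use: reject a zero divisor before dividing.
inline int safe_divide(int numerator, int divisor) {
    SKETCH_CHECK(divisor != 0, "example:divide",
                 "cannot divide %d by zero", numerator);
    return numerator / divisor;
}
```

A check at the top of a function, as in safe_divide, keeps the error message close to the condition that was violated.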

Implementation Details

This section provides various internal AnzoGraph DB namespace and utility operations required for UDX creation, class templates, type conversions, data streaming, and so on.