Reviewing UDX Interface Files
This topic provides more detailed information on the content of UDX API interface files available for UDX development in C++.
Documentation on UDX development in JVM environments is in progress and not available at this time.
Reviewing the C++ UDX Include File
The following sections describe each section of the udx_api.hpp file, which is located in the <install_path>/include directory.
- Overview
- Data Types
- Extension Points
- System Information
- Memory Allocation
- Exceptions
- Implementation Details
Overview
The beginning of the udx_api.hpp file provides versioning information and a description of the major changes between versions of the include file. It also contains the compiler directives that include the required system libraries, a namespace declaration that scopes the functions and variables that follow, and the metadata structure that carries version compatibility information.
Data Types
This section of the C++ include header (.hpp) file provides the following:
- An enumeration of the available UDX data types, for example, t_Null = 0, t_boolean = 1, t_byte = 2, and so on.
- A list of aliases mapped to the corresponding UDX data types with the using keyword, for example, using blob = std::int64_t;, using Date = std::int32_t;, and so on.
- The declaration of a public DateTime() class, along with its data members and several function signatures that store and return date, time, and time zone information.
- Various public helper functions:
  - Duration() – calculates months or time (in microseconds) since January 1, 2000.
  - LString()
  - UDT()
  - Blob()
- An Allocated structure that developers can use to allocate space for objects whose memory is managed by AnzoGraph DB.
- A Row class that stores and returns information on UDX input arguments and result shapes or schema. AnzoGraph DB uses rows to marshal values in and out of user code.
- A namespace that provides operations to insert data values and other information (for example, schemas, rows, datetime values, and blobs) into memory.
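As a rough illustration of the quantities that Duration() is described as calculating, the following sketch computes microseconds and whole months elapsed since January 1, 2000. The helper names micros_since_2000 and months_since_2000 are hypothetical and are not part of udx_api.hpp, and the use of UTC for the reference date is an assumption; consult the include file for the actual interface.

```cpp
#include <cassert>
#include <cstdint>

// Unix time, in seconds, of 2000-01-01T00:00:00Z (10957 days after 1970-01-01).
constexpr std::int64_t kEpoch2000UnixSeconds = 946684800;

// Hypothetical helper (not part of udx_api.hpp): converts a Unix timestamp in
// seconds to microseconds elapsed since January 1, 2000. UTC is assumed.
constexpr std::int64_t micros_since_2000(std::int64_t unix_seconds) {
    return (unix_seconds - kEpoch2000UnixSeconds) * 1000000;
}

// Hypothetical helper: whole months elapsed since January 2000 for a given
// calendar year and month (month in the range 1 to 12).
constexpr std::int32_t months_since_2000(int year, int month) {
    return (year - 2000) * 12 + (month - 1);
}
```

For example, under these assumptions March 2001 is 14 months after January 2000, and a timestamp one second past the reference date corresponds to 1,000,000 microseconds.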
Extension Points
This section defines the extension point interfaces that developers can implement to create different types of extensions in a UDX library file:
- User Defined Function (UDF) looks to the user like a normal function that maps one row of input values to one row of output values.
- User Defined Aggregate (UDA) looks to the user like a normal function that maps many rows of input values to one row of output values.
- User Defined Service (UDS) looks to the user like a service that maps many rows of input values to many output values, or like a table that maps zero rows of input values to many output values.
An Extension struct provides a common base from which all user-defined extensions are derived.
Extension Library Meta-Data
C++ extensions are compiled and linked into extension libraries, which are modules of executable code that export a meta-data description of their contents. Formally, a C++ extension library is any Linux shared library that exports an entry point of the form:
extern "C" void register_extensions(MetaData &md) {
    md.json_metadata = R"({
        "name"        : <name>,
        "language"    : "c++",
        "version"     : <version>,
        "description" : <description>,
        "author"      : <author>,
        "copyright"   : <copyright>,
        "contents"    : [ <extension> * ]
    })";
}
The values of each attribute in the JSON metadata have the following meaning:
Field | Description |
---|---|
name | The name of the library; a string that distinguishes it from all other such libraries currently installed within the system. |
language | Hard-coded to "C++" for UDX libraries created with C++. |
version | The version of the library; specified as a string of the form "MAJOR.MINOR.PATCH". |
description | A brief description of the library and the kind of functionality that it provides. This attribute is optional. |
author | The author of the library. This attribute is optional. |
copyright | Any copyright that may pertain to the library. This attribute is optional. |
contents [<extension> *] | Specifies a meta-data description of each extension defined within the same source file and exported to AnzoGraph DB from the library. |
Extension Meta-Data
The contents attribute of the AnzoGraph DB extension library meta-data enumerates JSON meta-data descriptions of each extension defined within the same extension library source file. All extension point types share the following attributes:
Field | Description |
---|---|
name | The name-space qualified name of the extension as it appears to a user of the query language. |
type | The type of the extension point implemented: "function", "aggregate", "service", or "table". (Use of the "table" type is deprecated, since the same functionality can be achieved using the service extension type.) |
signature | The name of the exported entry point that returns an "ExtensionFactory" for the given extension. |
description | A brief description of the extension itself. This attribute is optional. |
See Registering a UDX in an Extension Library for more information on specifying entries in a C++ source file for the extension library and all the UDXs defined within the same library source file. For examples on creating C++ UDXs of each extension type, see UDX Examples.
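To make the entry point form and the common extension attributes concrete, here is a minimal sketch of a filled-in register_extensions function. The MetaData struct below is a stand-in so that the example compiles on its own; only its json_metadata member comes from the description above, and the member's type is an assumption. All attribute values (library name, extension name, factory entry point name) are hypothetical, and the type-specific attributes such as arguments and results are omitted because the syntax of their values is not shown in this topic.

```cpp
#include <cassert>
#include <cstring>

// Stand-in for the MetaData structure declared in udx_api.hpp. Only the
// json_metadata member is taken from the documentation; its type is assumed.
struct MetaData {
    const char *json_metadata = nullptr;
};

// Exported entry point that describes the library and its single extension.
// Every value below is a hypothetical placeholder.
extern "C" void register_extensions(MetaData &md) {
    md.json_metadata = R"({
        "name"        : "example_library",
        "language"    : "c++",
        "version"     : "1.0.0",
        "description" : "Illustrative library with one function",
        "author"      : "Example Author",
        "contents"    : [
            {
                "name"        : "example:add_one",
                "type"        : "function",
                "signature"   : "make_add_one",
                "description" : "Adds one to its argument"
            }
        ]
    })";
}
```

In a real library the file would include udx_api.hpp instead of declaring MetaData, and the signature value would name an exported entry point that returns the ExtensionFactory for the extension.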
UDX Type Structures and Meta-Data
The remaining portions of the Extension Points section of the udx_api.hpp include file provide template class structures for the construction of every supported UDX type: Function, Aggregate, Service, and Table (deprecated). Each of the extension type classes also provides methods for processing the input arguments passed to extensions and the values returned from them.
Class Structure | Description |
---|---|
struct Function : Extension | The Function class represents, perhaps, the simplest of the extension points, and allows developers to extend the set of analytic functions already built into the query language seen by the end user. A function extension applies some operation to the given arguments and returns an output row result based on the function's implementation. |
struct Aggregate : Extension | The Aggregate class enables developers to extend the set of aggregate functions that are already built into AnzoGraph DB for use in queries. Aggregate functions process a given row of values across ... |
struct Service : Extension | The Service class represents perhaps the most powerful of the extension points and ... |
struct Table : Extension | Deprecated. |
Each of the different extension type classes has additional attributes specified as part of UDX library meta-data.
Additional Attributes for the Function Extension
In addition to those attributes common to every extension, the following attributes are specified for extension functions.
Attribute | Description |
---|---|
arguments | An array of zero or more types that specifies the number and type of arguments required by any application of the UDF. Determines the shape of the input row passed as the "a" parameter of the apply method. |
results | An array of zero or more types that specifies the number and type of results that are returned by any application of the UDF. Determines the shape of the output row passed as the "r" parameter of the apply method. |
variadic | A boolean value which indicates, if true, that the final type listed in the arguments array may be repeated one or more times in an application of the UDF. This attribute is optional. |
When compiling a query that includes a UDF, the leader verifies that the number and type of the arguments passed to the UDF are consistent with its domain, as specified by the arguments and variadic attributes of the meta-data description. When executing a query, each slice creates its own distinct instance of the UDF for every occurrence in the query by invoking the create method of the associated ExtensionFactory.
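The argument check described above can be pictured with the following sketch. It is not AnzoGraph DB code: the function arguments_match is hypothetical, and representing types as plain strings such as "int" is an assumption made only for illustration. It treats a variadic declaration as allowing the final declared type to appear one or more times.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical check, not AnzoGraph DB code: returns true when the argument
// types of a call fit the declared "arguments" array and "variadic" flag.
bool arguments_match(const std::vector<std::string> &declared,
                     bool variadic,
                     const std::vector<std::string> &actual) {
    if (!variadic || declared.empty()) {
        return actual == declared;  // exact match on count and types
    }
    if (actual.size() < declared.size()) {
        return false;               // the final type must occur at least once
    }
    for (std::size_t i = 0; i < actual.size(); ++i) {
        // Positions past the end of the declaration reuse the final type.
        const std::string &expected =
            declared[i < declared.size() ? i : declared.size() - 1];
        if (actual[i] != expected) {
            return false;
        }
    }
    return true;
}
```

With a declaration of two types and variadic set to true, a call that repeats the second type is accepted, while a call that omits it or appends a different type is rejected.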
As each row of values streams through a slice, it is passed to the instance by calling its apply method, and the results are then passed up to the consumer of the stream. The factory, and the instances that it creates, are destroyed only when the query has eventually finished executing.
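The create and apply sequence described above might look like the following simulation of a single slice. The Row, Function, and ExtensionFactory types here are simplified stand-ins, not the declarations from udx_api.hpp; only the method names create and apply and the parameter names a and r are taken from this topic, and their signatures are assumed.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Simplified stand-in for the Row class: a row is a list of numeric values.
using Row = std::vector<double>;

// Stand-in for the Function extension point.
struct Function {
    virtual ~Function() = default;
    virtual void apply(const Row &a, Row &r) = 0;  // a: input row, r: output row
};

// Hypothetical UDF that adds one to its single argument.
struct AddOne : Function {
    void apply(const Row &a, Row &r) override { r.assign(1, a.at(0) + 1.0); }
};

// Stand-in for the factory associated with the extension.
struct ExtensionFactory {
    std::unique_ptr<Function> create() const {
        return std::unique_ptr<Function>(new AddOne());
    }
};

// Simulates one slice: one instance is created, then apply runs once per row.
std::vector<Row> run_slice(const ExtensionFactory &factory,
                           const std::vector<Row> &input) {
    std::unique_ptr<Function> instance = factory.create();
    std::vector<Row> output;
    for (const Row &a : input) {
        Row r;
        instance->apply(a, r);
        output.push_back(r);  // handed to the consumer of the stream
    }
    return output;  // the instance is released here, standing in for query end
}
```

Because each slice owns its own instance, any state kept in data members is private to that slice and does not need locking in this model.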
Additional Attributes for the Aggregate Extension
In addition to those attributes common to every extension, the Aggregate extension has the following additional meta-data attributes.
Attribute | Description |
---|---|
arguments | An array of zero or more types that specifies the number and type of arguments required by any application of the UDA. Determines the shape of the input row passed as the ' |
results | An array of zero or more types that specifies the number and type of results that are returned by any application of the UDA. Determines the shape of the output row passed as the 'r' parameter of the result method. |
variadic | A boolean value which, if true, indicates that the final type listed in the arguments array may be repeated one or more times in an application of the UDA. This attribute is optional. |
states | An array of zero or more types that specifies the number and type of states that are marshaled across the cluster when merging those intermediate results that accumulated on the slices during the accumulation phase. Determines the shape of the output row passed as the " |
When compiling a query, the leader verifies that the number and type of the arguments passed to the UDA are consistent with its domain, as specified by the arguments and variadic attributes of the meta-data description. A slice of the cluster is designated as the receiver of the final aggregate result.
When executing a query, each slice creates its own distinct instance of the UDA for every occurrence in the query by invoking the create method of the associated ExtensionFactory.
- As each row of values streams through a slice, it is passed to the instance by calling its accumulate method, which responds by updating its internal state as necessary to record having processed the row in some appropriate way.
- When all rows on the slice have been accumulated, the instance is given a mutable row (of shape states) into which it serializes any intermediate results it has accumulated, and the instance is then destroyed.
- The slice receiving the result now creates an instance of the UDA and a row (of shape states), and the system arranges for all intermediate states to be transmitted across the cluster and 'merged' into the instance by passing each in turn to its merge method.
The factory, and the instances that it creates, are destroyed only when the query has eventually finished executing.
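The accumulate, serialize, and merge phases described above can be sketched as a single-process simulation of a sum aggregate. The types are stand-ins rather than the declarations in udx_api.hpp. The method names accumulate, merge, and result come from this topic; save is an assumed name for the step that writes the intermediate state row, and all signatures are assumptions.

```cpp
#include <cassert>
#include <vector>

// Simplified stand-in for the Row class: a row is a list of numeric values.
using Row = std::vector<double>;

// Hypothetical sum aggregate. The name "save" and all signatures are assumed.
struct SumAggregate {
    double total = 0.0;
    void accumulate(const Row &a) { total += a.at(0); }  // one call per input row
    void save(Row &s) const { s.assign(1, total); }      // write the state row
    void merge(const Row &s) { total += s.at(0); }       // fold in another state
    void result(Row &r) const { r.assign(1, total); }    // final output row
};

// Simulates a query: each inner vector holds the rows seen by one slice.
double run_query(const std::vector<std::vector<Row>> &slices) {
    std::vector<Row> states;
    for (const std::vector<Row> &rows : slices) {
        SumAggregate instance;  // one instance per slice
        for (const Row &a : rows) {
            instance.accumulate(a);
        }
        Row s;
        instance.save(s);       // intermediate state for this slice
        states.push_back(s);
    }                           // each slice instance ends with its loop iteration
    SumAggregate receiver;      // instance on the slice that receives the result
    for (const Row &s : states) {
        receiver.merge(s);
    }
    Row r;
    receiver.result(r);
    return r.at(0);
}
```

The state row only needs to carry what merge requires. For a sum that is a single value, whereas an average would need both a running sum and a count.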
System Information
AnzoGraph DB also provides a number of utility functions that can be invoked at any time by any UDX. These functions let you query basic system information useful in providing more exact control of a UDX's execution.
The following table provides a brief description of these functions.
Function | Description |
---|---|
amExecuting() | Boolean; returns true if the caller is being invoked during the execution of a query on the nodes of the cluster. |
getNode() | Returns the identifier of the worker node on which the caller is executing as an integer in the range 0 to getNodes(). |
getNodes() | Returns the total number of worker nodes in the cluster. |
getSlice() | Returns the identifier of the slice on which the caller is executing; returns an integer in the range 0 to getSlices(). |
getSlices() | Returns the total number of slices in the cluster. |
getHostVersion() | Returns a string of the form "<MAJOR>.<MINOR>.<PATCH>" that describes the AnzoGraph API version supported by this server. |
getQueryText() | Returns the source text of the currently executing query. |
getClientContext() | Returns any additional data that may have been supplied by the client along with the text of the query that is currently executing. The client context string that is returned may take any form whatsoever; the server does not parse it in any way, but merely makes it available to extensions unmodified via this access function. |
logText(...) | Logs a null-terminated string and an associated integer to the sth_udx system table. The logged entry includes the namespace-qualified name of the extension that is logging the message, an integer indicating the severity level of the event, and the raw text of the message. The level ostensibly describes the severity of the logging event, but can in fact represent anything; the system does not interpret its value in any particular way. |
vlog(...) | Logs a null-terminated string and an associated integer to the sth_udx system table. This variant of the log function formats any additional details into the given message template, printf style, before forwarding the arguments on to the logText function. |
log(...) | Provides another variant of the logging functions. |
Refer to the udx_api.hpp include file for additional comments and descriptions of the parameters for each of the listed functions.
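The relationship between the logging variants, in which a printf-style variant formats the message and then forwards it to a function that records the raw text, follows a common C++ pattern that can be sketched as below. The functions sketch_log_text, sketch_vlog, and sketch_log are hypothetical stand-ins with assumed signatures, and an in-memory vector stands in for the sth_udx system table; the actual parameters are documented in udx_api.hpp.

```cpp
#include <cassert>
#include <cstdarg>
#include <cstdio>
#include <string>
#include <vector>

// One logged event: extension name, level, and the formatted message text.
struct LogEntry {
    std::string extension;
    int level;
    std::string text;
};

// In-memory stand-in for the sth_udx system table.
std::vector<LogEntry> &log_table() {
    static std::vector<LogEntry> table;
    return table;
}

// Records text that has already been formatted.
void sketch_log_text(const char *extension, int level, const char *text) {
    log_table().push_back(LogEntry{extension, level, text});
}

// Formats the template from a va_list, then forwards to sketch_log_text.
void sketch_vlog(const char *extension, int level, const char *format,
                 va_list args) {
    char buffer[512];
    std::vsnprintf(buffer, sizeof buffer, format, args);  // truncates if too long
    sketch_log_text(extension, level, buffer);
}

// Variadic convenience wrapper around sketch_vlog.
void sketch_log(const char *extension, int level, const char *format, ...) {
    va_list args;
    va_start(args, format);
    sketch_vlog(extension, level, format, args);
    va_end(args);
}
```

A call such as sketch_log("example:add_one", 2, "processed %d rows", 42) records one entry with level 2. The level is stored as given, mirroring the statement that the system does not interpret it.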
Memory Allocation
Developers are encouraged to call the following functions to acquire memory for extensions from the system's free memory, thus allowing the server to monitor an extension's usage of memory, warn of possible leaks, and generally ensure the smooth running of the system as a whole.
void *malloc(size_t);
void *calloc(size_t, size_t);
void *realloc(void *, size_t);
void free(void *);
When allocating memory for large objects, or allocating memory for objects that will outlive the current stack frame (for example, assigning to a data member of an extension), it is recommended that you use one of the following options:
- Use a standard library container that is parameterized on the new udx2::allocator:
struct agg : Aggregate {
    std::vector<int,udx2::allocator<int>> m_vec = { ... };
    ...
};
- Use a smart pointer:
struct agg : Aggregate {
    udx2::unique_ptr<Object> m_obj = udx2::make_unique<Object>( ... );
    ...
};
- Call udx2::malloc, and so on.
Objects with indeterminate lifetime should use one of these three techniques, if at all possible. For other situations, you can continue to use small, short-lived objects as before, for example:
bool foo(const std::string& x) { ... }
bool bar () { std::vector<int> v = { ... }; ... }
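To show what parameterizing a standard container on a custom allocator involves, the following sketch defines a minimal allocator that counts the bytes it hands out. TrackingAllocator is a hypothetical stand-in for udx2::allocator, and std::malloc and std::free stand in for the server-provided allocation functions; the real allocator's behavior is defined by udx_api.hpp and may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>
#include <vector>

// Bytes currently held through TrackingAllocator.
inline std::size_t &tracked_bytes() {
    static std::size_t bytes = 0;
    return bytes;
}

// Hypothetical stand-in for udx2::allocator. std::malloc and std::free take
// the place of the server-provided allocation functions.
template <typename T>
struct TrackingAllocator {
    using value_type = T;

    TrackingAllocator() = default;
    template <typename U>
    TrackingAllocator(const TrackingAllocator<U> &) {}

    T *allocate(std::size_t n) {
        void *p = std::malloc(n * sizeof(T));
        if (!p) {
            throw std::bad_alloc();
        }
        tracked_bytes() += n * sizeof(T);
        return static_cast<T *>(p);
    }

    void deallocate(T *p, std::size_t n) {
        tracked_bytes() -= n * sizeof(T);
        std::free(p);
    }
};

// All instances are interchangeable because they share one global counter.
template <typename T, typename U>
bool operator==(const TrackingAllocator<T> &, const TrackingAllocator<U> &) {
    return true;
}
template <typename T, typename U>
bool operator!=(const TrackingAllocator<T> &, const TrackingAllocator<U> &) {
    return false;
}

// A vector whose storage is visible to the tracker, analogous to m_vec above.
using TrackedIntVector = std::vector<int, TrackingAllocator<int>>;
```

Routing container storage through one allocator in this way is what lets a host observe how much memory an extension holds, which a plain std::vector<int> would hide.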
Exceptions
Exceptions thrown by extension code are caught by the server, which halts execution of the query containing the offending UDX and reports an error to the user. This section provides a collection of macros that developers can use to throw exceptions. It is recommended that developers use these macros, because exceptions thrown by other means may produce less meaningful error messages.
- azg_throw(extension, message, ...) – Formats the given arguments as a user visible message and throws this as an exception.
- azg_check(expression, extension, message, ...) – Evaluates the given expression. If false, formats the given arguments as a user visible message and throws this as an exception.
Refer to the udx_api.hpp include file for a description of the parameters for each of the listed macros.
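The check-then-throw pattern behind these macros can be sketched as follows. sketch_throw and SKETCH_CHECK are hypothetical analogues written for this example, not the definitions of azg_throw and azg_check, and std::runtime_error is an assumed exception type; the macros in udx_api.hpp may format messages and choose exception types differently.

```cpp
#include <cassert>
#include <cstdarg>
#include <cstdio>
#include <stdexcept>
#include <string>

// Hypothetical analogue of azg_throw: formats a printf-style message,
// prefixes the extension name, and throws. The exception type is assumed.
[[noreturn]] void sketch_throw(const char *extension, const char *format, ...) {
    char buffer[512];
    va_list args;
    va_start(args, format);
    std::vsnprintf(buffer, sizeof buffer, format, args);
    va_end(args);
    throw std::runtime_error(std::string(extension) + ": " + buffer);
}

// Hypothetical analogue of azg_check: throws only when the expression is false.
#define SKETCH_CHECK(expression, extension, ...) \
    do { \
        if (!(expression)) sketch_throw((extension), __VA_ARGS__); \
    } while (0)

// Example use inside extension code: validate an argument before using it.
double safe_divide(double numerator, double denominator) {
    SKETCH_CHECK(denominator != 0.0, "example:divide",
                 "cannot divide %g by zero", numerator);
    return numerator / denominator;
}
```

Placing such checks at the top of an apply or accumulate implementation keeps invalid input from reaching the computation and gives the user a message that names the extension.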
Implementation Details
This section provides various internal AnzoGraph DB namespace and utility operations required for UDX creation, class templates, type conversions, data streaming, and so on.