User-Defined Function (UDF) Examples

A user-defined function (UDF) looks to the user just like a normal function that maps one row of input values to one row of output values. The following code sample is a complete, minimal working example of a UDF extension:

struct AND : Function {
  void apply(const Row &a, Row &r) override {
    r.setBoolean(0, a.getBoolean(0) && a.getBoolean(1));
  }  
};
/*
 *  signature:
 */
extern "C" ExtensionFactory *udx_AND() { return new FactoryFor<AND>(); }
/*
 *  meta-data:
 *
{
   "name"        : "http://example/and",
   "signature"   : "udx_AND",
   "type"        : "function",
   "arguments"   : ["boolean", "boolean"],
   "results"     : "boolean",
   "variadic"    : false,
   "description" : "Returns the logical conjunction of two booleans"
}
 *
 */

This example also includes, in the same file as the AND function definition, the meta-data that registers the function in AnzoGraph DB. In addition to the attributes common to every extension, user-defined function extensions specify the following attributes:

Attribute | Description
arguments | An array of zero or more UDX data types that specifies the number and type of arguments required by any application of the UDF. Determines the shape of the input row passed as the "a" parameter of the apply method.
results | An array of zero or more UDX data types that specifies the number and type of results returned by any application of the UDF. Determines the shape of the output row passed as the "r" parameter of the apply method.
variadic | A boolean value that indicates, if true, that the final type listed in the arguments array may be repeated one or more times in an application of the UDF. This attribute is optional.

See Registering a UDX in an Extension Library for more information on registration and the different methods of registering functions in AnzoGraph DB. Refer to each function's argument and result data types by their AnzoGraph DB registry or UDX data type names (see UDX Data Types).

Implementing the Apply() Method

The previous example defines a function, AND, that is exported through the factory function "udx_AND" and registered under the name "http://example/and". Its apply() method takes an input row "a" and an output row "r", reads the two boolean arguments from cells 0 and 1 of "a", and writes their logical conjunction to cell 0 of "r": true if both arguments are true, false otherwise.

The arguments to the function are passed in "a", a row whose shape is specified by the "arguments" meta-data attribute. If the UDF is marked "variadic" : true, the row may also carry additional values of type "a.shape().last()" at its end. The output parameter, "r", is the row to which the results are assigned; its shape is specified by the "results" meta-data attribute.
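To make the variadic case concrete, the following is a minimal sketch of how an apply method might fold over every value in the input row. Because udx_api.hpp is not reproduced on this page, the sketch uses a hypothetical stand-in type, MockRow. Only getBoolean and setBoolean mirror names from this page; size() and the and_all helper are assumptions made for illustration, and the real Row type may expose its cell count differently (for example through shape()).

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for the UDX Row type so the sketch compiles on its
// own. size() is an assumption, not part of the documented API.
struct MockRow {
  std::vector<bool> cells;
  std::size_t size() const { return cells.size(); }
  bool getBoolean(std::size_t i) const {
    assert(i < cells.size());
    return cells[i];
  }
  void setBoolean(std::size_t i, bool v) {
    assert(i < cells.size());
    cells[i] = v;
  }
};

// Variadic conjunction: assumes meta-data along the lines of
// "arguments": ["boolean"], "variadic": true, so the input row holds
// a run of boolean values.
struct AndAll {
  void apply(const MockRow &a, MockRow &r) {
    bool all = true;
    // Stop at the first false value; the result cannot change after that.
    for (std::size_t i = 0; i < a.size() && all; ++i)
      all = a.getBoolean(i);
    r.setBoolean(0, all);
  }
};

// Hypothetical helper that builds the rows and runs the function once.
bool and_all(std::vector<bool> args) {
  MockRow a{std::move(args)};
  MockRow r{std::vector<bool>(1, false)};
  AndAll{}.apply(a, r);
  return r.getBoolean(0);
}
```

With this sketch, and_all({true, true, true}) yields true and and_all({true, false, true}) yields false. A real extension would derive from Function and take Row parameters, as in the AND example.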

The implementation also uses the getBoolean and setBoolean routines to read the input cells and write the output cell. The following table lists the get and set routines available to AnzoGraph DB extensions. Use the Read Cell or Read Cell with Default get routines on argument a; the Read Cell with Default routines return the supplied default value if the cell is not defined (empty). Use the Write Cell set routines on argument r:

Read Cell (arg a) | Read Cell with Default (arg a) | Write Cell (arg r)
defined(size_t) | N/A | undefined(size_t)
getBoolean(size_t) | getBoolean(size_t, bool) | setBoolean(size_t, bool)
getByte(size_t) | getByte(size_t, uint8_t) | setByte(size_t, byte)
getShort(size_t) | getShort(size_t, short) | setShort(size_t, short)
getInt(size_t) | getInt(size_t, int) | setInt(size_t, int)
getLong(size_t) | getLong(size_t, long) | setLong(size_t, long)
getFloat(size_t) | getFloat(size_t, float) | setFloat(size_t, float)
getDouble(size_t) | getDouble(size_t, double) | setDouble(size_t, double)
getDate(size_t) | getDate(size_t, Date) | setDate(size_t, Date)
getTime(size_t) | getTime(size_t, Time) | setTime(size_t, Time)
getDateTime(size_t) | getDateTime(size_t, DateTime) | setDateTime(size_t, DateTime)
getDuration(size_t) | getDuration(size_t, Duration) | setDuration(size_t, Duration)
getString(size_t) | getString(size_t, String) | setString(size_t, String)
getLString(size_t) | getLString(size_t, LString) | setLString(size_t, LString)
getUDT(size_t) | getUDT(size_t, UDT) | setUDT(size_t, UDT)
getURI(size_t) | getURI(size_t, URI) | setURI(size_t, URI)
getTag(size_t) | getTag(size_t, String) | N/A
getType(size_t) | N/A | N/A
N/A | N/A | clear()
getBlob(size_t) | getBlob(size_t, Blob) | setBlob(size_t, Blob)
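As an illustration of the Read Cell with Default routines, the sketch below adds two long values and treats an undefined cell as 0 instead of testing defined() first. It relies on a hypothetical stand-in, MockRow, that models cells as optional values; the routine names follow the table above, but their implementation here, the AddOrZero function, and the add_or_zero helper are assumptions made only so the example can be compiled on its own.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

using Cells = std::vector<std::optional<long>>;

// Hypothetical stand-in for the UDX Row type: an empty optional models an
// undefined cell.
struct MockRow {
  Cells cells;
  bool defined(std::size_t i) const {
    assert(i < cells.size());
    return cells[i].has_value();
  }
  // Read Cell: the caller must know the cell is defined.
  long getLong(std::size_t i) const {
    assert(defined(i));
    return *cells[i];
  }
  // Read Cell with Default: falls back to dflt for an undefined cell.
  long getLong(std::size_t i, long dflt) const {
    return defined(i) ? *cells[i] : dflt;
  }
  // Write Cell.
  void setLong(std::size_t i, long v) {
    assert(i < cells.size());
    cells[i] = v;
  }
};

// Adds two longs, substituting 0 for any undefined argument.
struct AddOrZero {
  void apply(const MockRow &a, MockRow &r) {
    r.setLong(0, a.getLong(0, 0) + a.getLong(1, 0));
  }
};

// Hypothetical helper that builds the rows and runs the function once.
long add_or_zero(std::optional<long> x, std::optional<long> y) {
  MockRow a{Cells{x, y}};
  MockRow r{Cells(1)};  // one undefined result cell
  AddOrZero{}.apply(a, r);
  return r.getLong(0);
}
```

Here add_or_zero(2, 3) yields 5, while add_or_zero(std::nullopt, 3) yields 3 because the undefined first cell reads as the default. Whether a default is appropriate, or whether the result should stay undefined as in the divide example later on this page, depends on the semantics wanted for the function.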

Compilation and Execution of Function Extensions

When compiling a query that includes a function extension (UDF), the leader verifies that the number and type of the arguments passed to the UDF are consistent with its domain, as specified by the "arguments" and "variadic" attributes of the meta-data description. When executing a query, each slice creates its own distinct instance of the UDF for every occurrence in the query by invoking the create method of the associated ExtensionFactory. As each row of values streams through a slice, it is passed to the instance by calling its apply method, and the results are then passed upward to the consumer of the stream. The factory and the instances that it creates are destroyed only once the query has finished executing.
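The lifecycle described above can be pictured with the following simplified simulation. It is only a sketch: MockFunction, MockFactory, MockFactoryFor, QueryTrace, and run_query are hypothetical stand-ins, apply takes plain values rather than rows, and the query is assumed to contain a single occurrence of the UDF. It does not reflect how AnzoGraph DB schedules slices internally.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-ins for Function, ExtensionFactory, and FactoryFor.
struct MockFunction {
  virtual ~MockFunction() = default;
  virtual bool apply(bool x, bool y) = 0;  // simplified: no Row type
};

struct MockFactory {
  virtual ~MockFactory() = default;
  virtual std::unique_ptr<MockFunction> create() = 0;
};

template <class F>
struct MockFactoryFor : MockFactory {
  std::size_t created = 0;  // counts instances, for illustration only
  std::unique_ptr<MockFunction> create() override {
    ++created;
    return std::make_unique<F>();
  }
};

struct And : MockFunction {
  bool apply(bool x, bool y) override { return x && y; }
};

struct QueryTrace {
  std::size_t instances_created;
  std::vector<bool> results;
};

// Simulates one query with a single UDF occurrence.
QueryTrace run_query(std::size_t slices, std::size_t rows_per_slice) {
  assert(slices > 0);
  MockFactoryFor<And> factory;
  std::vector<std::unique_ptr<MockFunction>> instances;
  for (std::size_t s = 0; s < slices; ++s)
    instances.push_back(factory.create());  // one instance per slice

  QueryTrace trace{0, {}};
  for (std::size_t s = 0; s < slices; ++s)
    for (std::size_t row = 0; row < rows_per_slice; ++row)  // rows stream through
      trace.results.push_back(instances[s]->apply(true, row % 2 == 0));

  trace.instances_created = factory.created;
  return trace;
}  // instances and factory are destroyed here, when the "query" ends
```

Calling run_query(3, 4) creates three instances and produces twelve results. Because each slice owns its instance, any state kept in a UDF object is per slice and per occurrence rather than shared across the query.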

Additional UDF Examples

The following example file, div.cpp, defines a function named divide.


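#include "udx_api.hpp"   // For extensions API
using namespace std;     // Everything in standard namespace
using namespace udx2;    // Everything in udx namespace

// Divides argument 0 by argument 1. No result is written when either
// argument is undefined or the divisor is zero.
struct divide : Function
{
  void apply(const Row& a, Row& r) override
  {
    if (a.defined(0) && a.defined(1))
    {
      if (auto d = a.getLong(1))
      {
        // Cast first so the quotient is not truncated by integer division
        r.setDouble(0, static_cast<double>(a.getLong(0)) / d);
      }
    }
  }
};
extern "C" ExtensionFactory* udx_divide() { return new FactoryFor<divide>(); }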

The following example file, udf.cpp, defines three UDFs: a concat function that concatenates two strings, a usd_to_euro function that converts USD values to EUR, and a sortstr function that sorts the whitespace-separated words of a string and returns them as a single string.


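#include <algorithm>                     // For sort, copy
#include <iterator>                      // For stream iterators
#include <sstream>                       // For string streams
#include <string>
#include <vector>
#include "udx_api.hpp"                   // For extensions API
using namespace std;                     // Everything in std
using namespace udx2;                    // Everything in udx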

// create a concat function that concatenates two string values

struct concat : Function
{
  void apply(const Row& a,Row& r)
  {
    r.setString(0, string(a.getString(0)) + string(a.getString(1)));
  }
};
extern "C" ExtensionFactory* udx_concat() { return new FactoryFor<concat>(); }

// create a usd_to_euro function that converts USD values to EUR

struct usd_to_euro : Function
{
  void apply(const Row& a,Row& r)
  {
    auto exchange_rate = 0.81;           // hard-coded USD-to-EUR rate
    auto [udt,tag] = a.getUDT(0);
    if (tag=="$" && !udt.empty())
    {
      istringstream i{string(udt)};
      ostringstream o;
      o.precision(2);
      o.setf(ios::fixed);
      double d;

      if (i >> d)                        // skip values that do not parse as a number
      {
        o << d * exchange_rate;
        r.setUDT(0,{o.str(),"€"});
      }
    }
  }
};
extern "C" ExtensionFactory* udx_usd_to_euro() { return new FactoryFor<usd_to_euro>();}

// create a sortstr function that sorts the whitespace-separated words of a string

struct sortstr : Function
{
  void apply(const Row& a,Row& r)
  {
    ostringstream             o;
    istringstream             i(string(a.getString(0)));
    istream_iterator<string>  b(i);
    vector<string>            tokens(b,istream_iterator<string>());
    
    sort(begin(tokens),end(tokens));

    copy(begin(tokens),end(tokens),ostream_iterator<string>(o," "));

    r.setString(0,o.str().c_str());
  }
};
extern "C" ExtensionFactory* udx_sortstr() { return new FactoryFor<sortstr>(); }
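To see what the tokenize, sort, and join steps in sortstr produce, the same standard-library calls can be exercised outside the UDX API. The sketch below is an illustration only: sort_words is a hypothetical free function that is not part of udf.cpp, and it takes a std::string in place of a Row.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper mirroring the body of sortstr::apply.
std::string sort_words(const std::string &text) {
  // Split on whitespace; runs of spaces produce no empty tokens.
  std::istringstream in(text);
  std::istream_iterator<std::string> first(in), last;
  std::vector<std::string> tokens(first, last);

  // std::sort compares strings byte by byte, so uppercase letters order
  // before lowercase ones.
  std::sort(tokens.begin(), tokens.end());
  assert(std::is_sorted(tokens.begin(), tokens.end()));

  // ostream_iterator writes the delimiter after every token, including
  // the last one, so a non-empty result ends with a space.
  std::ostringstream out;
  std::copy(tokens.begin(), tokens.end(),
            std::ostream_iterator<std::string>(out, " "));
  return out.str();
}
```

For example, sort_words("pear apple fig") returns "apple fig pear " with a trailing space, and sort_words("pear Apple fig") returns "Apple fig pear " because the comparison is case-sensitive. A caller that needs a trimmed or case-insensitive result would have to add that handling.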