Apache Arrow Functions (Preview)

Apache Arrow is a software development platform for building high performance applications that process and transport large data sets. It is designed to improve both the performance of analytical algorithms and the efficiency of moving data from one system to another. One important feature of Apache Arrow is its in-memory columnar format, a standardized, language-agnostic specification for representing structured, table-like data sets in-memory.

AnzoGraph can act as an Apache Arrow client for graph query driven exports and imports of in-memory data sets. AnzoGraph provides a collection of user-defined service functions that support the Arrow Flight protocol for integration with leading ML and other Big Data Ecosystems including Python Pandas, Spark MLLIB and Google Tensorflow, Cassandra, Kudu, and Hadoop.

This topic provides details about two separate groups of AnzoGraph Arrow service functions that are available, Arrow Flight and Arrow Plasma. You can click on individual function names to view the syntax of each function signature and as well as obtain details about individual function arguments and return values.

Category/Function Description
Arrow Flight Functions:
arrow_flight_push_csv Pushes out a .CSV file from AnzoGraph to the flight server.
arrow_flight_push Pushes out the result of an AnzoGraph query directly to the flight server.
arrow_flight_list Lists all the available flights known to the flight server.
arrow_flight_get_info Returns metadata information for a specific flight.
arrow_flight_get Returns data of a specific flight from the flight server.
Arrow Plasma Functions:
arrow_plasma_size Determines the size of the buffer to request from the plasma-store.
arrow_plasma_write Writes the result of an AnzoGraph query directly into an in-memory plasma table.
arrow_plasma_list Lists all the objects in the plasma store.
arrow_plasma_contains Checks if an object exist in the plasma store.
arrow_plasma_schema Returns the table schema and metadata for an object in the plasma store.
arrow_plasma_read Reads the data from a table in the in-memory plasma store.
arrow_plasma_delete Deletes an object from the plasma store.

The following sections provided additional detail of each available Arrow integration service function as well as the syntax or signature of each function call.

Arrow Flight Functions

Click on individual function names in the table below to view the syntax of each Arrow Flight function signature, as well as obtain details about individual function arguments and return values.

Function Description
arrow_flight_push_csv Pushes out a .CSV file from AnzoGraph to the flight server.
arrow_flight_push Pushes out the result of an AnzoGraph query directly to the flight server.
arrow_flight_list Lists all the available flights known to the flight server.
arrow_flight_get_info Returns metadata information for a specific flight.
arrow_flight_get Returns data of a specific flight from the flight server.

arrow_flight_push_csv

This function pushes out a .CSV file from AnzoGraph to the flight server. The syntax for calling the function is the following:

arrow_flight_push_csv(IP : String, port : int, file : String, root_cert_file : String)

Function syntax arguments are the following:

Argument Data Type Description
IP String Server IP address.
port int Server port number.
file String .CSV file path.
root_cert_file String Path to a trusted TLS root certificate. For non-TLS operation, specify an empty path.

The function returns:

(num_rows : long, path : String)
Return Value Data Type Description
num_rows long Number of rows of the table pushed to the server.
path String Flight path.

Example Query:

PREFIX exfun: <http://cambridgesemantics.com/anzograph/inmemory#>

SELECT * WHERE
{
  SERVICE exfun:arrow_flight_push_csv("10.117.2.36", 5005, 
    "/tmp/arrow/data/iris.csv", "/tmp/arrow/keys/root-ca.pem") {}
}

Results from the query appear similar to the following example:

num_rows  | path
----------+--------------------------
150       | /tmp/arrow/data/iris.csv

arrow_flight_push

This function pushes out the result of an AnzoGraph query directly to the flight server. The syntax for calling the function is the following:

arrow_flight_push(IP : String, port : int, cmd_name : String,
  root_cert_file : String

Function syntax arguments are the following:

Argument Data Type Description
IP String Server IP address.
port int Server port number.
cmd_name String Flight name that can be a SPARQL command or custom name. Specify "CMD" to use SPARQL command as name, otherwise provide a custom name.
root_cert_file String Path to a trusted TLS root certificate. For non-TLS operation, specify an empty path.

The function returns:

(num_rows : long, flight_name : String)
Return Value Data Type Description
num_rows long Number of rows pushed to the server.
flight_name String Name of the flight.

Query Example 1:

Returns flight name as user defined name, for example, "tickit_total_price".

PREFIX exfun: <http://cambridgesemantics.com/anzograph/inmemory#>
SELECT ?num_rows ?flight_name
FROM <tickit> WHERE
{
  {
    SELECT ?ticketPrice ?numtickets ?totalPrice
    WHERE
    {
      ?sell <priceperticket>   ?ticketPrice;
      <numtickets>       ?numtickets;
      <totalprice>       ?totalPrice .  
    }
  }
  SERVICE exfun:arrow_flight_push("10.117.2.36", 5005, 
    "tickit_total_price", "/tmp/arrow/keys/root-ca.pem") { }
}

Results from the query appear similar to the following:

num_rows  | flight_name
----------+--------------------
192497    | tickit_total_price

Query Example 2:

Returns flight name as SPARQL command.

PREFIX exfun: <http://cambridgesemantics.com/anzograph/inmemory#>
SELECT ?num_rows ?flight_name
FROM <tickit> WHERE
{
  {
    SELECT ?ticketPrice ?numtickets ?totalPrice
    WHERE
    {
      ?sell <priceperticket>   ?ticketPrice;
      <numtickets>       ?numtickets;
      <totalprice>       ?totalPrice .
    } 
  }
  SERVICE exfun:arrow_flight_push("10.117.2.36", 5005, "CMD", 
    "/tmp/arrow/keys/root-ca.pem") { }
}

Results from the query appear similar to the following:

num_rows | flight_name
---------+--------------------
192497   | SELECT ?ticketPrice ?numtickets ?totalPrice WHERE { ?sell <priceperticket> ?ticketPrice;
            <numtickets> ?numtickets; <totalprice> ?totalPrice . }

arrow_flight_list

This function lists all the available flights known to the flight server. The syntax for calling the function is the following:

arrow_flight_list(IP : String, port : int, root_cert_file : String )

Function arguments are the following:

Argument Data Type Description
IP String Server IP address.
port int Server port number.
root_cert_file String Path to a trusted TLS root certificate. For non-TLS operation, specify an empty path.

The function returns:

(flight : long, total_endpoints : long, endpoint_num : int, 
   flight_type : String, flight_name : String, location : String)
Return Value Data Type Description
flight long Flight number..
total_endpoints long Total number of endpoints associated with the flight (dataset).
endpoint_num int Endpoint number in the flight.

flight_type

String Type of flight. CMD-Command, PATH-Path.
flight_name String Name of the flight.
location String List of locations where ticket can be redeemed.

Query Example:

PREFIX exfun: <http://cambridgesemantics.com/anzograph/inmemory#>
SELECT * WHERE
{
   SERVICE exfun:arrow_flight_list("10.117.2.36", 5005, "/tmp/arrow/keys/root-ca.pem") {}
}

Results from the query appear similar to the following:

flight  | total_endpoints | endpoint_num | flight_type | flight_name                                                                                                                                        | location
--------+-----------------+--------------+-------------+----------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------
1       |       1         |     1        |   PATH      |  /tmp/arrow/data/iris.csv                                                                                                                          | grpc+tls://10.117.2.36:5005
2       |       1         |     1        |   CMD       |  tickit_total_price                                                                                                                                | grpc+tls://10.117.2.36:5005
3       |       1         |     1        |   CMD       |  SELECT ?ticketPrice ?numtickets ?totalPrice WHERE { ?sell <priceperticket> ?ticketPrice; <numtickets> ?numtickets; <totalprice> ?totalPrice . }   | grpc+tls://10.117.2.36:5005

arrow_flight_get_info

This function returns metadata information for a specific flight. The syntax for calling the function is the following:

arrow_flight_get_info(IP : String, port : int, type : String,
  path : String, root_cert_file : String

Function syntax arguments are the following:

Argument Data Type Description
IP String Server IP address.
port int Server port number.
type String

Flight descriptor's type: 1-PATH, 2-CMD .

path String Flight command or path.
root_cert_file String Path to a trusted TLS root certificate. For non-TLS operation, specify an empty path.

The function returns the following:

(schema : long, total_records : String, total_bytes : String)
Return Data Type Description
schema String Flight's UDX schema.
total_records long Total number of records (rows) in the dataset.
total_bytes long Total number of bytes in the dataset.

Query Example 1:

Returns query name flight’s metadata.

PREFIX exfun: <http://cambridgesemantics.com/anzograph/inmemory#>
SELECT (row_opt_string(?flight_info,0) as ?schema) (row_get_long(?flight_info,1) as ?records) (row_get_long(?flight_info,2) as ?bytes)
WHERE
{
   SELECT (exfun:arrow_flight_get_info("10.117.2.36", 5005, 2, "SELECT ?ticketPrice 
     ?numtickets ?totalPrice WHERE { ?sell <priceperticket> ?ticketPrice; 
     <numtickets> ?numtickets; <totalprice> ?totalPrice . }", 
     "/tmp/arrow/keys/root-ca.pem") as ?flight_info)
}

Results from the query appear similar to the following:

schema                                                 | records | bytes
-------------------------------------------------------+---------+---------
?ticketPrice:double ?numtickets:int ?totalPrice:double |  192497 | 3942600

Query Example 2:

Returns path name flight’s metadata.

PREFIX exfun: <http://cambridgesemantics.com/anzograph/inmemory#>
SELECT (row_opt_string(?flight_info,0) as ?schema) (row_get_long(?flight_info,1) as ?records) (row_get_long(?flight_info,2) as ?bytes)
WHERE
{
   SELECT (exfun:arrow_flight_get_info("10.117.2.36", 5005, 1, "/tmp/arrow/data/iris.csv", "/tmp/arrow/keys/root-ca.pem") as ?flight_info)
}

Results from the query appear similar to the following:

schema                                                                                             | records | bytes
---------------------------------------------------------------------------------------------------+---------+-------
?sepal_length:double ?sepal_width:double ?petal_length:double ?petal_width:double ?variety:String  |   150   |  7368

Query Example 3:

Returns custom name flight’s metadata.

PREFIX exfun: <http://cambridgesemantics.com/anzograph/inmemory#>
SELECT (row_opt_string(?flight_info,0) as ?schema) (row_get_long(?flight_info,1) as ?records) (row_get_long(?flight_info,2) as ?bytes)
WHERE
{
   SELECT (exfun:arrow_flight_get_info("10.117.2.36", 5005, 2, "tickit_total_price", "/tmp/arrow/keys/root-ca.pem") as ?flight_info)
}

Results from the query appear similar to the following:

schema                                                  | records   | bytes
--------------------------------------------------------+-----------+---------
?ticketPrice:double ?numtickets:int ?totalPrice:double  |  192497  | 3942600

arrow_flight_get

This function returns data of a specific flight from the flight server. The syntax for calling the function is the following:

arrow_flight_get(IP : String, port : int, flight_type : int, 
flight_name : String, root_cert_file : String)

The function returns the column data for the specified flight.

Function arguments are the following:

Argument Data Type Description
IP String Server IP address.
port int Server port number.
flight_type int Flight descriptor's type: 1-PATH, 2-CMD .
flight_name String Name of the flight.
root_cert_file String Path to a trusted TLS root certificate. For non-TLS operation, specify an empty path.

Query Example 1:

Returns CMD flight using custom name.

PREFIX exfun: <http://cambridgesemantics.com/anzograph/inmemory#>
SELECT * WHERE
{
   SERVICE exfun:arrow_flight_get("10.117.2.36", 5005, 2, "tickit_total_price", "/tmp/arrow/keys/root-ca.pem") { }
} ORDER By ?ticketPrice ?numtickets LIMIT 3

Results from the query appear similar to the following:

ticketPrice | numtickets | totalPrice
------------+------------+------------
20.000000   |     1      |  20.000000
20.000000   |     1      |  20.000000
20.000000   |     1      |  20.000000

Query Example 2:

Returns PATH flight.

PREFIX exfun: <http://cambridgesemantics.com/anzograph/inmemory#>
SELECT * WHERE{
   SERVICE exfun:arrow_flight_get("10.117.2.36", 5005, 1, "/tmp/arrow/data/iris.csv",  "/tmp/arrow/keys/root-ca.pem") { }
} ORDER By ?sepal_length ?sepal_width ?petal_length ?petal_width LIMIT 3

Results from the query appear similar to the following:

sepal_length | sepal_width | petal_length  | petal_width | variety
-------------+-------------+---------------+-------------+--------
4.300000     |    3.000000 |     1.100000  |    0.100000 | Setosa
4.400000     |    2.900000 |     1.400000  |    0.200000 | Setosa
4.400000     |    3.000000 |     1.300000  |    0.200000 | Setosa

Query Example 3:

Returns CMD flight using query name.

PREFIX exfun: <http://cambridgesemantics.com/anzograph/inmemory#>
SELECT * WHERE
{
   SERVICE exfun:arrow_flight_get("10.117.2.36", 5005, 2, "SELECT ?ticketPrice ?numtickets ?totalPrice WHERE { ?sell <priceperticket> ?ticketPrice; <numtickets> ?numtickets; <totalprice> ?totalPrice . }", "/tmp/arrow/keys/root-ca.pem") { }
} ORDER By ?ticketPrice ?numtickets LIMIT 3

Results from the query appear similar to the following:

ticketPrice  | numtickets | totalPrice
-------------+------------+------------
20.000000    |      1     |  20.000000
20.000000    |      1     |  20.000000
20.000000    |      1     |  20.000000

Query Example 4:

Create graph from CMD flight having custom name.

PREFIX exfun: <http://cambridgesemantics.com/anzograph/inmemory#>
INSERT
{
   graph <flight_graph1>
   {
      ?insUri <ticketPrice> ?ticketPrice;
      <numtickets> ?numtickets;
      <totalPrice> ?totalPrice.
   }
}
WHERE
{
   SERVICE exfun:arrow_flight_get("10.117.2.36", 5005, 2, "tickit_total_price", "/tmp/arrow/keys/root-ca.pem") { 
}
   BIND(IRI(CONCAT("http:/CSI.COM/", STRUUID())) as ?insUri)
}

To display the graph, you can run the following query:

SELECT ?ticketPrice ?numtickets ?totalPrice
FROM <flight_graph1> WHERE
{
   ?insUri <ticketPrice> ?ticketPrice;
   <numtickets> ?numtickets;
   <totalPrice> ?totalPrice.
} ORDER By ?ticketPrice ?numtickets LIMIT 3

Results from the query appear similar to the following:

ticketPrice | numtickets | totalPrice
------------+------------+------------
20.000000   |    1       |  20.000000
20.000000   |    1       |  20.000000
20.000000   |    1       |  20.000000

Query Example 5:

Create Graph from PATH flight.

PREFIX exfun: <http://cambridgesemantics.com/anzograph/inmemory#>
INSERT
{
   graph <flight_graph2>
   {
      ?insUri <sepal_length> ?sepal_length;
      <sepal_width>  ?sepal_width;
      <petal_length> ?petal_length;
      <petal_width>  ?petal_width;
      <variety>      ?variety.
   }
}
WHERE
{
   SERVICE exfun:arrow_flight_get("10.117.2.36", 5005, 1, "/tmp/arrow/data/iris.csv", "/tmp/arrow/keys/root-ca.pem") { }
   BIND(IRI(CONCAT("http:/CSI.COM/", STRUUID())) as ?insUri)
}

Query Example 6:

Create Graph from CMD flight having query name.

PREFIX exfun: <http://cambridgesemantics.com/anzograph/inmemory#>
INSERT
{
   graph <flight_graph3>
   {
      ?insUri <ticketPrice> ?ticketPrice;
      <numtickets> ?numtickets;
      <totalPrice> ?totalPrice.
   }
}
WHERE
{
   SERVICE exfun:arrow_flight_get("10.117.2.36", 5005, 2, "SELECT ?ticketPrice ?numtickets ?totalPrice WHERE { ?sell <priceperticket> ?ticketPrice; <numtickets> ?numtickets; <totalprice> ?totalPrice . }", "/tmp/arrow/keys/root-ca.pem") { }
   BIND(IRI(CONCAT("http:/CSI.COM/", STRUUID())) as ?insUri)
}

Arrow Plasma Functions

Click on individual function names in the table below to view the syntax of each Arrow Plasma function signature, as well as obtain details about individual function arguments and return values.

Function Description
arrow_plasma_size Determines the size of the buffer to request from the plasma-store.
arrow_plasma_write Writes the result of an AnzoGraph query directly into an in-memory plasma table.
arrow_plasma_list Lists all the objects in the plasma store.
arrow_plasma_contains Checks if an object exist in the plasma store.
arrow_plasma_schema Returns the table schema and metadata for an object in the plasma store.
arrow_plasma_read Reads the data from a table in the in-memory plasma store.
arrow_plasma_delete Deletes an object from the plasma store.

arrow_plasma_size

This function determines the size of the buffer to request from the plasma-store. The syntax for calling the function is the following:

arrow_plasma_size()

The function returns:

(plasma_size : long)

This value is the size required to store the object in plasma-store.

Query Example:

PREFIX exfun: <http://cambridgesemantics.com/anzograph/inmemory#>
SELECT ?plasma_size
FROM <tickit> WHERE
{
  {
     SELECT ?ticketPrice ?numtickets ?totalPrice
     WHERE
     {
       ?sell <priceperticket>   ?ticketPrice;
       <numtickets>       ?numtickets;
       <totalprice>       ?totalPrice .
     }
  }
  SERVICE exfun:arrow_plasma_size() {}
}

Results from the query appear similar to the following:

plasma_size
----------
4037824

arrow_plasma_write

This function writes the result of an AnzoGraph query directly into an in-memory plasma table. The syntax for calling the function is the following:

arrow_plasma_write(plasma_path : String, object_id : String) 

Function arguments are the following:

Argument Data Type Description
plasma_path String Name of the UNIX domain socket to use to connect to the Plasma store.
object_id String Unique ID (a byte string of length 20) of the object to create into the plasma store as arrow table structure.

The function returns:

(num_rows : long, object_id : String)
Return Data Type Description
num_rows long Number of rows stored for a table into the plasma store.
object_id String Unique ID (a byte string of length 20) of the object to create into the plasma store as arrow table structure.

Query Example:

PREFIX exfun: <http://cambridgesemantics.com/anzograph/inmemory#>
SELECT ?num_rows ?object_id
FROM <tickit> WHERE
{
  {
     SELECT ?plasma_size
     WHERE
     {
       {
         SELECT ?ticketPrice ?numtickets ?totalPrice
         WHERE
         {
            ?sell <priceperticket>   ?ticketPrice;
            <numtickets>       ?numtickets;
            <totalprice>       ?totalPrice .
         }
       }
       SERVICE exfun:arrow_plasma_size() {}
     }
   }
   {
     SELECT ?ticketPrice ?numtickets ?totalPrice
     WHERE
     {
       ?sell <priceperticket>   ?ticketPrice;
       <numtickets>       ?numtickets;
       <totalprice>       ?totalPrice .
     }
   }
   SERVICE exfun:arrow_plasma_write(
     "/tmp/plasma",                    #plasma store path
     "0000000000000000001a"            #ObjectID to write) {}
}

Results from the query appear similar to the following:

num_rows  | object_id
----------+---------------------------
192497    | 0000000000000000001a

arrow_plasma_list

This function lists all the objects in the plasma store. The syntax for calling the function is the following:

arrow_plasma_list(plasma_path : String)

Function syntax arguments are the following:

Argument Data Type Description
plasma_path String Name of the UNIX domain socket to use to connect to the Plasma store.

The function returns :

(object_id : String, metadata_size : long, 
data_size : long, ref_count : int, create_time : long, 
construct_duration : long, state : String)
Return Data Type Description
object_id String Unique ID of the object.
metadata_size long Size of the object metadata in bytes.
data_size long Size of the object data in bytes.
ref_count int Number of clients referencing the object buffer.
create_time long Unix timestamp of the object creation.
construct_duration long Object creation time in seconds.
state String Indicates whether the object still being created or it is already sealed.

Query Example:

PREFIX exfun: <http://cambridgesemantics.com/anzograph/inmemory#>
SELECT * WHERE
{
   SERVICE exfun:arrow_plasma_list("/tmp/plasma") {}
}

Results from the query appear similar to the following:

object_id            | metadata_size | data_size | ref_count | create_time  | construct_duration | state
---------------------+---------------+-----------+-----------+--------------+--------------------+--------
0000000000000000001a |      0        |   4037824 |     0     |  1585044544  |         4          | SEALED

arrow_plasma_contains

This function checks if an object exist in the plasma store. The syntax for calling the function is the following:

arrow_plasma_contains(path : String, object_id : String)

Function arguments are the following:

Argument Data Type Description
path String Name of the UNIX domain socket to use to connect to the Plasma store.
object_id String Unique ID of the object.

The function returns:

(r : Boolean)

The value returned is True if the plasma store contains a particular object and the object has been sealed.

Query Example:

PREFIX exfun: <http://cambridgesemantics.com/anzograph/inmemory#>
SELECT (exfun:arrow_plasma_contains("/tmp/plasma", "0000000000000000001a") as ?is_exist)

Results from the query appear similar to the following:

is_exist
-------
t

arrow_plasma_schema

This function returns the table schema and metadata for an object in the plasma store. The syntax for calling the function is the following:

arrow_plasma_schema(path : String, object_id : String)

Function arguments are the following:

Argument Data Type Description
path String Name of the UNIX domain socket to use to connect to the Plasma store.
object_id String Unique ID of the object..

The function returns:

(s : String, m : String)
Return Data Type Description
s String Table's schema .
m String Table's metadata.

Query Example:

PREFIX exfun: <http://cambridgesemantics.com/anzograph/inmemory#>
SELECT (row_opt_string(?plasma_schema,2,0) as ?schema) (row_opt_string(?plasma_schema,2,1) as ?metadata)
WHERE
{
   SELECT (exfun:arrow_plasma_schema("/tmp/plasma", "0000000000000000001a") as ?plasma_schema)
}

Results from the query appear similar to the following:

schema                                                 | metadata
-------------------------------------------------------+----------
?ticketPrice:double ?numtickets:int ?totalPrice:double |

arrow_plasma_read

This function reads the data from a table in the in-memory plasma store. The syntax for calling the function is the following:

arrow_plasma_read(plasma_path : String, object_id : String, col_name_type : String)

Function syntax arguments are the following:

Argument Data Type Description
plasma_path String Name of the UNIX domain socket to use to connect to the Plasma store.
object_id String Unique ID(a byte string of length 20) of the object to read from the plasma store as a arrow table structure.
col_name_type String Columns name and type of the table to read; string format is like "name1:type1 name2:type2".

Query Example:

PREFIX exfun: <http://cambridgesemantics.com/anzograph/inmemory#>
SELECT * WHERE
{
   SERVICE exfun:arrow_plasma_read
   (
      "/tmp/plasma",                                             #plasma store path
      "0000000000000000001a",                                    #objectID to read
      "?ticketPrice:double ?numtickets:int ?totalPrice:double"   #column names and type
    ) {}
} ORDER By ?ticketPrice ?numtickets LIMIT 5

Results from the query appear similar to the following:

ticketPrice  | numtickets  | totalPrice
-------------+-------------+------------
20.000000    |     1       |  20.000000
20.000000    |     1       |  20.000000
20.000000    |     1       |  20.000000
20.000000    |     1       |  20.000000
20.000000    |     1       |  20.000000

arrow_plasma_delete

This function deletes an object from the plasma store. The syntax for calling the function is the following:

arrow_plasma_delete(plasma_path : String, object_id : String)

Function arguments are the following:

Argument Data Type Description
plasma_path String Name of the UNIX domain socket to use to connect to the Plasma store.
object_id String Unique ID of the object.

The function returns :

(r : Boolean)

Query Example:

PREFIX exfun: <http://cambridgesemantics.com/anzograph/inmemory#>
SELECT (exfun:arrow_plasma_delete("/tmp/plasma", "0000000000000000001a") as ?is_deleted)

Results from the query appear similar to the following:

is_deleted
-------
t