Apache Arrow Functions (Preview)
Apache Arrow is a software development platform for building high performance applications that process and transport large data sets. It is designed to improve both the performance of analytical algorithms and the efficiency of moving data from one system to another. One important feature of Apache Arrow is its in-memory columnar format, a standardized, language-agnostic specification for representing structured, table-like data sets in-memory.
AnzoGraph can act as an Apache Arrow client for graph query driven exports and imports of in-memory data sets. AnzoGraph provides a collection of user-defined service functions that support the Arrow Flight protocol for integration with leading ML and other Big Data Ecosystems including Python Pandas, Spark MLLIB and Google Tensorflow, Cassandra, Kudu, and Hadoop.
This topic provides details about two separate groups of AnzoGraph Arrow service functions that are available, Arrow Flight and Arrow Plasma. You can click on individual function names to view the syntax of each function signature and as well as obtain details about individual function arguments and return values.
Category/Function | Description |
---|---|
Arrow Flight Functions: | |
arrow_flight_push_csv | Pushes out a .CSV file from AnzoGraph to the flight server. |
arrow_flight_push | Pushes out the result of an AnzoGraph query directly to the flight server. |
arrow_flight_list | Lists all the available flights known to the flight server. |
arrow_flight_get_info | Returns metadata information for a specific flight. |
arrow_flight_get | Returns data of a specific flight from the flight server. |
Arrow Plasma Functions: | |
arrow_plasma_size | Determines the size of the buffer to request from the plasma-store. |
arrow_plasma_write | Writes the result of an AnzoGraph query directly into an in-memory plasma table. |
arrow_plasma_list | Lists all the objects in the plasma store. |
arrow_plasma_contains | Checks if an object exist in the plasma store. |
arrow_plasma_schema | Returns the table schema and metadata for an object in the plasma store. |
arrow_plasma_read | Reads the data from a table in the in-memory plasma store. |
arrow_plasma_delete | Deletes an object from the plasma store. |
The following sections provided additional detail of each available Arrow integration service function as well as the syntax or signature of each function call.
Arrow Flight Functions
Click on individual function names in the table below to view the syntax of each Arrow Flight function signature, as well as obtain details about individual function arguments and return values.
Function | Description |
---|---|
arrow_flight_push_csv | Pushes out a .CSV file from AnzoGraph to the flight server. |
arrow_flight_push | Pushes out the result of an AnzoGraph query directly to the flight server. |
arrow_flight_list | Lists all the available flights known to the flight server. |
arrow_flight_get_info | Returns metadata information for a specific flight. |
arrow_flight_get | Returns data of a specific flight from the flight server. |
arrow_flight_push_csv
This function pushes out a .CSV file from AnzoGraph to the flight server. The syntax for calling the function is the following:
arrow_flight_push_csv(IP : String, port : int, file : String, root_cert_file : String)
Function syntax arguments are the following:
Argument | Data Type | Description |
---|---|---|
IP | String
|
Server IP address. |
port | int
|
Server port number. |
file | String
|
.CSV file path. |
root_cert_file | String
|
Path to a trusted TLS root certificate. For non-TLS operation, specify an empty path. |
The function returns:
(num_rows : long, path : String)
Return Value | Data Type | Description |
---|---|---|
num_rows | long
|
Number of rows of the table pushed to the server. |
path | String
|
Flight path. |
Example Query:
PREFIX exfun: <http://cambridgesemantics.com/anzograph/inmemory#> SELECT * WHERE { SERVICE exfun:arrow_flight_push_csv("10.117.2.36", 5005, "/tmp/arrow/data/iris.csv", "/tmp/arrow/keys/root-ca.pem") {} }
Results from the query appear similar to the following example:
num_rows | path ----------+-------------------------- 150 | /tmp/arrow/data/iris.csv
arrow_flight_push
This function pushes out the result of an AnzoGraph query directly to the flight server. The syntax for calling the function is the following:
arrow_flight_push(IP : String, port : int, cmd_name : String, root_cert_file : String
Function syntax arguments are the following:
Argument | Data Type | Description |
---|---|---|
IP | String
|
Server IP address. |
port | int
|
Server port number. |
cmd_name | String
|
Flight name that can be a SPARQL command or custom name. Specify "CMD" to use SPARQL command as name, otherwise provide a custom name. |
root_cert_file | String
|
Path to a trusted TLS root certificate. For non-TLS operation, specify an empty path. |
The function returns:
(num_rows : long, flight_name : String)
Return Value | Data Type | Description |
---|---|---|
num_rows | long
|
Number of rows pushed to the server. |
flight_name | String
|
Name of the flight. |
Query Example 1:
Returns flight name as user defined name, for example, "tickit_total_price".
PREFIX exfun: <http://cambridgesemantics.com/anzograph/inmemory#> SELECT ?num_rows ?flight_name FROM <tickit> WHERE { { SELECT ?ticketPrice ?numtickets ?totalPrice WHERE { ?sell <priceperticket> ?ticketPrice; <numtickets> ?numtickets; <totalprice> ?totalPrice . } } SERVICE exfun:arrow_flight_push("10.117.2.36", 5005, "tickit_total_price", "/tmp/arrow/keys/root-ca.pem") { } }
Results from the query appear similar to the following:
num_rows | flight_name ----------+-------------------- 192497 | tickit_total_price
Query Example 2:
Returns flight name as SPARQL command.
PREFIX exfun: <http://cambridgesemantics.com/anzograph/inmemory#> SELECT ?num_rows ?flight_name FROM <tickit> WHERE { { SELECT ?ticketPrice ?numtickets ?totalPrice WHERE { ?sell <priceperticket> ?ticketPrice; <numtickets> ?numtickets; <totalprice> ?totalPrice . } } SERVICE exfun:arrow_flight_push("10.117.2.36", 5005, "CMD", "/tmp/arrow/keys/root-ca.pem") { } }
Results from the query appear similar to the following:
num_rows | flight_name ---------+-------------------- 192497 | SELECT ?ticketPrice ?numtickets ?totalPrice WHERE { ?sell <priceperticket> ?ticketPrice; <numtickets> ?numtickets; <totalprice> ?totalPrice . }
arrow_flight_list
This function lists all the available flights known to the flight server. The syntax for calling the function is the following:
arrow_flight_list(IP : String, port : int, root_cert_file : String )
Function arguments are the following:
Argument | Data Type | Description |
---|---|---|
IP | String
|
Server IP address. |
port | int
|
Server port number. |
root_cert_file | String
|
Path to a trusted TLS root certificate. For non-TLS operation, specify an empty path. |
The function returns:
(flight : long, total_endpoints : long, endpoint_num : int, flight_type : String, flight_name : String, location : String)
Return Value | Data Type | Description |
---|---|---|
flight | long
|
Flight number.. |
total_endpoints | long
|
Total number of endpoints associated with the flight (dataset). |
endpoint_num | int
|
Endpoint number in the flight. |
flight_type |
String
|
Type of flight. CMD-Command, PATH-Path. |
flight_name | String
|
Name of the flight. |
location | String
|
List of locations where ticket can be redeemed. |
Query Example:
PREFIX exfun: <http://cambridgesemantics.com/anzograph/inmemory#> SELECT * WHERE { SERVICE exfun:arrow_flight_list("10.117.2.36", 5005, "/tmp/arrow/keys/root-ca.pem") {} }
Results from the query appear similar to the following:
flight | total_endpoints | endpoint_num | flight_type | flight_name | location --------+-----------------+--------------+-------------+----------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------- 1 | 1 | 1 | PATH | /tmp/arrow/data/iris.csv | grpc+tls://10.117.2.36:5005 2 | 1 | 1 | CMD | tickit_total_price | grpc+tls://10.117.2.36:5005 3 | 1 | 1 | CMD | SELECT ?ticketPrice ?numtickets ?totalPrice WHERE { ?sell <priceperticket> ?ticketPrice; <numtickets> ?numtickets; <totalprice> ?totalPrice . } | grpc+tls://10.117.2.36:5005
arrow_flight_get_info
This function returns metadata information for a specific flight. The syntax for calling the function is the following:
arrow_flight_get_info(IP : String, port : int, type : String, path : String, root_cert_file : String
Function syntax arguments are the following:
Argument | Data Type | Description |
---|---|---|
IP | String
|
Server IP address. |
port | int
|
Server port number. |
type | String
|
Flight descriptor's type: 1-PATH, 2-CMD . |
path | String
|
Flight command or path. |
root_cert_file | String
|
Path to a trusted TLS root certificate. For non-TLS operation, specify an empty path. |
The function returns the following:
(schema : long, total_records : String, total_bytes : String)
Return | Data Type | Description |
---|---|---|
schema | String
|
Flight's UDX schema. |
total_records | long
|
Total number of records (rows) in the dataset. |
total_bytes | long
|
Total number of bytes in the dataset. |
Query Example 1:
Returns query name flight’s metadata.
PREFIX exfun: <http://cambridgesemantics.com/anzograph/inmemory#> SELECT (row_opt_string(?flight_info,0) as ?schema) (row_get_long(?flight_info,1) as ?records) (row_get_long(?flight_info,2) as ?bytes) WHERE { SELECT (exfun:arrow_flight_get_info("10.117.2.36", 5005, 2, "SELECT ?ticketPrice ?numtickets ?totalPrice WHERE { ?sell <priceperticket> ?ticketPrice; <numtickets> ?numtickets; <totalprice> ?totalPrice . }", "/tmp/arrow/keys/root-ca.pem") as ?flight_info) }
Results from the query appear similar to the following:
schema | records | bytes -------------------------------------------------------+---------+--------- ?ticketPrice:double ?numtickets:int ?totalPrice:double | 192497 | 3942600
Query Example 2:
Returns path name flight’s metadata.
PREFIX exfun: <http://cambridgesemantics.com/anzograph/inmemory#> SELECT (row_opt_string(?flight_info,0) as ?schema) (row_get_long(?flight_info,1) as ?records) (row_get_long(?flight_info,2) as ?bytes) WHERE { SELECT (exfun:arrow_flight_get_info("10.117.2.36", 5005, 1, "/tmp/arrow/data/iris.csv", "/tmp/arrow/keys/root-ca.pem") as ?flight_info) }
Results from the query appear similar to the following:
schema | records | bytes ---------------------------------------------------------------------------------------------------+---------+------- ?sepal_length:double ?sepal_width:double ?petal_length:double ?petal_width:double ?variety:String | 150 | 7368
Query Example 3:
Returns custom name flight’s metadata.
PREFIX exfun: <http://cambridgesemantics.com/anzograph/inmemory#> SELECT (row_opt_string(?flight_info,0) as ?schema) (row_get_long(?flight_info,1) as ?records) (row_get_long(?flight_info,2) as ?bytes) WHERE { SELECT (exfun:arrow_flight_get_info("10.117.2.36", 5005, 2, "tickit_total_price", "/tmp/arrow/keys/root-ca.pem") as ?flight_info) }
Results from the query appear similar to the following:
schema | records | bytes --------------------------------------------------------+-----------+--------- ?ticketPrice:double ?numtickets:int ?totalPrice:double | 192497 | 3942600
arrow_flight_get
This function returns data of a specific flight from the flight server. The syntax for calling the function is the following:
arrow_flight_get(IP : String, port : int, flight_type : int, flight_name : String, root_cert_file : String)
The function returns the column data for the specified flight.
Function arguments are the following:
Argument | Data Type | Description |
---|---|---|
IP | String
|
Server IP address. |
port | int
|
Server port number. |
flight_type | int
|
Flight descriptor's type: 1-PATH, 2-CMD . |
flight_name | String
|
Name of the flight. |
root_cert_file | String
|
Path to a trusted TLS root certificate. For non-TLS operation, specify an empty path. |
Query Example 1:
Returns CMD flight using custom name.
PREFIX exfun: <http://cambridgesemantics.com/anzograph/inmemory#> SELECT * WHERE { SERVICE exfun:arrow_flight_get("10.117.2.36", 5005, 2, "tickit_total_price", "/tmp/arrow/keys/root-ca.pem") { } } ORDER By ?ticketPrice ?numtickets LIMIT 3
Results from the query appear similar to the following:
ticketPrice | numtickets | totalPrice ------------+------------+------------ 20.000000 | 1 | 20.000000 20.000000 | 1 | 20.000000 20.000000 | 1 | 20.000000
Query Example 2:
Returns PATH flight.
PREFIX exfun: <http://cambridgesemantics.com/anzograph/inmemory#> SELECT * WHERE{ SERVICE exfun:arrow_flight_get("10.117.2.36", 5005, 1, "/tmp/arrow/data/iris.csv", "/tmp/arrow/keys/root-ca.pem") { } } ORDER By ?sepal_length ?sepal_width ?petal_length ?petal_width LIMIT 3
Results from the query appear similar to the following:
sepal_length | sepal_width | petal_length | petal_width | variety -------------+-------------+---------------+-------------+-------- 4.300000 | 3.000000 | 1.100000 | 0.100000 | Setosa 4.400000 | 2.900000 | 1.400000 | 0.200000 | Setosa 4.400000 | 3.000000 | 1.300000 | 0.200000 | Setosa
Query Example 3:
Returns CMD flight using query name.
PREFIX exfun: <http://cambridgesemantics.com/anzograph/inmemory#> SELECT * WHERE { SERVICE exfun:arrow_flight_get("10.117.2.36", 5005, 2, "SELECT ?ticketPrice ?numtickets ?totalPrice WHERE { ?sell <priceperticket> ?ticketPrice; <numtickets> ?numtickets; <totalprice> ?totalPrice . }", "/tmp/arrow/keys/root-ca.pem") { } } ORDER By ?ticketPrice ?numtickets LIMIT 3
Results from the query appear similar to the following:
ticketPrice | numtickets | totalPrice -------------+------------+------------ 20.000000 | 1 | 20.000000 20.000000 | 1 | 20.000000 20.000000 | 1 | 20.000000
Query Example 4:
Create graph from CMD flight having custom name.
PREFIX exfun: <http://cambridgesemantics.com/anzograph/inmemory#> INSERT { graph <flight_graph1> { ?insUri <ticketPrice> ?ticketPrice; <numtickets> ?numtickets; <totalPrice> ?totalPrice. } } WHERE { SERVICE exfun:arrow_flight_get("10.117.2.36", 5005, 2, "tickit_total_price", "/tmp/arrow/keys/root-ca.pem") { } BIND(IRI(CONCAT("http:/CSI.COM/", STRUUID())) as ?insUri) }
To display the graph, you can run the following query:
SELECT ?ticketPrice ?numtickets ?totalPrice FROM <flight_graph1> WHERE { ?insUri <ticketPrice> ?ticketPrice; <numtickets> ?numtickets; <totalPrice> ?totalPrice. } ORDER By ?ticketPrice ?numtickets LIMIT 3
Results from the query appear similar to the following:
ticketPrice | numtickets | totalPrice ------------+------------+------------ 20.000000 | 1 | 20.000000 20.000000 | 1 | 20.000000 20.000000 | 1 | 20.000000
Query Example 5:
Create Graph from PATH flight.
PREFIX exfun: <http://cambridgesemantics.com/anzograph/inmemory#> INSERT { graph <flight_graph2> { ?insUri <sepal_length> ?sepal_length; <sepal_width> ?sepal_width; <petal_length> ?petal_length; <petal_width> ?petal_width; <variety> ?variety. } } WHERE { SERVICE exfun:arrow_flight_get("10.117.2.36", 5005, 1, "/tmp/arrow/data/iris.csv", "/tmp/arrow/keys/root-ca.pem") { } BIND(IRI(CONCAT("http:/CSI.COM/", STRUUID())) as ?insUri) }
Query Example 6:
Create Graph from CMD flight having query name.
PREFIX exfun: <http://cambridgesemantics.com/anzograph/inmemory#> INSERT { graph <flight_graph3> { ?insUri <ticketPrice> ?ticketPrice; <numtickets> ?numtickets; <totalPrice> ?totalPrice. } } WHERE { SERVICE exfun:arrow_flight_get("10.117.2.36", 5005, 2, "SELECT ?ticketPrice ?numtickets ?totalPrice WHERE { ?sell <priceperticket> ?ticketPrice; <numtickets> ?numtickets; <totalprice> ?totalPrice . }", "/tmp/arrow/keys/root-ca.pem") { } BIND(IRI(CONCAT("http:/CSI.COM/", STRUUID())) as ?insUri) }
Arrow Plasma Functions
Click on individual function names in the table below to view the syntax of each Arrow Plasma function signature, as well as obtain details about individual function arguments and return values.
Function | Description |
---|---|
arrow_plasma_size | Determines the size of the buffer to request from the plasma-store. |
arrow_plasma_write | Writes the result of an AnzoGraph query directly into an in-memory plasma table. |
arrow_plasma_list | Lists all the objects in the plasma store. |
arrow_plasma_contains | Checks if an object exist in the plasma store. |
arrow_plasma_schema | Returns the table schema and metadata for an object in the plasma store. |
arrow_plasma_read | Reads the data from a table in the in-memory plasma store. |
arrow_plasma_delete | Deletes an object from the plasma store. |
arrow_plasma_size
This function determines the size of the buffer to request from the plasma-store. The syntax for calling the function is the following:
arrow_plasma_size()
The function returns:
(plasma_size : long)
This value is the size required to store the object in plasma-store.
Query Example:
PREFIX exfun: <http://cambridgesemantics.com/anzograph/inmemory#> SELECT ?plasma_size FROM <tickit> WHERE { { SELECT ?ticketPrice ?numtickets ?totalPrice WHERE { ?sell <priceperticket> ?ticketPrice; <numtickets> ?numtickets; <totalprice> ?totalPrice . } } SERVICE exfun:arrow_plasma_size() {} }
Results from the query appear similar to the following:
plasma_size ---------- 4037824
arrow_plasma_write
This function writes the result of an AnzoGraph query directly into an in-memory plasma table. The syntax for calling the function is the following:
arrow_plasma_write(plasma_path : String, object_id : String)
Function arguments are the following:
Argument | Data Type | Description |
---|---|---|
plasma_path | String
|
Name of the UNIX domain socket to use to connect to the Plasma store. |
object_id | String
|
Unique ID (a byte string of length 20) of the object to create into the plasma store as arrow table structure. |
The function returns:
(num_rows : long, object_id : String)
Return | Data Type | Description |
---|---|---|
num_rows | long
|
Number of rows stored for a table into the plasma store. |
object_id | String
|
Unique ID (a byte string of length 20) of the object to create into the plasma store as arrow table structure. |
Query Example:
PREFIX exfun: <http://cambridgesemantics.com/anzograph/inmemory#> SELECT ?num_rows ?object_id FROM <tickit> WHERE { { SELECT ?plasma_size WHERE { { SELECT ?ticketPrice ?numtickets ?totalPrice WHERE { ?sell <priceperticket> ?ticketPrice; <numtickets> ?numtickets; <totalprice> ?totalPrice . } } SERVICE exfun:arrow_plasma_size() {} } } { SELECT ?ticketPrice ?numtickets ?totalPrice WHERE { ?sell <priceperticket> ?ticketPrice; <numtickets> ?numtickets; <totalprice> ?totalPrice . } } SERVICE exfun:arrow_plasma_write( "/tmp/plasma", #plasma store path "0000000000000000001a" #ObjectID to write) {} }
Results from the query appear similar to the following:
num_rows | object_id ----------+--------------------------- 192497 | 0000000000000000001a
arrow_plasma_list
This function lists all the objects in the plasma store. The syntax for calling the function is the following:
arrow_plasma_list(plasma_path : String)
Function syntax arguments are the following:
Argument | Data Type | Description |
---|---|---|
plasma_path | String
|
Name of the UNIX domain socket to use to connect to the Plasma store. |
The function returns :
(object_id : String, metadata_size : long, data_size : long, ref_count : int, create_time : long, construct_duration : long, state : String)
Return | Data Type | Description |
---|---|---|
object_id | String
|
Unique ID of the object. |
metadata_size | long
|
Size of the object metadata in bytes. |
data_size | long
|
Size of the object data in bytes. |
ref_count | int
|
Number of clients referencing the object buffer. |
create_time | long
|
Unix timestamp of the object creation. |
construct_duration | long
|
Object creation time in seconds. |
state | String
|
Indicates whether the object still being created or it is already sealed. |
Query Example:
PREFIX exfun: <http://cambridgesemantics.com/anzograph/inmemory#> SELECT * WHERE { SERVICE exfun:arrow_plasma_list("/tmp/plasma") {} }
Results from the query appear similar to the following:
object_id | metadata_size | data_size | ref_count | create_time | construct_duration | state ---------------------+---------------+-----------+-----------+--------------+--------------------+-------- 0000000000000000001a | 0 | 4037824 | 0 | 1585044544 | 4 | SEALED
arrow_plasma_contains
This function checks if an object exist in the plasma store. The syntax for calling the function is the following:
arrow_plasma_contains(path : String, object_id : String)
Function arguments are the following:
Argument | Data Type | Description |
---|---|---|
path | String
|
Name of the UNIX domain socket to use to connect to the Plasma store. |
object_id | String
|
Unique ID of the object. |
The function returns:
(r : Boolean)
The value returned is True if the plasma store contains a particular object and the object has been sealed.
Query Example:
PREFIX exfun: <http://cambridgesemantics.com/anzograph/inmemory#> SELECT (exfun:arrow_plasma_contains("/tmp/plasma", "0000000000000000001a") as ?is_exist)
Results from the query appear similar to the following:
is_exist ------- t
arrow_plasma_schema
This function returns the table schema and metadata for an object in the plasma store. The syntax for calling the function is the following:
arrow_plasma_schema(path : String, object_id : String)
Function arguments are the following:
Argument | Data Type | Description |
---|---|---|
path | String
|
Name of the UNIX domain socket to use to connect to the Plasma store. |
object_id | String
|
Unique ID of the object.. |
The function returns:
(s : String, m : String)
Return | Data Type | Description |
---|---|---|
s | String
|
Table's schema . |
m | String
|
Table's metadata. |
Query Example:
PREFIX exfun: <http://cambridgesemantics.com/anzograph/inmemory#> SELECT (row_opt_string(?plasma_schema,2,0) as ?schema) (row_opt_string(?plasma_schema,2,1) as ?metadata) WHERE { SELECT (exfun:arrow_plasma_schema("/tmp/plasma", "0000000000000000001a") as ?plasma_schema) }
Results from the query appear similar to the following:
schema | metadata -------------------------------------------------------+---------- ?ticketPrice:double ?numtickets:int ?totalPrice:double |
arrow_plasma_read
This function reads the data from a table in the in-memory plasma store. The syntax for calling the function is the following:
arrow_plasma_read(plasma_path : String, object_id : String, col_name_type : String)
Function syntax arguments are the following:
Argument | Data Type | Description |
---|---|---|
plasma_path | String
|
Name of the UNIX domain socket to use to connect to the Plasma store. |
object_id | String
|
Unique ID(a byte string of length 20) of the object to read from the plasma store as a arrow table structure. |
col_name_type | String
|
Columns name and type of the table to read; string format is like "name1:type1 name2:type2". |
Query Example:
PREFIX exfun: <http://cambridgesemantics.com/anzograph/inmemory#> SELECT * WHERE { SERVICE exfun:arrow_plasma_read ( "/tmp/plasma", #plasma store path "0000000000000000001a", #objectID to read "?ticketPrice:double ?numtickets:int ?totalPrice:double" #column names and type ) {} } ORDER By ?ticketPrice ?numtickets LIMIT 5
Results from the query appear similar to the following:
ticketPrice | numtickets | totalPrice -------------+-------------+------------ 20.000000 | 1 | 20.000000 20.000000 | 1 | 20.000000 20.000000 | 1 | 20.000000 20.000000 | 1 | 20.000000 20.000000 | 1 | 20.000000
arrow_plasma_delete
This function deletes an object from the plasma store. The syntax for calling the function is the following:
arrow_plasma_delete(plasma_path : String, object_id : String)
Function arguments are the following:
Argument | Data Type | Description |
---|---|---|
plasma_path | String
|
Name of the UNIX domain socket to use to connect to the Plasma store. |
object_id | String
|
Unique ID of the object. |
The function returns :
(r : Boolean)
Query Example:
PREFIX exfun: <http://cambridgesemantics.com/anzograph/inmemory#> SELECT (exfun:arrow_plasma_delete("/tmp/plasma", "0000000000000000001a") as ?is_deleted)
Results from the query appear similar to the following:
is_deleted ------- t