AnzoGraph 2.1 Releases

To view the release notes for an AnzoGraph 2.1 version, select the version from the list below. The release notes for each version describe the product changes from the previous version.

AnzoGraph Version 2.1.10

This section describes the improvements and issues that were fixed in AnzoGraph Version 2.1.10.

Added Support for Using a Log Volume with Kubernetes Deployments

Version 2.1.10 adds support for attaching a separate log volume to Kubernetes-based AnzoGraph deployments. Capturing AnzoGraph logs, xrays, and crashdumps on a separate volume ensures that the files remain available for diagnostics in cases where the pods have been stopped, deprovisioned, or redeployed.

Improved Query Execution Planning for SERVICE Queries

Version 2.1.10 improves query execution planning for SERVICE queries by more accurately estimating the number of rows to broadcast.

File Size Setting was Ignored when Exporting Data to FLDS

Version 2.1.10 resolves an issue where AnzoGraph ignored the copy_file_size setting value and exported only one .ttl.gz file per CPU (slice) when an Export Step was run. The resulting FLDS could include large RDF files that were not ideal for loading back into memory. In Version 2.1.10, AnzoGraph correctly creates .ttl.gz files according to the size specified in copy_file_size, which is 5 MB by default.

AnzoGraph Version 2.1.9

This section describes the improvements and issues that were fixed in AnzoGraph Version 2.1.9.

Improved Performance for Subqueries with Limits

Version 2.1.9 enhances the query planner to take into account subqueries with LIMIT clauses when estimating the number of projected rows for a query. This change results in improved query planning and leads to quicker execution times for many queries that include subqueries with limits.

AnzoGraph Version 2.1.8

This section describes the improvements and issues that were fixed in AnzoGraph Version 2.1.8.

CVE Found in Geospatial Extension Dependency

A Common Vulnerabilities and Exposures (CVE) issue was found in FasterXML Jackson Databind, which is used in the AnzoGraph Geospatial extension. The library did not properly secure entity expansion, which left it vulnerable to XML external entity (XXE) attacks. The highest threat from the vulnerability was to data integrity. AnzoGraph Version 2.1.8 uses FasterXML Jackson Databind version 2.11.0, which resolves the CVE.

This issue does not affect most Anzo deployments because the geospatial extension is not included by default in AnzoGraph "static" deployments that use the installer. The extension is included in dynamic, Kubernetes-based AnzoGraph deployments.

AnzoGraph Version 2.1.7

This section describes the improvements and issues that were fixed in AnzoGraph Version 2.1.7.

Reduced Data Profile Memory Usage

When a user profiles source or graph data in Anzo, the data to profile is loaded to AnzoGraph and AnzoGraph runs several metrics queries to compute statistics across the entire data set. Due to the nature of the queries and the broad scope of the data types that are targeted, the profile queries consumed a very large amount of RAM, especially on clusters that were larger than 4 nodes. Version 2.1.7 significantly reduces the RAM usage during data profiling so that the queries can succeed on large clusters.

As part of the resolution to this issue, the following settings were enabled by default in the <install_path>/config/settings.conf file:

  • streaming_sc: The value for this setting was changed from false to true to enable a feature called streaming scatter gather. Streaming scatter gather is an improvement to the way AnzoGraph converts data to its internal storage format when it is loaded and then translates data from internal storage format back to text when it is returned to clients. The process of converting data to internal storage format is called compression, and the process of converting data back to text format is called decompression.
  • sc_compress: The value for this setting was changed from false to true to enable the streaming scatter gather method of compression.
  • sc_decompress: The value for this setting was changed from false to true to enable the streaming scatter gather method of decompression.
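
To check whether a deployment overrides any of these defaults, the settings file can be scanned for uncommented entries. The sketch below is a minimal illustration that assumes the key=value layout with # comments shown elsewhere in these notes. The function name read_overrides and the sample text are hypothetical and are not part of AnzoGraph.

```python
def read_overrides(text):
    """Return the uncommented key=value pairs found in settings.conf text."""
    overrides = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Commented lines carry default values, so they are not overrides.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        overrides[key.strip()] = value.strip()
    return overrides


# Hypothetical excerpt: two settings left at their defaults, one overridden.
sample = """\
#streaming_sc=true
sc_compress=false
#sc_decompress=true
"""

overrides = read_overrides(sample)
for key in ("streaming_sc", "sc_compress", "sc_decompress"):
    print(key, overrides.get(key, "default"))
```

A key that is missing from the result is still at its default value, because Version 2.1.0 and later comment out settings that match the defaults.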

Support Export of Property Graphs (RDF*)

If a data layer employed RDF* to define properties for relationships in the data, and an Export Step was used to export the property graph to a file-based linked data set (FLDS), all of the properties were excluded from the resulting FLDS. Version 2.1.7 resolves the issue so that properties are included when property graphs are copied to files.

Memory Exhausted Error when Parsing Large Query

When an extremely large query (775+ KB) was run, the parser failed to parse the query and returned a "Memory Exhausted" error. Version 2.1.7 resolves the issue by increasing the stack size for the parser.

Initial Queries Failed on 16+ Node Clusters

When a 16-node cluster was deployed, all of the initial internal queries failed and the cluster became unusable. The problem was due to the number of socket descriptors that were acquired for the HTTP endpoint. Version 2.1.7 resolves the issue by changing the method used to acquire socket descriptors.

AnzoGraph Version 2.1.6

This section describes the improvements and issues that were fixed in AnzoGraph Version 2.1.6.

Stability Improvements for Low Memory Conditions

Version 2.1.6 includes enhancements that improve the stability of AnzoGraph when it is operating in low memory conditions and reduce the likelihood that the AnzoGraph process will be terminated by the Linux out-of-memory (OOM) killer.

To avoid unexpected shutdowns by the Linux operating system, the default AnzoGraph configuration leaves 20% of memory available for the operating system. That means AnzoGraph will not use more than 80% of the total available memory. In past releases there were circumstances in which the reserved memory percentage could be lowered by half or more. However, Cambridge Semantics now recommends that all AnzoGraph deployments use the default configuration value of 20%.
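
As a worked example of the arithmetic, the sketch below computes how much memory remains for AnzoGraph on a server of a given size. The function name and the 256 GB figure are illustrative assumptions only.

```python
def usable_memory_gb(total_gb, reserved_percent=20):
    """Memory left for AnzoGraph after reserving a share for the OS."""
    if not 0 <= reserved_percent < 100:
        raise ValueError("reserved_percent must be in the range [0, 100)")
    return total_gb * (100 - reserved_percent) / 100


print(usable_memory_gb(256))       # default 20% reservation
print(usable_memory_gb(256, 10))   # reservation lowered by half (no longer recommended)
```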

Improved Memory Usage Reporting

In previous versions, the memory statistic that was sent to Anzo for memory usage reporting was the resident set size (RSS) of the process instead of the internal AnzoGraph usage value. As a result, Anzo displayed a significantly higher memory usage value than AnzoGraph was actually using. Version 2.1.6 corrects the issue so that memory usage reporting is more accurate.

Failed to Load Files in HDFS Subdirectories

In a previous version, when files were loaded to AnzoGraph from HDFS, only the files from the parent directory specified in the LOAD command were loaded. Files in child directories under the parent were ignored. Version 2.1.6 resolves the issue so that loads from HDFS include the files in subdirectories under the parent directory.

AnzoGraph Version 2.1.5

This section describes the improvements and issues that were fixed in AnzoGraph Version 2.1.5.

Memory Allocator Failed to Release Temporary Memory

As queries were executed against data in AnzoGraph, the memory allocator was accumulating memory but could fail to release all of it back to the operating system. This resulted in memory usage steadily increasing even though new data was not added. In some cases, AnzoGraph retained all of the available memory and was shut down by the operating system. Version 2.1.5 resolves the issue to ensure that the allocator releases memory when it is finished using it.

AnzoGraph Version 2.1.4

This section describes the improvements and issues that were fixed in AnzoGraph Version 2.1.4.

Inconsistent Results for DELETE Followed by INSERT

When a DELETE query was immediately followed by an INSERT query and the INSERT contained some of the same triples as the DELETE, the number of triples that were inserted could be inconsistent across runs. Version 2.1.4 fixes the issue by changing the way AnzoGraph handles the intermediate results produced by the WHERE clause in update queries. Previously the WHERE clause results were streamed. In Version 2.1.4, the results are materialized before further processing.

Materializing the intermediate results increases the amount of memory that is used for performing updates. Depending on the size of the result set produced by the WHERE clause, you may notice that INSERT and DELETE queries temporarily use more memory than they did before.
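
The difference between streaming and materializing can be shown with a loose analogy in plain Python. This is not AnzoGraph's implementation. It only shows how reading matches lazily while writing to the same store lets new rows feed back into the scan, whereas taking a snapshot first fixes the set of rows to process. The integer store and both function names are hypothetical.

```python
def insert_streamed(store, limit):
    """Scan the store lazily while appending to it.

    Iterating over a list picks up items appended during the loop,
    so newly inserted values are scanned and can trigger more inserts.
    """
    inserted = 0
    for value in store:
        if value < limit:
            store.append(value + 1)
            inserted += 1
    return inserted


def insert_materialized(store, limit):
    """Snapshot the matching rows first, then apply the inserts."""
    matches = [v for v in store if v < limit]  # materialized "WHERE" result
    for value in matches:
        store.append(value + 1)
    return len(matches)


print(insert_streamed([1, 2, 3], 5))
print(insert_materialized([1, 2, 3], 5))
```

The materialized version always performs one insert per original match. The cost is the extra list that holds the snapshot, which corresponds to the additional memory use described above.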

Inconsistent Results Across Environments

Version 2.1.4 resolves an issue where inconsistent results were returned when the same query was run on different clusters. One cluster returned a larger result set than the other. The issue occurred because on the environment where OWL statistics were enabled, a variable was incorrectly tagged as unique when it was not. Since it was considered unique, the DISTINCT operation was skipped in the COUNT calculation and too many results were returned.
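
The effect of the incorrect uniqueness tag can be illustrated with a small sketch. It is a simplified model of the optimization described above, not AnzoGraph code, and the function name and sample values are hypothetical.

```python
def count_distinct(values, tagged_unique):
    """COUNT(DISTINCT ...) that skips deduplication for values tagged unique."""
    if tagged_unique:
        # Shortcut: correct only when the values really are unique.
        return len(values)
    return len(set(values))


values = ["a", "b", "b", "c"]
print(count_distinct(values, tagged_unique=False))  # 3
print(count_distinct(values, tagged_unique=True))   # 4, an overcount
```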

Slower than Expected INSERT Performance

When performance was compared between a single-server setup and a cluster with the same total number of CPUs, an INSERT query ran much slower on the single server. The slower performance was due to mutex contention. Version 2.1.4 resolves the issue by reducing the time spent waiting on mutexes.

AnzoGraph Version 2.1.3

This section describes the improvements and issues that were fixed in AnzoGraph Version 2.1.3.

Query with Property Path Returned Incorrect Results

In certain circumstances, a property path query could return extra results or fail with an error. The problem occurred because it was possible for the planning stage of the query to distribute the incorrect column to the execution engine. Version 2.1.3 corrects the query plan for property path queries.

Crash after Out of Memory Error

When AnzoGraph canceled a query because there was insufficient memory available, there was a circumstance where AnzoGraph could try to free the same memory resource twice. The double-free of resources resulted in a crash. Version 2.1.3 resolves the issue by ensuring that AnzoGraph does not try to release the same resource more than once.

AnzoGraph Version 2.1.2

This section describes the improvements and issues that were fixed in AnzoGraph Version 2.1.2.

ASK Returns False if Graph does not Exist

In previous versions, ASK queries resulted in a "No such graph or view" or "Bad request" error if the query referenced a graph that did not exist. In Version 2.1.2, AnzoGraph returns "false" for ASK queries that reference non-existent graphs.

Option to Return Empty Result when Graph does not Exist

In previous versions, AnzoGraph returned a "No such graph or view" error and aborted the query if a query referenced a graph that did not exist. The system could not be configured to return an empty result instead of an error. In Version 2.1.2, referencing a non-existent graph still produces an error by default, but users have the option to configure AnzoGraph to return an empty result instead. To configure the system to return empty results instead of an error when a referenced graph does not exist, follow the instructions below to set the enable_unbound_variables value to true:

  1. Open <install_path>/config/settings.conf in a text editor.
  2. Locate the following line in the file:
    #enable_unbound_variables=false
  3. Uncomment the line and change the false value to true.
    enable_unbound_variables=true
  4. Save and close the file, and then restart AnzoGraph to apply the configuration change.
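
For deployments where the file is edited by a script rather than by hand, steps 2 and 3 can be sketched as follows. This is a minimal illustration that works on the text of the file. The helper name enable_setting is hypothetical, and the sample text is an assumed two-line excerpt of settings.conf.

```python
import re


def enable_setting(text, key, value="true"):
    """Uncomment `key` if needed and set it to `value`; append it if absent."""
    pattern = re.compile(
        r"^[ \t]*#?[ \t]*" + re.escape(key) + r"[ \t]*=.*$", re.MULTILINE
    )
    replacement = f"{key}={value}"
    if pattern.search(text):
        # Replace only the first matching line.
        return pattern.sub(lambda m: replacement, text, count=1)
    if text and not text.endswith("\n"):
        text += "\n"
    return text + replacement + "\n"


before = "#enable_unbound_variables=false\nsysmgr_port=5600\n"
after = enable_setting(before, "enable_unbound_variables")
print(after)
```

The sketch only produces the new file contents. Writing them back to <install_path>/config/settings.conf and restarting AnzoGraph are still required for the change to take effect.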

In addition to allowing queries that reference non-existent graphs to succeed, setting enable_unbound_variables to true also configures AnzoGraph to ignore unbound variables elsewhere in queries. For example, by default (when enable_unbound_variables=false), if a query includes a variable in the SELECT list that is not referenced in a WHERE clause pattern, AnzoGraph aborts the query and returns a "Named variable not in contained WHERE clause" error. When enable_unbound_variables=true, AnzoGraph will not warn the user about unbound variables. Instead, the results will be empty for the unbound variable. For example:

SELECT ?unbound ?person ?name
FROM <http://csi.com/people>
WHERE {?person <http://csi.com/people#firstname> ?name}
LIMIT 5

 unbound | person      | name
---------+-------------+---------
         | person35632 | Ross
         | person20216 | Quin
         | person35859 | Kellie
         | person2551  | Maris
         | person24963 | Madonna
5 rows

Added Support for RDF Graph Store HTTP Protocol

Version 2.1.2 adds support for the SPARQL 1.1 Graph Store HTTP Protocol via the new rdf-graph-store endpoint. The AnzoGraph front end also supports the new graph store protocol via the data endpoint. The graph store protocol supports the GET, PUT, POST, and DELETE HTTP methods. For more information, see Accessing AnzoGraph Using HTTP Protocol in the AnzoGraph DB Documentation.
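
The sketch below shows the general shape of a Graph Store Protocol request that replaces the contents of a named graph. It builds the request without sending it. The ?graph= query parameter comes from the W3C recommendation. The host, port, and URL path in ENDPOINT are assumptions for illustration only, so consult the AnzoGraph DB Documentation for the actual endpoint address. The triple in the request body is made-up sample data.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical base URL: substitute the address of your deployment.
ENDPOINT = "http://localhost:8080/rdf-graph-store"


def graph_request(method, graph_uri, body=None, content_type=None):
    """Build (but do not send) a Graph Store Protocol request for a named graph."""
    url = ENDPOINT + "?" + urlencode({"graph": graph_uri})
    headers = {"Content-Type": content_type} if content_type else {}
    return Request(url, data=body, headers=headers, method=method)


req = graph_request(
    "PUT",
    "http://csi.com/people",
    body=b"<http://csi.com/s> <http://csi.com/p> <http://csi.com/o> .",
    content_type="text/turtle",
)
print(req.get_method(), req.full_url)
```

Passing the request to urllib.request.urlopen would send it. A GET or DELETE request for the same graph uses the same URL with no body.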

Known Issue

While AnzoGraph now supports the Graph Store HTTP Protocol, it is a known issue that not all return codes are in accordance with the W3C recommendation. The following exceptions will be addressed in a later release:

  • If a GET request includes an invalid Accept value, the specification states that "406 Not Acceptable" should be returned. In Version 2.1.2, AnzoGraph returns "400 Bad Request."
  • For HTTP DELETE operations, the specification states that "404 Not Found" should be returned if a user deletes a graph that does not exist. In Version 2.1.2, AnzoGraph returns "400 Bad Request."
  • If a PUT or POST request includes an invalid Content-Type, the specification states that "415 Unsupported Media Type" should be returned. In Version 2.1.2, AnzoGraph returns "200 OK."

Unable to Recover from Out of Memory Exception

Running several complex queries concurrently could cause AnzoGraph to run out of memory and then fail to recover the memory after canceling the queries. The issue occurred when AnzoGraph was in the process of initializing the next query as the current query was receiving the out of memory exception. Version 2.1.2 resolves the issue to ensure that memory is recovered if a new query is started while another query hits an out of memory exception.

Crash if User without Write Permission Ran Graph 500 Benchmark

If a user ran the AnzoGraph Graph 500 Benchmark and did not have write permission to the directory where the benchmark generated data files, AnzoGraph crashed instead of returning a permission denied error to the user. In Version 2.1.2, AnzoGraph returns an error if the user does not have access to the directory where the data is generated, and the database remains running.

AnzoGraph Version 2.1.1

This section describes the improvements and issues that were fixed in AnzoGraph Version 2.1.1.

Retrieve boot.log when Generating Xrays

In previous versions, taking an xray did not retrieve the AnzoGraph boot.log diagnostic file (<install_path>/internal/log/boot.log), which includes information that is logged when the database is started and stopped. The boot.log was only retrieved when a crashdump was generated. In Version 2.1.1, boot.log is included in an xray as well as a crashdump.

Remove Extensions Directory on Startup

In Version 2.1.0, Cambridge Semantics introduced a new Java plugin that includes an Elasticsearch client for use with Anzo Unstructured. The plugin replaced the Elasticsearch client from AnzoGraph Version 2.0.1. When users upgraded to Version 2.1.0 from 2.0.1, AnzoGraph would not start because the <install_path>/internal/extensions directory contained files from the old client that were incompatible with the new plugin. In Version 2.1.1, AnzoGraph automatically removes the internal/extensions directory on startup. It then re-creates the directory as needed to ensure that it includes only files that are relevant to plugins in the <install_path>/lib/udx directory.

Failed to Bind Topdown Variable for Data Toolkit Service Query

In Version 2.1.0, if a user created a data layer to join existing data with the results from a data toolkit service call, the data layer query failed and AnzoGraph returned a "Bad extension invocation - cannot re-bind top-down variable '?varName'" error. Version 2.1.1 resolves the issue so that AnzoGraph can successfully complete data toolkit SERVICE queries.

Crash after Unstructured Document Text Search

In Version 2.1.0, AnzoGraph shut down when a user performed a text search via Elasticsearch. The shutdown occurred during the hand-off between the Elasticsearch Java client and the AnzoGraph client. Version 2.1.1 resolves the issue to ensure that AnzoGraph can successfully receive and process Elasticsearch requests.

Incorrect Results for Query with Optional Clause and Filter on Bind Clause Variable

In previous versions, a query that had an optional clause and a bind statement returned incorrect results when it also included a filter expression on the variable from the bind statement. Below is an example query that meets the criteria.

select ?name ?age ?phone
where { 
  ?v <name> ?name .
  ?v <age> ?age .
  optional { ?v <phone> ?phone }
  bind (?age as ?new_age)
  filter (?phone = 1234 && ?new_age > 10)
}

This query returned too many results because the two filter expressions were not both applied. Version 2.1.1 resolves the issue so that queries like the example above return the correct results.
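
To make the expected behavior concrete, the sketch below evaluates the same logic over a few made-up rows in plain Python. It follows standard SPARQL semantics, in which a filter comparison on an unbound variable is an error and removes the row. The sample data and the function name are hypothetical.

```python
# Hypothetical data: one dict per ?v, with an optional phone value.
people = [
    {"name": "Ann", "age": 34, "phone": 1234},
    {"name": "Bob", "age": 8, "phone": 1234},
    {"name": "Cy", "age": 40},  # no phone: OPTIONAL leaves ?phone unbound
    {"name": "Dee", "age": 25, "phone": 5678},
]


def evaluate(rows):
    """Apply the OPTIONAL, BIND, and FILTER steps of the example query."""
    results = []
    for row in rows:
        phone = row.get("phone")  # optional { ?v <phone> ?phone }
        new_age = row["age"]      # bind (?age as ?new_age)
        # filter (?phone = 1234 && ?new_age > 10): both conditions must hold,
        # and an unbound ?phone can never satisfy the comparison.
        if phone is not None and phone == 1234 and new_age > 10:
            results.append((row["name"], row["age"], phone))
    return results


print(evaluate(people))  # [('Ann', 34, 1234)]
```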

Invalid Content-Type Response for Construct Queries

In previous versions, AnzoGraph sent the following invalid Content-Type HTTP header value in response to a CONSTRUCT query: application/sparql-results+text/plain. In Version 2.1.1, AnzoGraph sends a valid Content-Type of "text/turtle; charset=utf-8".

Diagnostic Files Could Get Corrupted

In Version 2.1.0, there was a circumstance where a crashdump could be corrupted when multiple threads tried to create the diagnostic files at the same time. Version 2.1.1 resolves the issue to ensure that only one thread creates the crashdump.

AnzoGraph Version 2.1.0

This section describes the new features and changes to existing components that are introduced in AnzoGraph Version 2.1.0.

Improved Performance for Elasticsearch Service Calls

Version 2.1.0 enhances the performance of Elasticsearch queries by parallelizing the execution of the service calls.

Follow the instructions below to configure AnzoGraph for use with Elasticsearch.

Make sure that the azgmgrd and anzograph services are stopped before proceeding.

  1. AnzoGraph requires Java Development Kit version 11. Follow these steps to install OpenJDK 11 and set $JAVA_HOME to the Java Runtime Environment.

    If you upgraded from AnzoGraph Version 2.0.1 or 2.1.x and installed OpenJDK 11 previously, you do not need to complete this step.

    Perform these steps on all servers in the cluster:

    1. Run the following command to install OpenJDK 11:
      sudo yum install java-11-openjdk
    2. Modify the AnzoGraph system management service, azgmgrd.service, to set the $JAVA_HOME variable to the Java runtime environment for the OpenJDK that you installed. To set the variable, add the following line to the [Service] section of /usr/lib/systemd/system/azgmgrd.service:
      Environment=JAVA_HOME=/usr/lib/jvm/jre-11
  2. Copy the data toolkit plugin, datatoolkit-1.0.0.jar, provided by Cambridge Semantics to the <install_path>/lib/udx directory on the AnzoGraph leader server. This jar file contains the client interface that AnzoGraph uses to communicate with Elasticsearch. The AnzoGraph leader broadcasts the jar to the compute nodes when the database is started.

    If you upgraded from Version 2.0.1, remove elasticsearch-1.0.0.jar from the directory if it exists. Version 2.1.x does not use that .jar file.

  3. Make sure that the AnzoGraph leader server can access Elasticsearch via the Elasticsearch server's IP address. To test the connection, Cambridge Semantics recommends running the following telnet command from the AnzoGraph leader server:
    telnet <Elasticsearch_server_IP> 9200
  4. If you upgraded from AnzoGraph Version 2.0.1 and chose not to update the configuration settings, open the <install_path>/config/settings.conf file and look for enable_context_query_string=true. If the file contains that setting and it is enabled, remove the line from the file or change the value to false. The setting is not valid in Version 2.1.x.
  5. If you upgraded from Version 2.0.1, there may be JVM support libraries on the AnzoGraph file system that are incompatible with Version 2.1.x. Before starting AnzoGraph, follow the instructions below to remove the incompatible files.

    If you upgraded from AnzoGraph Version 2.1.x, you do not need to complete this step.

    Perform these steps on all servers in the cluster:

    1. Remove udx-1.5.0-SNAPSHOT.jar and udx-1.5.0-SNAPSHOT.javadoc.jar from the <install_path>/lib/udx directory. For example:
      rm /opt/anzograph/lib/udx/udx-1.5.0-SNAPSHOT.jar
      rm /opt/anzograph/lib/udx/udx-1.5.0-SNAPSHOT.javadoc.jar
    2. Remove jvm-1.0.0-SNAPSHOT.jar and jvm-1.0.0-SNAPSHOT.javadoc.jar from the <install_path>/lib/jar directory. For example:
      rm /opt/anzograph/lib/jar/jvm-1.0.0-SNAPSHOT.jar
      rm /opt/anzograph/lib/jar/jvm-1.0.0-SNAPSHOT.javadoc.jar
  6. Start the azgmgrd service on each AnzoGraph server:
    sudo systemctl start azgmgrd
  7. On the leader node, start the anzograph service:
    sudo systemctl start anzograph
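
On servers where telnet is not installed, the connectivity check in step 3 can be approximated with a short script. The sketch below is one possible alternative, not a Cambridge Semantics tool. It only confirms that a TCP connection to port 9200 can be opened, and the function name can_connect is hypothetical.

```python
import socket


def can_connect(host, port=9200, timeout=3.0):
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


# Example call from the leader server. Replace the placeholder with the
# IP address of the Elasticsearch server:
# print(can_connect("<Elasticsearch_server_IP>"))
```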

Improved Memory Allocation Performance

In Version 2.1.0, AnzoGraph manages memory using a thread-caching allocation strategy. The strategy is implemented with the industry-standard tcmalloc library from Google. When compared to previous AnzoGraph versions, the thread-caching allocator typically decreases lock contention time and overall query run time.

Default Configuration Values Commented Out in Settings File

To help distinguish between the default AnzoGraph configuration values and custom values in the configuration file <install_path>/config/settings.conf, Version 2.1.0 comments out the settings that are set to the default AnzoGraph values. Anzo-specific settings and user-modified settings, such as settings that are added at install time, are not commented. For information about changing settings, see Changing AnzoGraph Configuration Settings.

Ability to Configure the System Management Port (5600)

In previous versions, the port for the system management daemon (azgmgrd) was set to 5600 and could not be changed. If an environment could not use that port, users had to remember to specify -port <alternate_port> any time azgmgrd was started or an azgctl command was run. In Version 2.1.0, the <install_path>/config/settings.conf file includes a sysmgr_port setting. The value is set to 5600 by default. To use a port other than 5600, uncomment the sysmgr_port setting and change the value to the desired port.

Changing sysmgr_port requires a restart of the system management daemon, azgmgrd, as well as the database.

Beta Release of Database Auto-Restart Feature

Version 2.1.0 includes a beta release of the database auto-restart feature. When the feature is enabled, the AnzoGraph system manager automatically restarts the database after a crash. The feature is disabled by default, and Cambridge Semantics recommends that you enable it only on test systems for the 2.1.0 release. The feature is controlled by the following two new settings in <install_path>/config/settings.conf:

  • auto_restart_max_attempts=0: This setting specifies the number of times the system manager should attempt to start the database after a crash. The default value of 0 disables auto-restart.
  • auto_restart_time=600: This setting specifies the number of seconds to spend attempting to restart the database. If all attempts fail and this time limit is reached, the system manager stops trying to restart the database.
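
The way the two settings might interact can be sketched as a simple retry loop. This is one plausible reading of the descriptions above, not the system manager's actual logic: the sketch assumes that restarting stops as soon as an attempt succeeds, the attempts run out, or the time limit passes. The function and parameter names are hypothetical.

```python
import time


def auto_restart(start_db, max_attempts=0, time_limit=600, clock=time.monotonic):
    """Retry start_db() until it succeeds, attempts run out, or time expires.

    Returns the number of the attempt that succeeded, or 0 if the feature is
    disabled (max_attempts=0) or every attempt failed.
    """
    deadline = clock() + time_limit
    for attempt in range(1, max_attempts + 1):
        if clock() >= deadline:
            break  # time limit reached: stop trying
        if start_db():
            return attempt
    return 0


# Simulated database that starts on the third attempt.
outcomes = iter([False, False, True])
print(auto_restart(lambda: next(outcomes), max_attempts=5))  # 3
```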

Changing the auto_restart settings requires a restart of the system management daemon, azgmgrd, as well as the database.

Reduced Initial Query Compilation Time

In previous versions, each time a new query was run, AnzoGraph performed an extensive code compilation process to generate fully optimized code for running that query. Once the code compilation was complete, AnzoGraph executed the query using that code. In Version 2.1.0, when a new query is run, AnzoGraph compiles basic, non-optimized code and immediately executes the query using that code. The optimized compilation process continues in the background, and the optimized code is used for subsequent runs of the query. In most cases, this change reduces the execution time for the first run of a query.

In Version 2.1.0, the compile_optimized configuration setting was changed from true to background.
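
The general pattern of running basic code at once while optimizing in the background can be sketched as follows. This is a conceptual illustration in Python, not a description of how AnzoGraph compiles queries. The class, its methods, and the stand-in compile functions are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


class QueryRunner:
    """Run basic code immediately and switch to optimized code once it is ready."""

    def __init__(self):
        self._pool = ThreadPoolExecutor(max_workers=1)
        self._optimized = {}  # query text -> Future that yields optimized code

    def run(self, query, compile_basic, compile_optimized):
        future = self._optimized.get(query)
        if future is not None and future.done():
            return future.result()()  # later runs: use the optimized code
        if future is None:
            # First run: start the optimized compilation in the background.
            self._optimized[query] = self._pool.submit(compile_optimized, query)
        return compile_basic(query)()  # meanwhile, run the basic code

    def wait_for(self, query):
        """Block until background compilation finishes (for demonstration)."""
        self._optimized[query].result()

    def close(self):
        self._pool.shutdown()


# Stand-in "compilers" that return callables labeled by how they were built.
basic = lambda q: (lambda: "basic")
optimized = lambda q: (lambda: "optimized")

runner = QueryRunner()
query = "SELECT * WHERE { ?s ?p ?o }"
first = runner.run(query, basic, optimized)
runner.wait_for(query)
second = runner.run(query, basic, optimized)
runner.close()
print(first, second)  # basic optimized
```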

OWL Statistics Enabled by Default

In order to generate query execution plans, AnzoGraph needs to gather statistics about the data. In previous versions, AnzoGraph captured basic statistics, such as the number of triples per graph and number of distinct subjects and predicates, by default. To aid in generating more optimal plans, users could enable more extensive statistics gathering, called OWL stats, which uses the metadata from data models to generate statistics. In Version 2.1.0, OWL stats are enabled by default. The feature is controlled by the enable_owlstats setting in <install_path>/config/settings.conf. When enable_owlstats is false (disabled), AnzoGraph reverts to capturing basic statistics.

Support Expressions with Median and Percentile Functions

Version 2.1.0 adds support for including expressions as arguments to MEDIAN and PERCENTILE functions. Previously AnzoGraph displayed an "Invalid System State" error message if an expression was input to one of those functions.

Display Error if Namespace Bindings Missing in INSERT

In previous versions, AnzoGraph did not produce an error if an INSERT query included triple patterns where some elements excluded namespace prefix bindings. SELECT queries with the same patterns, however, did correctly result in an error. In Version 2.1.0, INSERT queries with undefined prefixes result in the appropriate error. For example, the following query

INSERT DATA { 
  :John a :Person ; 
  :name "John Doe" .
}

results in an error such as:

:John: URI has blank prefix, but no (PREFIX or BASE) namespace defined

Project Expressions in SELECT DISTINCT Clause

In previous versions, if a query had a SELECT DISTINCT clause with expressions that included functions (such as STR and STRAFTER) and the query did not have a GROUP BY clause, AnzoGraph could fail to project results and return an "Invalid System State" error. Version 2.1.0 fixes the issue so that AnzoGraph properly projects SELECT DISTINCT expressions that contain functions.

Display Error for Unbound Variables in WHERE Clause

In previous versions, if an INSERT query had unbound variables in the WHERE clause, AnzoGraph could incorrectly insert additional triples for the unbound variable instead of discarding it. In Version 2.1.0, by default AnzoGraph displays an error message if an INSERT query includes unbound variables. To change the behavior so that AnzoGraph discards unbound variables and inserts only the bound values, edit the <install_path>/config/settings.conf file to change the value of enable_unbound_variables from false to true.

Support for Limits in Subqueries

In previous versions, AnzoGraph returned an "Invalid System State" error if a subquery included a LIMIT clause. Version 2.1.0 resolves the issue to ensure that queries that include subqueries with LIMIT clauses can complete successfully.

Return an Error for a Query with an Empty Where Clause

In previous versions, AnzoGraph crashed if a query with an empty WHERE clause was run. Version 2.1.0 resolves the issue so that AnzoGraph returns an error message and remains online if a user runs a query with an empty WHERE clause.