Enabling Paged Data Mode (Preview)

By default, AnzoGraph is configured as an in-memory database. In in-memory mode, all graphs are stored in memory and all queries run against the data in memory. Data is persisted to disk only for backups and for automatically reloading graphs into memory when the database restarts. You can, however, configure AnzoGraph as a disk-based database, where all data is stored on disk and paged into memory on demand for running analytics.

The Paged Data feature is a Preview release, which means the implementation has recently been completed but is not yet thoroughly tested and could be unstable. The feature is available for trial usage, but Cambridge Semantics recommends that you do not rely on Preview features in production environments.

How Does Paged Data Mode Work?

The steps below give an overview of how AnzoGraph operates in paged data mode:

  1. As with in-memory mode, you load data into AnzoGraph before running queries.
  2. As data is loaded, it passes through memory to be converted to AnzoGraph's internal storage format, and is then saved to disk in the persistence directory. The persistence directory location is configurable, and the speed of the disk that hosts the directory affects query performance. For the best performance, store the persistence directory on a fast disk, such as an SSD.
  3. AnzoGraph keeps the most recently accessed data cached in memory for queries. By default, the size of the cache is 20% of the total available memory. The percentage of memory to use for paged data caching is configurable. For more information, see paged_cache_memory_percent.
  4. As queries are run, AnzoGraph keeps track of the data that is accessed most often and keeps that data cached in memory. If a query requests data that is not currently cached, AnzoGraph releases the least accessed data from memory and loads the relevant data into memory.
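
The caching behavior described above can be sketched in a few lines of Python. This is an illustrative toy only, not AnzoGraph's implementation: the names cache_budget_bytes and PageCache are hypothetical, the notion of a fixed-size "page" is an assumption, and the eviction policy shown is plain least-recently-used, which is one possible reading of "releases the least accessed data."

```python
from collections import OrderedDict


def cache_budget_bytes(total_memory_bytes: int, percent: int = 20) -> int:
    """Memory set aside for caching, as a percentage of total memory."""
    return total_memory_bytes * percent // 100


class PageCache:
    """Toy page cache with least-recently-used eviction (an assumed policy)."""

    def __init__(self, capacity_pages: int, read_from_disk):
        if capacity_pages < 1:
            raise ValueError("capacity_pages must be at least 1")
        self.capacity = capacity_pages
        self.read_from_disk = read_from_disk  # callable: page_id -> data
        self.pages = OrderedDict()            # oldest entry first
        self.disk_reads = 0

    def get(self, page_id):
        if page_id in self.pages:
            # Cache hit: mark the page as most recently used.
            self.pages.move_to_end(page_id)
            return self.pages[page_id]
        # Cache miss: release the least recently used page if the cache
        # is full, then load the requested page from disk.
        if len(self.pages) >= self.capacity:
            self.pages.popitem(last=False)
        self.disk_reads += 1
        data = self.read_from_disk(page_id)
        self.pages[page_id] = data
        return data


# Hypothetical usage: a two-page cache in front of a fake "disk".
cache = PageCache(capacity_pages=2, read_from_disk=lambda pid: f"data-{pid}")
for pid in ["a", "b", "a", "c", "b"]:
    cache.get(pid)
```

In this run, the repeated request for "a" is served from memory, while the final request for "b" goes back to disk because "b" was released to make room for "c". The same trade-off is why a faster persistence disk and a well-sized cache both affect query performance.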

Enabling and Configuring Paged Data Mode

Follow the steps below to configure AnzoGraph for paged data storage. Before changing the configuration, make sure that your environment meets the requirements in Sizing Guidelines for Disk-Based Storage (Preview).

Enabling paged data does not change the way users interact with the database: data loading and query operations remain the same. However, user operations will likely be slower than in the default in-memory mode. In addition, enabling paged data requires you to re-initialize the database to remove the existing persisted data. The data must then be reloaded from the original files and/or insert queries.

  1. Stop the database. See Stop the Database and Leave the System Management Daemon Running for instructions.
  2. On the leader node, open the AnzoGraph settings file, settings.conf, in a text editor. The file is in the <install_path>/config directory.
  3. In settings.conf, locate the # paged_data=false line. This setting enables or disables paged data storage. Uncomment the line and change the value to true to enable paged data.
    paged_data=true
  4. The following settings are also related to paged data operations. If necessary, uncomment the lines for any of these settings and modify the values as needed:
    • paged_cache_memory_percent: This setting controls the amount of memory (as a percentage of total memory) to use for caching the most often accessed data. The default value is 20, which means AnzoGraph is configured to use 20% of the total available memory for caching data for analytics. If a query requests data that is not currently cached, AnzoGraph releases the least used data from memory and loads the relevant data into memory.

      Cambridge Semantics recommends that you do not set this value higher than 30.

    • enable_persistence: Persistence must be enabled when using paged data mode. This setting is true by default. If you have disabled persistence, re-enable it. 
    • persistence_directory: The directory where AnzoGraph saves the data that is persisted to disk. By default, the data is saved in the <install_path>/persistence directory. To persist data to an alternate disk, such as a separate SSD, specify the path and directory name.
  5. Save and close settings.conf.
  6. Restart and re-initialize the database to apply the configuration change and remove any existing persisted data. See Reinitializing the Database for instructions. When AnzoGraph starts, reload the database from your original files or insert queries.
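
For administrators who script configuration changes, the edits in steps 3 and 4 might be sketched as follows. This is a minimal sketch under stated assumptions: settings.conf is assumed to hold one key=value setting per line with # marking a commented-out setting (as in the # paged_data=false line), and the function name enable_paged_data is hypothetical. The sketch only transforms the text of the file; stopping, restarting, and re-initializing the database still follow the steps above.

```python
import re
import warnings

RECOMMENDED_MAX_PERCENT = 30  # recommendation from the settings description


def enable_paged_data(conf_text: str, cache_percent: int = 20) -> str:
    """Return settings text with paged data and persistence enabled.

    Assumes one `key=value` setting per line, where a leading `#`
    marks a commented-out setting.
    """
    if not 0 < cache_percent <= 100:
        raise ValueError("cache_percent must be between 1 and 100")
    if cache_percent > RECOMMENDED_MAX_PERCENT:
        warnings.warn("paged_cache_memory_percent above 30 is not recommended")

    wanted = {
        "paged_data": "true",
        "paged_cache_memory_percent": str(cache_percent),
        "enable_persistence": "true",
    }
    seen = set()
    out = []
    for line in conf_text.splitlines():
        match = re.match(r"^\s*#?\s*([A-Za-z_]+)\s*=", line)
        if match and match.group(1) in wanted:
            key = match.group(1)
            # Write the setting once, uncommented; drop later duplicates.
            if key not in seen:
                out.append(f"{key}={wanted[key]}")
                seen.add(key)
        else:
            out.append(line)  # leave unrelated lines untouched
    # Append any setting that did not appear in the file at all.
    for key, value in wanted.items():
        if key not in seen:
            out.append(f"{key}={value}")
    return "\n".join(out) + "\n"


# Hypothetical usage on an in-memory copy of the file contents.
sample = "# paged_data=false\nenable_persistence=false\nother=1\n"
updated = enable_paged_data(sample, cache_percent=25)
```

Working on a string rather than the file itself makes it easy to review the result, or keep a backup of the original, before writing anything to <install_path>/config/settings.conf.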