Sizing Guidelines for Disk-Based Storage (Preview)

For fast performance and scalability, AnzoGraph stores all data in memory. Data is also persisted to disk, both as a backup and so that graphs are automatically reloaded into memory when AnzoGraph is restarted. Queries do not access the data on disk because all of it is cached in memory, and accessing data in memory is much faster than retrieving it from disk.

When deploying large memory-optimized servers is not feasible, however, AnzoGraph can be configured to operate as a disk-based graph database. In this configuration (called "Paged Data"), data is loaded into AnzoGraph, converted to AnzoGraph's internal storage format, and persisted to disk without being retained in memory. Data is then paged into memory from disk as it is requested for analytic operations. For details about database operations in paged data mode, see Enabling Paged Data Mode (Preview).

The Paged Data feature is a Preview release, which means the implementation has recently been completed but is not yet thoroughly tested and could be unstable. The feature is available for trial usage, but Cambridge Semantics recommends that you do not rely on Preview features in production environments.

The table below lists the disk and memory sizing requirements and guidelines to follow if you are considering enabling disk-based storage.

Hardware Requirements

Component: RAM
Recommendation: 100+ GB
Guidelines:
  • Although all graph data is stored on disk, RAM is required to hold intermediate results when performing computations and joins.
  • Having more RAM available for paged data caching can reduce how often AnzoGraph pages data from disk into memory, because more data can remain in memory for access during query execution.
  • The amount of data you can expect to store is about 3X the size of RAM. For example, with 200 GB of RAM, you can load and query about 600 GB of data on disk.

Component: Disk Size
Recommendation: 500+ GB
Guidelines:
  • The disk size should be at least 4X the size of the data at rest. For example, loading 1 TB of data requires a 4 TB disk to support paging operations.

Component: Disk Type
Recommendation: SSD
Guidelines:
  • The speed of the disk that hosts the persisted data affects query performance. For the best performance, store the persistence directory on a fast disk, such as an SSD. You can relocate the default persistence directory from the AnzoGraph file system to a separate location. See Relocating AnzoGraph Directories for more information.
  • Cambridge Semantics recommends that you do not store persisted data on an NFS-mounted disk because of the network overhead that it introduces.

Component: CPU
Recommendation: 32
Guidelines:
  • A greater number of multi-core CPUs with high clock speeds can make a dramatic difference in the performance of paged data queries.
  • Intel processors are preferred, but AnzoGraph is supported on newer AMD EPYC processors. Older AMD processors are not supported.
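
The 3X (data to RAM) and 4X (disk to data) rules of thumb above can be turned into a quick estimate. The following Python sketch is illustrative only: the function and constant names are hypothetical and are not part of AnzoGraph, the RAM estimate is simply the 3X rule inverted, and treating the 100 GB and 500 GB recommendations as lower bounds is an assumption.

```python
# Rules of thumb taken from the sizing guidelines above.
DATA_TO_RAM_RATIO = 3    # storable data is about 3X the size of RAM
DISK_TO_DATA_RATIO = 4   # disk should be at least 4X the data at rest

# Assumption: the "100+ GB" and "500+ GB" recommendations act as floors.
MIN_RAM_GB = 100
MIN_DISK_GB = 500


def max_data_gb(ram_gb):
    """Approximate amount of data (GB) that can be loaded and queried."""
    return ram_gb * DATA_TO_RAM_RATIO


def min_disk_gb(data_gb):
    """Minimum disk size (GB) for the given amount of data at rest."""
    return data_gb * DISK_TO_DATA_RATIO


def suggest_sizing(data_gb):
    """Rough RAM and disk estimate (GB) for a data set of data_gb."""
    return {
        "ram_gb": max(MIN_RAM_GB, data_gb / DATA_TO_RAM_RATIO),
        "disk_gb": max(MIN_DISK_GB, min_disk_gb(data_gb)),
    }


print(max_data_gb(200))     # 600, the RAM example from the guidelines
print(min_disk_gb(1000))    # 4000, i.e. 4 TB for 1 TB, taking 1 TB as 1000 GB
print(suggest_sizing(600))  # RAM and disk estimate for 600 GB of data
```

These figures are starting points for planning rather than guarantees; actual requirements depend on the data and the query workload.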

For software and firewall requirements, see Server and Cluster Requirements.
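
Before a trial, it can help to compare a candidate host against the recommended hardware values above. The Python sketch below is one possible way to do that using only the standard library; it is not an AnzoGraph tool. All names are hypothetical, reading the CPU recommendation of 32 as a count of logical CPUs is an assumption (the guidelines give no unit), and the probing functions assume a Linux host because they rely on os.sysconf and /proc/mounts.

```python
import os
import shutil

# Recommended values from the hardware guidelines above.
MIN_RAM_GB = 100
MIN_DISK_GB = 500
MIN_CPUS = 32            # assumption: interpreted as logical CPUs
GB = 1024 ** 3           # binary gigabytes, close enough for a rough check


def evaluate(ram_gb, disk_gb, cpus):
    """Return a warning for each value below its recommendation."""
    checks = [
        ("RAM (GB)", ram_gb, MIN_RAM_GB),
        ("Disk (GB)", disk_gb, MIN_DISK_GB),
        ("CPUs", cpus, MIN_CPUS),
    ]
    return [
        f"{name}: {value:.0f} is below the recommended {minimum}"
        for name, value, minimum in checks
        if value < minimum
    ]


def probe(persistence_dir):
    """Measure RAM, disk size, and CPU count on this host (Linux)."""
    ram_gb = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / GB
    disk_gb = shutil.disk_usage(persistence_dir).total / GB
    cpus = os.cpu_count() or 0
    return ram_gb, disk_gb, cpus


def filesystem_type(path, mounts_file="/proc/mounts"):
    """Return the filesystem type of the mount that holds path (Linux)."""
    path = os.path.realpath(path)
    best_mount, best_type = "", None
    with open(mounts_file) as mounts:
        for line in mounts:
            fields = line.split()
            if len(fields) < 3:
                continue
            mount_point, fs_type = fields[1], fields[2]
            prefix = mount_point.rstrip("/") + "/"
            inside = path == mount_point or path.startswith(prefix)
            # Keep the longest matching mount point (the most specific one).
            if inside and len(mount_point) > len(best_mount):
                best_mount, best_type = mount_point, fs_type
    return best_type


def is_nfs(path, mounts_file="/proc/mounts"):
    """True if path appears to live on an NFS mount."""
    fs_type = filesystem_type(path, mounts_file)
    return fs_type is not None and fs_type.startswith("nfs")


# Example with literal values: 64 GB RAM, 2000 GB disk, 16 CPUs.
for warning in evaluate(64, 2000, 16):
    print(warning)
```

On a Linux host, `evaluate(*probe(path))` and `is_nfs(path)` can be called with the directory that holds the persisted data; the path to use depends on the installation.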

Ultimately, queries run significantly slower when data is stored on disk than when it is held in memory. If fast performance is a requirement, store the data in memory and do not configure AnzoGraph for paged data operations. For more information, see Enabling Paged Data Mode (Preview).

Related Topics