Sizing Guidelines for Disk-Based Storage

For fast performance and scalability, AnzoGraph DB stores all data in memory. Data is persisted to disk as a backup and so that graphs are automatically reloaded into memory when AnzoGraph DB is restarted. Queries do not access the data on disk, however, because all of the data is cached in memory, and accessing data in memory is much faster than retrieving it from disk.

When deploying large memory-optimized servers is not feasible, however, AnzoGraph DB can be configured to operate as a disk-based graph database. In this configuration (called "Paged Data"), data is loaded into AnzoGraph DB, converted to AnzoGraph DB's internal storage format, and persisted to disk without being retained in memory. Data is then paged into memory from disk as requested for analytic operations. For details about database operations in paged data mode, see Enabling Paged Data Mode (Preview).

The table below lists the disk and memory sizing requirements and guidelines to follow if you are considering enabling disk-based storage. For software and firewall requirements, see Server and Cluster Requirements.

Hardware Requirements

Component: RAM
Recommendation: 100+ GB
Guidelines:
  • Though all graph data is stored on disk, RAM is required to hold intermediate results when performing computations and joins.
  • Having more RAM available for paged data caching can reduce how often AnzoGraph DB swaps data from disk to memory, because more data can remain paged in memory for access during query execution.
  • The amount of data you can expect to store is about 3X the size of RAM. For example, with 200 GB of RAM, you can load and query about 600 GB of data on disk.

Component: Disk Size
Recommendation: 500+ GB
Guidelines: The disk size should be at least 4X the size of the data at rest. For example, loading 1 TB of data requires a 4 TB disk to support paging operations.

Component: Disk Type
Recommendation: SSD
Guidelines: The speed of the disk that hosts the persisted data affects query performance. For the best performance, store the persistence directory on a fast disk, such as an SSD. You can relocate the default persistence directory from the AnzoGraph DB file system to a separate location. See Relocating AnzoGraph DB Directories for more information.

Cambridge Semantics recommends that you do not store persisted data on an NFS-mounted disk because of the network overhead that is introduced.

Component: CPU
Recommendation: 32
Guidelines: A greater number of multi-core CPUs with a high clock speed can make a dramatic difference in the performance of paged data queries.

Intel processors are preferred, but AnzoGraph DB is supported on newer AMD EPYC processors. Older AMD processors are not supported.
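
The RAM and disk ratios in the table above can be turned into a quick back-of-the-envelope estimate. The following Python sketch is only an illustration of that arithmetic, not an AnzoGraph DB tool: the function and constant names are hypothetical, and treating the "100+ GB" and "500+ GB" recommendations as hard minimums is an assumption made for the example.

```python
# Rough sizing arithmetic based on the guidelines above.
# All names here are hypothetical; this is not part of AnzoGraph DB.
DATA_TO_RAM_RATIO = 3    # storable data is about 3X the size of RAM
DISK_TO_DATA_RATIO = 4   # disk should be at least 4X the data at rest
MIN_RAM_GB = 100         # assumption: "100+ GB" treated as a floor
MIN_DISK_GB = 500        # assumption: "500+ GB" treated as a floor


def estimate_paged_data_sizing(data_gb):
    """Return rough RAM and disk estimates (GB) for data_gb of data at rest."""
    if data_gb <= 0:
        raise ValueError("data_gb must be positive")
    ram_gb = max(MIN_RAM_GB, data_gb / DATA_TO_RAM_RATIO)
    disk_gb = max(MIN_DISK_GB, data_gb * DISK_TO_DATA_RATIO)
    return {"ram_gb": ram_gb, "disk_gb": disk_gb}


def max_data_for_ram(ram_gb):
    """Return the approximate amount of data (GB) that ram_gb of RAM supports."""
    return ram_gb * DATA_TO_RAM_RATIO


if __name__ == "__main__":
    # Matches the examples in the table: 200 GB RAM for about 600 GB of data,
    # and a 4 TB disk for 1 TB of data.
    print(estimate_paged_data_sizing(600))
    print(estimate_paged_data_sizing(1000))
    print(max_data_for_ram(200))
```

These figures are starting points only. Actual requirements depend on the workload, since RAM must also hold intermediate results for computations and joins.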

Ultimately, queries run significantly slower when data is stored on disk than when it is stored in memory. If fast performance is a requirement, store the data in memory and do not configure AnzoGraph DB for paged data operations. For more information, see Enabling Paged Data Mode (Preview).

Related Topics