Server and Cluster Requirements
The following sections detail the minimum requirements and recommendations to follow for setting up AnzoGraph host servers and cluster environments.
The production hardware requirements that Cambridge Semantics lists are above average and are meant as a guideline. Large production data sets running interactive queries may require significantly more powerful hardware and more RAM. Provision production server hardware accordingly to avoid performance issues.
|Component||Minimum||Recommended||Guidelines|
|Available RAM||16 GB (for small-scale testing only)||200 GB or more||AnzoGraph needs enough RAM to store data, intermediate query results, and run the server processes. Cambridge Semantics recommends that you allocate 3 to 4 times as much RAM as the planned data size. Do not overcommit RAM on a VM or on the hypervisor/container host. Avoid memory paging to disk (swapping) to achieve the highest possible level of performance. For more information about determining the server and cluster size that is ideal for hosting AnzoGraph, see Server and Cluster Sizing and Capacity Planning Guidelines.|
|Disk space and type||10 GB HDD||200+ GB SSD||AnzoGraph requires 10 GB for internal requirements. The amount of additional disk space required for load file staging, persistence, or backups depends on the size of the data to be loaded. For persistence, Cambridge Semantics recommends that you have twice as much disk space on the local AnzoGraph file system as RAM on the server.|
|CPU (Intel only)||2 cores||32 cores||Once you provision sufficient RAM and a high-performing I/O subsystem, performance depends on raw CPU capabilities. Always use multi-core CPUs. A greater number of cores can make a dramatic difference in the performance of interactive queries. AnzoGraph does not support AMD processors at this time.|
|Networking||1 GbE||10 GbE, 40 GbE, InfiniBand||Not applicable for single-server installations. Because AnzoGraph is a high-performance computing (HPC), massively parallel processing (MPP) OLAP engine, the bandwidth available for communication between the servers in a cluster dramatically affects performance. AnzoGraph clusters require optimal network bandwidth. In a switched network, make sure that all NICs link to the same Top Of Rack or Full-Crossbar Modular switch. If possible, enable SR-IOV and other hardware acceleration methods, and use dedicated layer 2 networking that guarantees bandwidth. Cambridge Semantics recommends 10 Gbit/s or better. Also use static IP addresses or make sure that DHCP leases are persistent.|
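As a rough illustration of the sizing guidance in the table above, the sketch below turns the RAM and disk rules of thumb into arithmetic. The function names are hypothetical, and adding the 10 GB internal requirement to the persistence figure is one reading of the guideline rather than an official formula.

```shell
#!/bin/sh
# Sizing arithmetic based on the guidelines above (whole GB, integer math).
# Function names are illustrative and not part of AnzoGraph.

# RAM range for a planned data size: 3x (low) to 4x (high) the data size.
ram_range_gb() {
    data_gb=$1
    echo "$((data_gb * 3)) $((data_gb * 4))"
}

# Disk for persistence: twice the server RAM, plus the 10 GB that
# AnzoGraph needs for internal requirements (assumed to be additive).
persistence_disk_gb() {
    ram_gb=$1
    echo "$((ram_gb * 2 + 10))"
}

# Example for 50 GB of planned data:
#   ram_range_gb 50          prints "150 200"
#   persistence_disk_gb 200  prints "410"
```

Space for load file staging and backups depends on the data to be loaded and is not covered by this sketch.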
The table below lists the software requirements for AnzoGraph servers.
|Operating System||RHEL 7.4+, CentOS 7.4+||RHEL/CentOS 8 is not supported at this time.|
|GNU Compiler Collection||Installed||Install the latest version of the GCC tools for your operating system.|
|vim||Editor for creating or changing files.|
|sudo||Enables users to run programs with alternate security privileges.|
|psutils||Python system and process utilities for retrieving information on running processes and system usage.|
|tuned||Linux system service to apply tunables.|
|wget||Utility for downloading files over a network.|
|Google SDK||For virtual servers on Google Compute Engine (GCE). Command line tool that enables syncing of data from Google storage. You can download the latest version from Google: https://cloud.google.com/sdk/.|
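One way to confirm that the tools in the table are present is to look each one up on the PATH. The sketch below is a minimal helper with a hypothetical name. The command names in the usage comment (for example `tuned-adm` for tuned and `gcloud` for the Google SDK) are assumptions about how the packages expose themselves, and the Python utilities are not covered because they are a library rather than a command.

```shell
#!/bin/sh
# Print the name of each command that cannot be found on the PATH.
# Prints nothing when every command is available.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Example (command names are assumptions, adjust for your system):
#   missing_tools gcc vim sudo tuned-adm wget gcloud
```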
Virtual Environments and Clusters
AnzoGraph requires that all elements of the infrastructure provide the same quality of service. Do not run AnzoGraph on the same server as any other software, except in single-server mode and with an expectation of lowered performance. Providing the same quality of service on each server is especially important when using AnzoGraph in a cluster configuration. If any of the servers in the cluster host other applications or perform additional processing, the cluster becomes unbalanced and may perform poorly. A single poorly performing server degrades the other servers to the same performance level. All nodes require the same hardware specification and configuration.
To ensure the maximum and most reliable quality of service for CPU, memory, and network bandwidth, do not co-locate other virtual machines or containers (such as Docker containers) on the same hypervisor or container host. For hypervisor-managed VMs, configure the hypervisor to reserve the available memory for the AnzoGraph server. For clusters, make sure there is enough physical RAM to support all of the AnzoGraph servers, and reserve the memory via the hypervisor.
In addition, running memory compacting services such as Kernel Same-page Merging (KSM) impacts CPU quality of service significantly and does not benefit AnzoGraph. Live migrations also impact the performance of VMs while they are being migrated. While live migration can provide value for planned host maintenance, AnzoGraph performance may suffer if live migrations occur frequently. For more information about Kernel Same-page Merging, see https://en.wikipedia.org/wiki/Kernel_same-page_merging.
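On a Linux hypervisor host, the kernel exposes the KSM state in `/sys/kernel/mm/ksm/run`, where a value of 1 means the KSM daemon is running. The sketch below reads that file and reports the state. The function name is hypothetical, and the optional path argument exists only so the check can be tried against a test file.

```shell
#!/bin/sh
# Report whether KSM is active, based on the kernel's "run" control file.
# $1 optionally overrides the path to the control file.
ksm_status() {
    ksm_file=${1:-/sys/kernel/mm/ksm/run}
    if [ ! -r "$ksm_file" ]; then
        # KSM is not built into this kernel, or the file is not readable.
        echo "unavailable"
        return 0
    fi
    if [ "$(cat "$ksm_file")" = "1" ]; then
        echo "running"
    else
        # 0 = stopped, 2 = stopped and pages unmerged
        echo "stopped"
    fi
}

# Example:
#   ksm_status
```

Run the check on the hypervisor or container host, since that is where KSM merges pages across guests.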
Cambridge Semantics can provide benchmarks to establish relative cluster performance metrics and validate the environment.
Note: Advanced configurations may benefit from CPU pinning on the hypervisor host and disabling CPU hyper-threading. For more information about CPU pinning, see https://en.wikipedia.org/wiki/Processor_affinity. For information about hyper-threading, see https://en.wikipedia.org/wiki/Hyper-threading.
Tuning User Resource Limitations (ulimits)
Cambridge Semantics recommends that you tune the ulimits for your Linux distribution to increase the limits for certain resources. The list below describes the recommendations:
- Increase the open files limit to at least 4096.
- Increase the limit for the following resources to unlimited:
- cpu time
- file locks
- file size
- max memory size
- max user processes
- virtual memory
To view the current ulimits, run ulimit -a. For example, the default ulimits for a CentOS 7.5 operating system are shown below:
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 79607
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 4096
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
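The recommendations can also be compared against the current shell automatically. The following is a minimal sketch, assuming bash (its ulimit builtin accepts the -x and -u options, which some other shells do not). The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Return 0 if a reported limit meets a minimum. "unlimited" always passes.
limit_ok() {
    current=$1
    minimum=$2
    [ "$current" = "unlimited" ] && return 0
    [ "$minimum" = "unlimited" ] && return 1
    [ "$current" -ge "$minimum" ]
}

# Compare the current shell's limits with the recommendations and
# print one line for each limit that falls short.
check_ulimits() {
    limit_ok "$(ulimit -n)" 4096 || echo "open files (-n): below 4096"
    # cpu time, file locks, file size, max memory size,
    # max user processes, virtual memory
    for opt in t x f m u v; do
        limit_ok "$(ulimit -"$opt")" unlimited || echo "ulimit -$opt: not unlimited"
    done
}

# Example:
#   check_ulimits
```

With the CentOS 7.5 defaults shown above, this would flag the open files limit (1024) and the max user processes limit (4096).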
To change the value for a resource, run the following command:
ulimit -option new_value
For example, the following command changes the open files value to 4096:
ulimit -n 4096
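A limit changed with the ulimit command applies only to the current shell and the processes started from it. On systems that use pam_limits, one common way to make the values persistent is a file under /etc/security/limits.d/. The sketch below prints entries that correspond to the recommendations above. The function name, the `anzograph` account name, and the file name in the usage comment are hypothetical.

```shell
#!/bin/sh
# Print limits.conf entries for the recommended limits.
# $1 is the account that runs AnzoGraph (hypothetical, use your own).
# The "-" type sets both the soft and the hard limit.
limits_conf_entries() {
    user=$1
    echo "$user - nofile 4096"
    # cpu time, file locks, file size, max memory size,
    # max user processes, virtual memory (address space)
    for item in cpu locks fsize rss nproc as; do
        echo "$user - $item unlimited"
    done
}

# Example (run as root, then log in again for the limits to apply):
#   limits_conf_entries anzograph > /etc/security/limits.d/90-anzograph.conf
```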
AnzoGraph servers communicate via TCP/IP sockets. AnzoGraph can communicate with applications via the secure, encrypted, gRPC-based protocol or the standard SPARQL HTTP(S) protocol.
Open the TCP ports listed in the table below. This image shows a visual representation of the communication ports:
|Port||Description|
|5700||gRPC protocol port for secure communication between AnzoGraph servers.|
|8256||SPARQL HTTPS port for SSL communication between applications and AnzoGraph.|
|7070||SPARQL HTTP port for communication between applications and AnzoGraph.|
|9100||The internal fabric communications port.|
|5600||The SSL system management port.|
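How the ports are opened depends on the firewall in use. As a sketch, assuming the host runs firewalld (the default firewall service on RHEL and CentOS 7), the helper below prints the firewall-cmd commands for a list of TCP ports without running them, so they can be reviewed first. The function name is hypothetical.

```shell
#!/bin/sh
# Print (do not run) the firewalld commands that open the given TCP ports.
firewall_commands() {
    for port in "$@"; do
        echo "firewall-cmd --permanent --add-port=${port}/tcp"
    done
    # Reload so that the permanent rules take effect.
    echo "firewall-cmd --reload"
}

# Example for the ports in the table above:
#   firewall_commands 5700 8256 7070 9100 5600
```

Review the printed commands and run them as root on each AnzoGraph server. On cloud platforms, the equivalent rules usually also need to be added to the security group or VPC firewall.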