AnzoGraph Requirements

This topic lists the minimum requirements and recommendations to follow for setting up AnzoGraph host servers and cluster environments.

For information about AnzoGraph server and cluster sizing, see AnzoGraph Server and Cluster Sizing Guidelines.

Hardware Requirements

The recommended values below describe above-average production hardware and are offered as a guideline. Large production data sets with interactive query workloads may require significantly more powerful hardware and more RAM. Provision production servers accordingly to avoid performance issues.

Component | Minimum | Recommended | Guidelines
Available RAM | 16 GB (for small-scale testing only) | 200 GB or more | AnzoGraph needs enough RAM to store data and intermediate query results and to run the server processes. Cambridge Semantics recommends that you allocate 3 to 4 times as much RAM as the planned data size. Do not overcommit RAM on a VM or on the hypervisor/container host. Avoid memory paging to disk (swapping) to achieve the highest possible level of performance. For more information about determining the server and cluster size that is ideal for hosting AnzoGraph, see AnzoGraph Server and Cluster Sizing Guidelines.
Disk space and type | 10 GB HDD | 200+ GB SSD | AnzoGraph requires 10 GB for internal requirements. The amount of additional disk space required for load file staging, data persistence, or backups depends on the size of the data to be loaded. For persistence, Cambridge Semantics recommends that you have twice as much disk space on the local AnzoGraph file system as RAM on the server.
CPU (Intel only) | 2 cores | 32 cores | Once you provision sufficient RAM and a high-performing I/O subsystem, performance depends on raw CPU capabilities. A greater number of cores can make a dramatic difference in the performance of interactive queries. Do not overcommit CPU cores: the total number of virtual cores must not exceed the number of physical CPU cores. AnzoGraph does not support AMD processors at this time.
Networking | 1 GbE | 10 GbE, 40 GbE, InfiniBand | Not applicable for single server installations. Because AnzoGraph is a high performance computing (HPC), Massively Parallel Processing (MPP) OLAP engine, the bandwidth available for communication between cluster nodes dramatically affects performance. Cambridge Semantics recommends 10 Gbit/s or better. In a switched network, make sure that all NICs link to the same Top Of Rack or Full-Crossbar Modular switch. If possible, enable SR-IOV and other hardware acceleration methods, and use dedicated layer 2 networking that guarantees bandwidth. Also use static IP addresses or make sure that DHCP leases are persistent.
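
The RAM and disk guidelines above reduce to simple arithmetic. The following is a minimal sketch, not an official sizing tool: the function names and the example figures (50 GB of planned data, 200 GB of RAM) are illustrative assumptions. It applies the 3x to 4x RAM multiplier and the rule of twice as much disk as RAM for persistence.

```shell
#!/bin/sh
# Sketch only: sizing arithmetic from the hardware guidelines.
# Function names are illustrative, not part of AnzoGraph.

# Recommended RAM range (GB) for a planned data size in GB: 3x to 4x.
ram_range_gb() {
    data_gb=$1
    echo "$((data_gb * 3))-$((data_gb * 4))"
}

# Recommended local disk (GB) for persistence: twice the server RAM.
persist_disk_gb() {
    ram_gb=$1
    echo "$((ram_gb * 2))"
}

# Example: 50 GB of planned data on a server with 200 GB of RAM.
echo "RAM: $(ram_range_gb 50) GB"                        # RAM: 150-200 GB
echo "Disk for persistence: $(persist_disk_gb 200) GB"   # Disk for persistence: 400 GB
```

Disk space for load file staging and backups comes on top of the persistence figure and depends on the size of the data to be loaded.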

Software Requirements

The table below lists the software requirements for AnzoGraph servers.

Component | Minimum | Recommended | Guidelines
Operating System | RHEL 6, CentOS 6 | RHEL 7+, CentOS 7+ |
GNU Compiler Collection | N/A | Installed | Install the latest version of the GCC tools for your operating system.
bzip2 | N/A | Installed | Required for unpacking the AnzoGraph tool set during installation.
Linux Configuration | | | The default configuration values for some Linux parameters can severely degrade AnzoGraph performance. See Required Linux Kernel Configuration Changes below for information about revising the values automatically any time the AnzoGraph host server is restarted.
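
Before installing, it can help to confirm that the required tools are on the PATH. The following is a minimal sketch; missing_tools is an illustrative helper name, not an AnzoGraph utility, and it only checks that the commands exist, not which version is installed.

```shell
#!/bin/sh
# Sketch only: report which required tools are missing from PATH.
# missing_tools is an illustrative helper, not an AnzoGraph utility.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

missing=$(missing_tools gcc bzip2)
if [ -n "$missing" ]; then
    echo "Missing required tools:" $missing
else
    echo "gcc and bzip2 are installed"
fi
```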

Optional Software

Program | Description
vim | Editor for creating or changing files.
sudo | Enables users to run programs with alternate security privileges, such as the azg_system_config utility that configures the required AnzoGraph kernel settings.
net-tools | Networking utilities.
psutils | Python system and process utilities for retrieving information on running processes and system usage.
tuned | Linux system service to apply tunables.
wget | Utility for downloading files over a network.
Google Cloud SDK | For virtual servers on Google Compute Engine (GCE). Command line tool to enable syncing of data from Google storage. You can download the latest version from Google: https://cloud.google.com/sdk/.

Clusters and Virtual Environments

AnzoGraph requires that all elements of the infrastructure provide the same quality of service (QoS). Do not run AnzoGraph on the same server as any other software, except in single-server mode and with an expectation of lowered performance. Providing the same QoS is especially important when using AnzoGraph in a clustered configuration. If any server in the cluster performs additional processing, the cluster becomes unbalanced and may perform poorly. A single poorly performing server degrades the other servers to the same performance level. All nodes require the same hardware specification and configuration.

To ensure the maximum and most reliable QoS for CPU, memory, and network bandwidth, do not co-locate other virtual machines or containers (such as Docker containers) on the same hypervisor or container host. For hypervisor-managed VMs, configure the hypervisor to reserve the available memory for the AnzoGraph server. For clusters, make sure there is enough physical RAM to support all of the AnzoGraph servers, and reserve the memory via the hypervisor.

In addition, running memory-compacting services such as Kernel Same-page Merging (KSM) significantly impacts CPU QoS and does not benefit AnzoGraph. For more information about Kernel Same-page Merging, see https://en.wikipedia.org/wiki/Kernel_same-page_merging. Live migration also reduces the performance of a VM while it is being migrated. Live migration can provide value for planned host maintenance, but AnzoGraph performance may suffer if live migrations occur frequently.
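
On a Linux hypervisor host, the KSM state can be read from sysfs. The following is a minimal sketch, assuming the standard /sys/kernel/mm/ksm/run file; ksm_state is an illustrative function name, and the optional path argument exists only so the function can be tried against a sample file.

```shell
#!/bin/sh
# Sketch only: report whether KSM is active on this host.
# The run file holds 0 (stopped), 1 (running), or 2 (stopped, pages unmerged).
ksm_state() {
    run_file=${1:-/sys/kernel/mm/ksm/run}
    if [ ! -r "$run_file" ]; then
        echo "unavailable"
        return
    fi
    case "$(cat "$run_file")" in
        1) echo "running" ;;
        *) echo "stopped" ;;
    esac
}

echo "KSM is $(ksm_state)"
```

Run the check on the hypervisor host, not inside the AnzoGraph guest VM, because that is where page merging takes place.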

Cambridge Semantics can provide benchmarks to establish relative cluster performance metrics and validate the environment.

Note: Advanced configurations may benefit from CPU pinning on the hypervisor host and disabling CPU hyper-threading. For more information about CPU pinning, see https://en.wikipedia.org/wiki/Processor_affinity. For information about hyper-threading, see https://en.wikipedia.org/wiki/Hyper-threading.

Required Linux Kernel Configuration Changes

The default Linux configuration values for the following settings can severely degrade AnzoGraph performance or cause errors:

  • transparent_hugepage: Enables or disables the Transparent Huge Pages (THP) feature. AnzoGraph requires THP to be disabled.
  • max_map_count: Configures the maximum number of memory map areas a process can use. AnzoGraph requires a value of 2097152.
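
The current values of both settings can be read from the kernel's virtual file systems. The following is a minimal sketch, assuming the usual locations /sys/kernel/mm/transparent_hugepage/enabled and /proc/sys/vm/max_map_count; the function names are illustrative, and the optional path arguments exist only so the functions can be tried against sample files.

```shell
#!/bin/sh
# Sketch only: check the two kernel settings listed above.

# Print the active THP mode, which the kernel marks with square brackets,
# for example "always madvise [never]".
thp_mode() {
    thp_file=${1:-/sys/kernel/mm/transparent_hugepage/enabled}
    sed -n 's/.*\[\(.*\)\].*/\1/p' "$thp_file" 2>/dev/null
}

# Succeed when max_map_count equals the value AnzoGraph requires.
map_count_ok() {
    count_file=${1:-/proc/sys/vm/max_map_count}
    [ "$(cat "$count_file" 2>/dev/null)" -eq 2097152 ] 2>/dev/null
}

if [ "$(thp_mode)" = "never" ]; then
    echo "THP is disabled"
else
    echo "THP is not disabled"
fi

if map_count_ok; then
    echo "max_map_count is 2097152"
else
    echo "max_map_count is not 2097152"
fi
```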

Note: In addition to changing the Linux configuration values above, make sure that the Linux environment variables http_proxy and https_proxy are not set. The Anzo gRPC protocol cannot make connections to the database when proxies are enabled.
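
A quick way to confirm that neither proxy variable is set in the environment that starts AnzoGraph is sketched below. proxy_vars_in_use is an illustrative function name; the check covers only the two lowercase variables named in the note above.

```shell
#!/bin/sh
# Sketch only: list proxy variables that are set in the current environment.
proxy_vars_in_use() {
    for name in http_proxy https_proxy; do
        eval "value=\${$name:-}"
        [ -n "$value" ] && echo "$name"
    done
    return 0
}

found=$(proxy_vars_in_use)
if [ -n "$found" ]; then
    echo "Unset before starting AnzoGraph:" $found
    # To clear them for the current shell: unset http_proxy https_proxy
else
    echo "No proxy variables are set"
fi
```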

Applying Linux Configuration Changes Automatically via systemd

This section provides instructions and a sample systemd configuration file that you can use to override the default values and make sure that the Linux kernel changes are re-applied automatically any time an AnzoGraph host server is restarted.

Note: The example steps below are performed after AnzoGraph is installed. The procedure uses a systemd service to run the AnzoGraph system configuration utility that is included in the installation.

Note: Complete the steps below as the root user. On AnzoGraph clusters, create the systemd service on each node.

  1. Run the following command to copy the AnzoGraph system configuration script, azg_system_config, to the root directory:
    # cp /install_path/gqe/bin/azg_system_config /root/
  2. Run the following command to remove "sudo" from the azg_system_config script:
    # sed -i 's/sudo//g' /root/azg_system_config
  3. Create a file called azg_system_config.service in the /usr/lib/systemd/system directory. For example:
    # cat > /usr/lib/systemd/system/azg_system_config.service <<EOF
  4. Add the following contents to azg_system_config.service:
    [Unit]
    Description=Configure Linux for AnzoGraph
    [Service]
    Type=oneshot
    ExecStart=/root/azg_system_config
    [Install]
    WantedBy=multi-user.target
    EOF
  5. Press Enter if you followed the example above or save and close the file if you opened it in an editor.
  6. Run the following commands to start and enable the new service:
    # systemctl start azg_system_config.service
    # systemctl enable azg_system_config.service

Tuning User Resource Limitations (ulimits)

Cambridge Semantics recommends that you tune the ulimits for your Linux distribution to increase the limits for certain resources. The list below describes the recommendations:

  • Increase the open files limit to at least 4096.
  • Increase the limit for the following resources to unlimited:
    • cpu time
    • file locks
    • file size
    • max memory size
    • max user processes
    • virtual memory

To view the current ulimits, run ulimit -a. For example, the default ulimits for a CentOS 7.5 operating system are shown below:

core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 79607
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 4096
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

To change the value for a resource, run the following command:

ulimit -option new_value

For example, the following command raises the open files limit to 4096:

ulimit -n 4096

A value set with ulimit applies only to the current shell session and the processes started from it.
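
The recommendations in this section can be compared against the current shell's limits with a short script. The following is a minimal sketch for bash; limit_ok and check_limits are illustrative names, and the script only reports differences without changing any limit.

```shell
#!/bin/bash
# Sketch only: compare current limits with the recommendations in this section.

# Succeed when value $1 is "unlimited" or at least the minimum $2.
limit_ok() {
    [ "$1" = "unlimited" ] || [ "$1" -ge "$2" ] 2>/dev/null
}

check_limits() {
    limit_ok "$(ulimit -n)" 4096 || echo "open files is below 4096"
    # -t cpu time, -x file locks, -f file size, -m max memory size,
    # -u max user processes, -v virtual memory
    for opt in t x f m u v; do
        [ "$(ulimit -"$opt" 2>/dev/null)" = "unlimited" ] || echo "ulimit -$opt is not unlimited"
    done
    return 0
}

check_limits
```

Run the script as the user that starts AnzoGraph, since limits are applied per user session.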

Firewall Requirements

AnzoGraph servers communicate via TCP/IP sockets. AnzoGraph communicates with Anzo via the secure, encrypted, gRPC-based Anzo protocol. Because AnzoGraph is SPARQL-compliant, you can also use the standard SPARQL HTTP protocol for communication. SPARQL HTTP communication is not secure, but you can set up an NGINX SSL authentication proxy when you install AnzoGraph.

Open the TCP ports listed in the table below. This image shows a visual representation of the communication ports:

Port | Description | Access Needed
9100 | AnzoGraph's internal fabric communications port | Between all AnzoGraph servers in a cluster; available for AnzoGraph on single node installations
5600 | AnzoGraph's SSL system management port | Between Anzo and the AnzoGraph leader server; between all AnzoGraph servers in the cluster; available for AnzoGraph on single node installations
5700 | The Anzo protocol (gRPC) port for secure communication between AnzoGraph and Anzo | Between Anzo and the AnzoGraph leader server; between all AnzoGraph servers in the cluster; available for AnzoGraph on single node installations
8100 | Used when Anzo loads many statements in parallel, such as when loading a large data model from Anzo to AnzoGraph | From Anzo to each of the servers in the AnzoGraph cluster; between all AnzoGraph servers in the cluster
7070 (optional) | SPARQL service HTTP port; enable it if you want to give external applications access to AnzoGraph | Between external applications and the AnzoGraph leader server
7071 (optional) | SSL SPARQL service HTTPS port; enable it if you set up SSL authentication and want to give external applications access to AnzoGraph | Between external applications and the AnzoGraph leader server
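
How the ports are opened depends on the firewall in use. The following is a minimal sketch that assumes firewalld manages the host firewall; it only prints the firewall-cmd commands for the four required ports so that they can be reviewed before being run as root. firewall_commands and REQUIRED_PORTS are illustrative names. The sketch does not restrict source addresses, so narrowing access to the servers listed in the table is left to your firewall policy.

```shell
#!/bin/sh
# Sketch only: print firewalld commands for the required TCP ports.
# Assumes firewalld; nothing is changed on the host by this script.
REQUIRED_PORTS="9100 5600 5700 8100"

firewall_commands() {
    for port in "$@"; do
        echo "firewall-cmd --permanent --add-port=${port}/tcp"
    done
    echo "firewall-cmd --reload"
}

firewall_commands $REQUIRED_PORTS
```

To include the optional SPARQL ports, pass them as additional arguments, for example firewall_commands $REQUIRED_PORTS 7070 7071.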