Installing AnzoGraph on a Cluster

This topic provides instructions for installing AnzoGraph on a cluster. For information about server requirements, see AnzoGraph Requirements.

Important Because AnzoGraph offers features such as user-defined extensions, it is not certified as secure software and should not be installed or run as the root user. In addition, because AnzoGraph accesses the data that Anzo writes to the shared file store, install and run AnzoGraph with the same service account that runs Anzo. For more information, see Anzo Service User Account Requirements.

For instructions on installing AnzoGraph on a single server, see Installing AnzoGraph on a Single Server.

  1. Complete the Pre-Installation Configuration
  2. Install AnzoGraph on the Compute Servers
  3. Install AnzoGraph on the Leader Server
  4. Complete the Post-Installation Configuration

Complete the Pre-Installation Configuration

Install the Required Software

Install GCC and BZIP2 (Required for all Deployments)

Make sure that each server in the cluster has the following software packages installed. These packages are required for all deployments:

  • GNU Compiler Collection (GCC): AnzoGraph requires the latest version of the GCC tools for your operating system. Run the following command to install GCC:
    sudo yum install gcc
    Note Specifically, AnzoGraph requires the glibc, glibc-devel, and gcc-c++ libraries. Typically, when you install GCC by running yum install gcc, those libraries are included as part of the package. In rare cases, depending on the host server configuration, installing GCC excludes certain libraries. If AnzoGraph fails to start and you receive a "Compilation failed" message, it may indicate that some of the required libraries are missing. To install the missing libraries, run the following command:
    sudo yum install glibc glibc-devel gcc-c++
  • bzip2: Required for unpacking the AnzoGraph tool set during installation. Run the following command to install bzip2:
    sudo yum install bzip2
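
As a quick pre-flight check, the following sketch reports which of the required tools are not yet on the PATH. The function name missing_tools is hypothetical, and treating the gcc, g++, and bzip2 executables as evidence that the packages above are installed is an assumption. The check does not verify that the glibc-devel headers are present.

```shell
#!/bin/sh
# Print the name of every command in the argument list that is not on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# gcc and g++ are provided by the gcc and gcc-c++ packages; bzip2 by bzip2.
missing=$(missing_tools gcc g++ bzip2)
if [ -n "$missing" ]; then
    echo "Missing tools:" $missing
else
    echo "All required tools found."
fi
```

Run the check on each server in the cluster before starting the installation wizard.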

Install OpenJDK 11 (Required for Anzo Unstructured and Data Toolkit Service Deployments)

AnzoGraph uses a Java client interface (datatoolkit-<version>.jar) to communicate with data sources when the Data Toolkit service is used to incorporate data from remote endpoints into graphmarts. AnzoGraph also uses the Java client to communicate with Elasticsearch when Anzo Unstructured graphmarts are deployed. Java Development Kit version 11 is required for using the Java client. Follow the instructions below to install OpenJDK on each server in the cluster.

  1. Run the following command to install OpenJDK 11:
    sudo yum install java-11-openjdk
    Note Do not set the $JAVA_HOME variable to use the JDK installation at this time. AnzoGraph's system management daemon requires JAVA_HOME, and it is set as part of the post-installation configuration. In addition, the Elasticsearch plugin is deployed after AnzoGraph is installed.
  2. If your organization uses Anzo Unstructured, test the connection between the AnzoGraph leader server and Elasticsearch. Make sure that Elasticsearch is running and then run the following telnet command:
    telnet <Elasticsearch_server_IP> <port>

    By default, the port range for Elasticsearch requests (http.port) is 9200-9300. If port 9200 is not available when Elasticsearch is started, Elasticsearch tries 9201 and so on until it finds an accessible port. Specify the HTTP request port that Elasticsearch is using.
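
If telnet is not installed on the leader server, the same reachability test can be sketched with bash's built-in /dev/tcp redirection. The function name port_open and the 5-second default timeout are assumptions for illustration. A successful connection shows only that the port accepts TCP connections, not that Elasticsearch is the service listening on it.

```shell
#!/bin/sh
# Return 0 if a TCP connection to host:port succeeds within the timeout.
# Assumes bash (for its /dev/tcp redirection) and coreutils timeout are installed.
port_open() {
    host=$1
    port=$2
    wait_secs=${3:-5}
    timeout "$wait_secs" bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null
}

# Usage (replace the placeholder with the Elasticsearch server address):
#   port_open <Elasticsearch_server_IP> 9200 && echo "port is reachable"
```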

Configure User Resource Limits

Cambridge Semantics recommends that you tune the user resource limits (ulimits) for your Linux distribution to increase the limits for the following resources:

  • Increase the open files limit to at least 4096.
  • Increase the limit for the following resources to unlimited:
    • cpu time
    • file locks
    • file size
    • max memory size
    • max user processes
    • virtual memory

To view the current ulimits, run ulimit -a. To permanently change ulimits, modify the /etc/security/limits.conf file. For more information, see How to set ulimit values in the RHEL support documentation.
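
As an illustration, the sketch below prints limits.conf entries that correspond to the recommendations above. The function name limits_conf_for and the anzo user name are assumptions. The item names (nofile, cpu, locks, fsize, rss, nproc, as) are the standard limits.conf keywords for open files, CPU time, file locks, file size, max memory size, max user processes, and virtual memory.

```shell
#!/bin/sh
# Print limits.conf entries for the given user that match the
# recommendations above. "-" sets both the soft and the hard limit.
limits_conf_for() {
    user=$1
    printf '%s - nofile 4096\n' "$user"    # open files
    for item in cpu locks fsize rss nproc as; do
        printf '%s - %s unlimited\n' "$user" "$item"
    done
}

# Review the output before appending it to /etc/security/limits.conf as root.
limits_conf_for anzo
```

The new limits apply to sessions started after the file is changed, so log in again as the service user and rerun ulimit -a to confirm them.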

Note Also make sure that the Linux environment variables http_proxy and https_proxy are not set. The Anzo gRPC protocol cannot make connections to the database when proxies are enabled.
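
A minimal sketch for confirming that neither proxy variable is set in the service user's environment follows. The function name proxy_vars_set is hypothetical, and the check covers only the two lowercase variables named in the note.

```shell
#!/bin/sh
# Print the name of each proxy variable that is currently set and non-empty.
proxy_vars_set() {
    [ -n "${http_proxy:-}" ] && echo http_proxy
    [ -n "${https_proxy:-}" ] && echo https_proxy
    return 0
}

found=$(proxy_vars_set)
if [ -n "$found" ]; then
    echo "Unset before starting AnzoGraph:" $found
else
    echo "No proxy variables are set."
fi
```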

Make note of the IP addresses for each of the servers in the cluster. The installation wizard will prompt you to enter the IP addresses during the installation. In addition, choose one server to be the leader server. Then proceed to the instructions below and install AnzoGraph on the compute servers.

Install AnzoGraph on the Compute Servers

Follow the instructions below to install AnzoGraph on each compute server.

Important Complete the steps below as the Anzo service user.
  1. If necessary, run the following command to become the Anzo service user:
    # su name

    Where name is the name of the service user. For example:

    # su anzo
  2. If necessary, run the following command to make the AnzoGraph installation script executable:
    chmod +x script_name
  3. Run the following command to start the installation wizard:
    ./script_name

    The script unpacks the JRE and then waits for input before starting the installation.

  4. Press Enter to proceed with the installation. The wizard displays the AnzoGraph license agreement.
  5. Review the license agreement. Press Enter to scroll through the terms. At the end of the agreement, type 1 to accept the terms or type 2 to disagree and stop the installation.
  6. The wizard prompts you to specify which components to install. Specify 1 (AnzoGraph) and press Enter.
  7. Specify the path and directory for the AnzoGraph installation. Specify the same location on each server. Press Enter to accept the default installation path or type an alternate path and then press Enter.
  8. At the server installation type prompt, specify option 3 (Cluster Slave) and press Enter.
  9. Indicate whether this installation is for use with Anzo. Press Enter for Yes. Answering yes configures AnzoGraph to use the settings that are optimal for Anzo. Answering no configures the settings that are optimal for AnzoGraph standalone use.
  10. Type a comma-separated list of the IP addresses for each server in the cluster. Type the leader server IP address first, followed by each compute IP address. For example, on a cluster with 4 servers where 192.168.2.1 is the leader server:
    192.168.2.1,192.168.2.2,192.168.2.3,192.168.2.4
    Important Make sure that you enter this value exactly the same, with IP addresses in the same order, during the installation on each server.
  11. After typing the list of IP addresses, press Enter. The wizard extracts the AnzoGraph files and completes the installation.
  12. Repeat the steps above to install AnzoGraph on each compute server. Then proceed to Install AnzoGraph on the Leader Server below.
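
Because the IP address list in step 10 must be identical on every server, it can help to validate the string once before pasting it into each wizard. The sketch below is one way to do that. The function names valid_ipv4 and valid_ip_list are hypothetical, and the rules it enforces (dotted-quad IPv4 only, no spaces, no duplicates) are assumptions about what the wizard expects, not documented wizard behavior.

```shell
#!/bin/sh
# Return 0 if $1 is a dotted-quad IPv4 address with octets in 0-255.
valid_ipv4() {
    case $1 in
        ''|.*|*.|*..*|*[!0-9.]*) return 1 ;;
    esac
    ip_ifs=$IFS; IFS=.
    set -- $1
    IFS=$ip_ifs
    [ $# -eq 4 ] || return 1
    for octet in "$@"; do
        case $octet in ????*) return 1 ;; esac
        [ "$octet" -le 255 ] || return 1
    done
}

# Return 0 if $1 is a comma-separated list of unique, valid IPv4 addresses.
valid_ip_list() {
    case $1 in
        ''|,*|*,|*,,*|*[!0-9.,]*) return 1 ;;
    esac
    list_ifs=$IFS; IFS=,
    set -- $1
    IFS=$list_ifs
    seen=' '
    for ip in "$@"; do
        valid_ipv4 "$ip" || return 1
        case $seen in *" $ip "*) return 1 ;; esac
        seen="$seen$ip "
    done
}

if valid_ip_list "192.168.2.1,192.168.2.2,192.168.2.3,192.168.2.4"; then
    echo "IP list looks well formed."
fi
```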

Install AnzoGraph on the Leader Server

Follow the instructions below to install AnzoGraph on the leader server.

Important Complete the steps below as the Anzo service user.
  1. If necessary, run the following command to become the Anzo service user:
    # su name

    Where name is the name of the service user. For example:

    # su anzo
  2. If necessary, run the following command to make the AnzoGraph installation script executable:
    chmod +x script_name
  3. Run the following command to start the installation wizard:
    ./script_name

    The script unpacks the JRE and then waits for input before starting the installation.

  4. Press Enter to proceed with the installation. The wizard displays the AnzoGraph license agreement.
  5. Review the license agreement. Press Enter to scroll through the terms. At the end of the agreement, type 1 to accept the terms or type 2 to disagree and stop the installation.
  6. The wizard prompts you to specify which components to install. Specify 1 (AnzoGraph) and press Enter.
  7. Specify the path and directory for the AnzoGraph installation. Specify the same location as the compute server installations. Press Enter to accept the default installation path or type an alternate path and then press Enter.
  8. At the server installation type prompt, specify option 2 (Cluster Leader) and press Enter.
  9. Indicate whether this installation is for use with Anzo. Press Enter for Yes. Answering yes configures AnzoGraph to use the settings that are optimal for Anzo. Answering no configures the settings that are optimal for AnzoGraph standalone use.
  10. Set up the AnzoGraph admin user. Type a username to use for authentication. Anzo will use this username to connect to AnzoGraph. Then press Enter.
  11. Type a password for the Anzo username and press Enter. Note: Some special characters, such as $ and *, are treated as parameters in bash. When typing a password, avoid or escape special characters to remove their special meaning to the command line. For more information, see Quoting in the Bash Reference Manual.
  12. Type a comma-separated list of the IP addresses for each server in the cluster. Type the leader server IP address first, followed by each compute IP address. For example, on a cluster with 4 servers where 192.168.2.1 is the leader server:
    192.168.2.1,192.168.2.2,192.168.2.3,192.168.2.4
    Important Make sure that you enter this value exactly the same, with IP addresses in the same order, as the compute servers.
  13. After typing the list of IP addresses, press Enter. Configure any additional AnzoGraph settings. If Cambridge Semantics Support provided custom settings to use for your configuration, type the supplied values and then press Enter. Separate multiple settings with the new line escape sequence, \n. For example, the following entry sets two custom settings: truncate_clob=true\npersistence_directory=/data/.
    Note If you are installing AnzoGraph as the root user, add the following value to this prompt:
    enable_root_user=true
  14. The wizard extracts the AnzoGraph files and completes the installation. Proceed to Complete the Post-Installation Configuration below to complete the initial cluster configuration and start AnzoGraph.
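
The additional-settings prompt in step 13 expects the two characters \ and n between settings, not an actual line break. The sketch below builds such a string from individual name=value pairs. The function name join_settings is hypothetical, and the setting names are the ones from the example in step 13.

```shell
#!/bin/sh
# Join name=value settings with a literal backslash-n sequence.
join_settings() {
    out=
    for setting in "$@"; do
        if [ -z "$out" ]; then
            out=$setting
        else
            out="$out\\n$setting"
        fi
    done
    # %s prints the string as-is, so the backslash is not interpreted.
    printf '%s\n' "$out"
}

join_settings truncate_clob=true persistence_directory=/data/
```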

Complete the Post-Installation Configuration

Deploy the Data Toolkit Plugin (Anzo Unstructured and Data Toolkit Service Environments)

If your organization uses Anzo Unstructured or the Data Toolkit Service, copy the datatoolkit-1.0.0.jar plugin provided by Cambridge Semantics to the <install_path>/lib/udx directory on the AnzoGraph leader server.

Configure and Start the AnzoGraph Services (All Environments)

Three processes are involved in the initial startup of AnzoGraph on a cluster. Subsequent starts involve one or more of these steps, depending on the state of AnzoGraph and the servers:

  1. The first process involves the configuration of the Linux kernel and it applies to all servers in the cluster. The default kernel configuration for the following settings is not optimal for AnzoGraph:
    • transparent_hugepage: Transparent Huge Pages (THP) are enabled by default and can severely degrade AnzoGraph performance. THP should be disabled for AnzoGraph.
    • max_map_count: By default, the maximum number of memory map areas that a process can use is 65530. Since AnzoGraph is memory intensive, it may reach the maximum map count and be shut down by the operating system. AnzoGraph requires a value of 2097152.

    At startup, AnzoGraph checks these settings and returns a warning if the values are not suitable. You must either make the kernel changes or configure AnzoGraph to start with non-optimal configurations. The AnzoGraph deployment includes a script (<install_path>/bin/azg_system_config) that makes the required kernel configuration changes. Making the changes requires superuser privileges, and because the kernel configuration reverts to the defaults on reboot, the script must be run again each time the host server is rebooted.

  2. The second process involves the AnzoGraph system management daemon, azgmgrd. This very lightweight program runs on all servers in the cluster and manages AnzoGraph communication between the nodes. It must be running to start the database, but it typically does not need to be restarted unless you are upgrading AnzoGraph or the host servers are rebooted. It does not need to be stopped and started each time the database is restarted.
  3. The third process involves starting the database with the system manager. Starting the database is done only on the leader server. The leader connects to the system managers on the compute servers and starts the database across the cluster.
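
To see where a server currently stands on the two kernel settings described in the first process, a read-only check along the following lines can be run as any user. This is a sketch and is not part of azg_system_config. The function names are hypothetical, and the file paths are the standard Linux locations for these settings.

```shell
#!/bin/sh
REQUIRED_MAP_COUNT=2097152

# Extract the active THP mode (the bracketed word) from the sysfs line,
# for example "always madvise [never]".
thp_active_mode() {
    printf '%s\n' "$1" | sed -n 's/.*\[\([a-z]*\)].*/\1/p'
}

# Return 0 if the given map count meets the AnzoGraph requirement.
map_count_ok() {
    [ "$1" -ge "$REQUIRED_MAP_COUNT" ]
}

thp_file=/sys/kernel/mm/transparent_hugepage/enabled
if [ -r "$thp_file" ]; then
    echo "THP mode: $(thp_active_mode "$(cat "$thp_file")")"
fi

map_file=/proc/sys/vm/max_map_count
if [ -r "$map_file" ]; then
    if map_count_ok "$(cat "$map_file")"; then
        echo "max_map_count meets the requirement."
    else
        echo "max_map_count is below $REQUIRED_MAP_COUNT."
    fi
fi
```

A THP mode of never and a max_map_count of at least 2097152 indicate that the kernel settings match the requirements described above.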

To ensure that the right account/permissions are used to perform the three steps above (i.e., the root user makes the kernel changes and the Anzo service account starts the system management daemon and the database) whenever the host server is rebooted, Cambridge Semantics recommends that you configure services to run the startup steps. This section provides instructions for configuring the three services.

Important Configure the first two services, the Linux kernel configuration service and the AnzoGraph system management service, on all servers in the cluster. Configure the AnzoGraph database service only on the leader node.
Note Root user privileges are required to complete the tasks below.

Configure the Linux Kernel Configuration Service

On each server in the cluster, follow the instructions below to set up a service to apply the Linux kernel configuration changes any time the AnzoGraph host server is restarted.

Note If making the kernel changes is not possible, you can set the os_allow_alternate_vm_config value to true in the AnzoGraph settings file. This setting enables AnzoGraph to start with non-optimal Linux configurations. See Changing AnzoGraph Configuration Settings for instructions.
  1. Run the following command to copy the AnzoGraph system configuration script, azg_system_config, to the root user's home directory (/root):
    # cp /install_path/bin/azg_system_config /root/

    For example:

    # cp /opt/anzograph/bin/azg_system_config /root/
  2. Run the following command to remove "sudo" from the azg_system_config script:
    # sed -i 's/sudo//g' /root/azg_system_config
  3. Create a file called azg_system_config.service in the /usr/lib/systemd/system directory. For example:
    # vi /usr/lib/systemd/system/azg_system_config.service
  4. Add the following contents to azg_system_config.service:
    [Unit]
    Description=Configure Linux for AnzoGraph
    [Service]
    Type=oneshot
    ExecStart=/root/azg_system_config
    [Install]
    WantedBy=multi-user.target
    
  5. Save and close the file.
  6. Run the following commands to start and enable the new service:
    # systemctl start azg_system_config.service
    # systemctl enable azg_system_config.service
  7. Repeat this process on all of the compute servers and the leader server.

Configure the AnzoGraph System Management Service

On each server in the cluster, follow the instructions below to set up a service that starts the AnzoGraph system management daemon (azgmgrd) as the Anzo service user if the host server is restarted.

  1. Create a file called azgmgrd.service in the /usr/lib/systemd/system directory. For example:
    # vi /usr/lib/systemd/system/azgmgrd.service
  2. Add the following contents to azgmgrd.service:
    [Unit]
    Description=AnzoGraph communication service
    # depends on NetworkManager-wait-online.service enabled
    Wants=network-online.target
    After=network-online.target
    [Service]
    Type=forking
    # The PID file is optional but recommended so that systemd
    # can identify the main process of the daemon
    # PIDFile=/var/run/azgmgrd.pid
    WorkingDirectory=/install_path
    StandardOutput=syslog
    StandardError=syslog
    User=Anzo_service_user
    UMask=0022
    Environment=PATH=/sbin:/bin:/usr/sbin:/usr/bin:/install_path/bin:/install_path/tools/bin
    # Uncomment the following JAVA_HOME line for Anzo Unstructured and/or
    # Data Toolkit Service environments
    # Environment=JAVA_HOME=/usr/lib/jvm/jre-11
    ExecStart=/install_path/bin/azgmgrd /install_path/
    CPUAccounting=false
    MemoryAccounting=false
    [Install]
    WantedBy=multi-user.target

    For example:

    [Unit]
    Description=AnzoGraph communication service
    # depends on NetworkManager-wait-online.service enabled
    Wants=network-online.target
    After=network-online.target
    [Service]
    Type=forking
    # The PID file is optional but recommended so that systemd
    # can identify the main process of the daemon
    # PIDFile=/var/run/azgmgrd.pid
    WorkingDirectory=/opt/anzograph
    StandardOutput=syslog
    StandardError=syslog
    User=anzo
    UMask=0022
    Environment=PATH=/sbin:/bin:/usr/sbin:/usr/bin:/opt/anzograph/bin:/opt/anzograph/tools/bin
    # Uncomment the following JAVA_HOME line for Anzo Unstructured and/or
    # Data Toolkit Service environments
    Environment=JAVA_HOME=/usr/lib/jvm/jre-11
    ExecStart=/opt/anzograph/bin/azgmgrd /opt/anzograph/
    CPUAccounting=false
    MemoryAccounting=false
    [Install]
    WantedBy=multi-user.target
  3. Save and close the file.
  4. Run the following commands to start and enable the new service:
    # systemctl start azgmgrd.service
    # systemctl enable azgmgrd.service
  5. Repeat this process on all of the compute servers and the leader server.
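
Because the same unit file is needed on every server, one option is to generate it from the installation path and service user instead of editing it by hand each time. The sketch below does that for azgmgrd.service. The function name render_azgmgrd_unit is hypothetical, and the output omits the optional PIDFile and JAVA_HOME lines, so add the JAVA_HOME line manually for Anzo Unstructured or Data Toolkit Service environments.

```shell
#!/bin/sh
# Print an azgmgrd.service unit for the given install path and service user.
render_azgmgrd_unit() {
    install_path=${1%/}    # drop a trailing slash if one was supplied
    service_user=$2
    cat <<EOF
[Unit]
Description=AnzoGraph communication service
Wants=network-online.target
After=network-online.target
[Service]
Type=forking
WorkingDirectory=$install_path
StandardOutput=syslog
StandardError=syslog
User=$service_user
UMask=0022
Environment=PATH=/sbin:/bin:/usr/sbin:/usr/bin:$install_path/bin:$install_path/tools/bin
ExecStart=$install_path/bin/azgmgrd $install_path/
CPUAccounting=false
MemoryAccounting=false
[Install]
WantedBy=multi-user.target
EOF
}

# Print the unit for review; as root, redirect the output to
# /usr/lib/systemd/system/azgmgrd.service once it looks right.
render_azgmgrd_unit /opt/anzograph anzo
```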

Configure the AnzoGraph Database Service

On the leader server only, follow the instructions below to set up a service that will start AnzoGraph as the Anzo service user. This service is configured to run after the system management daemon is started.

  1. Create a file called anzograph.service in the /usr/lib/systemd/system directory. For example:
    # vi /usr/lib/systemd/system/anzograph.service
  2. Add the following contents to anzograph.service:
    [Unit]
    Description=AnzoGraph database service
    After=azgmgrd.service
    Wants=azgmgrd.service
    [Service]
    Type=forking
    # The PID file is optional but recommended so that systemd
    # can identify the main process of the daemon
    # PIDFile=/var/run/anzograph.pid
    WorkingDirectory=/install_path
    StandardOutput=syslog
    StandardError=syslog
    RemainAfterExit=no
    Restart=on-failure
    RestartSec=60s
    User=Anzo_service_user
    UMask=0022
    Environment=PATH=/sbin:/bin:/usr/sbin:/usr/bin:/install_path/bin:/install_path/tools/bin
    ExecStart=/install_path/bin/azgctl -start
    [Install]
    WantedBy=multi-user.target

    For example:

    [Unit]
    Description=AnzoGraph database service
    After=azgmgrd.service
    Wants=azgmgrd.service
    [Service]
    Type=forking
    # The PID file is optional but recommended so that systemd
    # can identify the main process of the daemon
    # PIDFile=/var/run/anzograph.pid
    WorkingDirectory=/opt/anzograph
    StandardOutput=syslog
    StandardError=syslog
    RemainAfterExit=no
    Restart=on-failure
    RestartSec=60s
    User=anzo
    UMask=0022
    Environment=PATH=/sbin:/bin:/usr/sbin:/usr/bin:/opt/anzograph/bin:/opt/anzograph/tools/bin
    ExecStart=/opt/anzograph/bin/azgctl -start
    [Install]
    WantedBy=multi-user.target
  3. Save and close the file.
  4. Run the following commands to start and enable the new service:
    # systemctl start anzograph.service
    # systemctl enable anzograph.service

Once the services are in place and enabled, AnzoGraph should be running. Whenever you need to stop or start the database, run the following systemctl commands on the leader node: sudo systemctl stop anzograph and sudo systemctl start anzograph. You do not need to stop and start azgmgrd.
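
To confirm that the services are running, their states can be listed with systemctl is-active, as in the sketch below. The function name service_states is hypothetical. The kernel configuration service is left out because it is a oneshot unit and normally reports inactive once it has finished.

```shell
#!/bin/sh
# Print "<unit>: <state>" for each unit, using systemctl is-active.
# is-active exits non-zero for units that are not active, so the exit
# status is ignored and only the reported state is printed.
service_states() {
    for unit in "$@"; do
        state=$(systemctl is-active "$unit" 2>/dev/null) || true
        printf '%s: %s\n' "$unit" "${state:-unknown}"
    done
}

# Usage on the leader server (on compute servers, check azgmgrd.service only):
#   service_states azgmgrd.service anzograph.service
```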

For instructions on configuring the connection to AnzoGraph in the Anzo console, see Connecting to AnzoGraph.
