Complete the 2.5 Post-Installation Configuration

Once AnzoGraph is installed, there are additional configuration tasks to complete to ensure that AnzoGraph is optimized to support all of the Anzo functionality and your workloads. Follow the instructions in the steps below to complete the post-installation configuration.

The first two procedures are optional. If you have custom database data sources that you plan to use with the Graph Data Interface (GDI), follow the instructions in Deploy Drivers for Custom Database Sources. If you installed the optional C++ extensions, follow the instructions in Install the Optional C++ Extension Dependencies.

The last two procedures, Optimize the Linux Kernel Configuration and Configure and Start the AnzoGraph Services, are required for all environments.

Deploy Drivers for Custom Database Sources

AnzoGraph uses the Graph Data Interface (GDI) Java plugin to connect directly to data sources. The GDI plugin is included in the AnzoGraph installation. Also included in the installation are JDBC drivers for the following databases:

  • Databricks
  • H2
  • IBM DB2
  • Microsoft SQL Server
  • MariaDB
  • Oracle
  • PostgreSQL
  • SAP Sybase (jTDS)
  • Snowflake

To extend the GDI to access custom database sources, JDBC drivers can be added to AnzoGraph. To add a driver, follow the steps below.

  1. Copy the .jar file to the <install_path>/lib/udx directory on the leader server.
  2. Restart the database by running the following command. When the database is restarted, the leader broadcasts any new .jar files to the compute servers.
    sudo systemctl restart anzograph

The <install_path>/lib/udx directory on the leader node is a user-managed directory rather than an AnzoGraph-managed directory like <install_path>/bin or <install_path>/internal. Users can place JDBC drivers and Java or C++ extensions in the lib/udx directory at any time. Each time the database is restarted, AnzoGraph clears the <install_path>/internal/extensions directory, scans lib/udx, saves a copy of its contents to internal/extensions, and then broadcasts the internal/extensions contents from the leader node to the compute nodes. As a result, internal/extensions always reflects the latest plugins in lib/udx as of the most recent restart.
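The two steps above can be wrapped in a small helper. The following is a minimal sketch, not part of AnzoGraph: the function name deploy_driver is hypothetical, and it assumes you pass the actual installation path as the second argument. It only copies the file and prints the restart command from step 2 rather than running it.

```shell
#!/usr/bin/env bash
# Sketch only: deploy_driver is a hypothetical helper, not an AnzoGraph tool.
# Usage: deploy_driver <path-to-driver.jar> <install_path>
deploy_driver() {
    local jar="$1" install_path="$2"
    local udx_dir="${install_path}/lib/udx"

    # Refuse to continue if an argument, the driver, or the target is missing.
    [ -n "$install_path" ] || { echo "usage: deploy_driver <jar> <install_path>" >&2; return 1; }
    [ -f "$jar" ] || { echo "driver not found: $jar" >&2; return 1; }
    [ -d "$udx_dir" ] || { echo "directory not found: $udx_dir" >&2; return 1; }

    cp -- "$jar" "$udx_dir/" || return 1
    echo "Copied $(basename -- "$jar") to $udx_dir"
    # The restart is left to the operator because it interrupts the database.
    echo "To load the driver, run: sudo systemctl restart anzograph"
}
```

Run it on the leader server only. The compute servers receive the file when the leader broadcasts it during the restart.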

Install the Optional C++ Extension Dependencies

Root user privileges are required to complete this task.

If you chose to install the optional C++ packages that extend AnzoGraph's built-in analytics, the following dependencies must be installed on all servers in the cluster. The installer provides a .repo file to help you configure the yum repository and install the required software packages:

libarchive13
libarmadillo10
libboost_filesystem1_71_0
libboost_iostreams1_71_0
libboost_system1_71_0
libgrpc++1
libflatbuffers1
libhdfs3
libnfs13
libserd-0-0
libsmb2
shadow-utils

This section includes instructions for using the included .repo file to install the C++ dependencies with or without internet access.

Installing the RPMs via the Internet

Follow the steps below if the AnzoGraph servers have external internet access.

  1. Copy the csi-obs-cambridgesemantics-udxcontrib.repo file from the <install_path>/examples/centos7/yum.repos.d directory to the /etc/yum.repos.d directory. For example, the following command copies the file from the default installation path to /etc/yum.repos.d:
    sudo cp /opt/cambridgesemantics/examples/centos7/yum.repos.d/csi-obs-cambridgesemantics-udxcontrib.repo /etc/yum.repos.d
  2. Next, run the following command to enable the repository and install the required packages:
    sudo yum install --enablerepo=csi-obs-cambridgesemantics-udxcontrib libarchive13 libarmadillo10 libboost_filesystem1_71_0 libboost_iostreams1_71_0 libboost_system1_71_0 libgrpc++1 libflatbuffers1 libhdfs3 libnfs13 libserd-0-0 libsmb2 shadow-utils
  3. Repeat these steps on all servers in the cluster.
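After the install command finishes on a server, it can be useful to confirm that nothing was skipped. The sketch below is one way to do that with rpm -q. The function name missing_packages is hypothetical, and the package names are copied from the dependency list in this section. The same check applies after the TAR file procedure described next.

```shell
#!/usr/bin/env bash
# Sketch only: missing_packages is a hypothetical helper.
# Package names are copied from the dependency list in this section.
REQUIRED_PACKAGES="libarchive13 libarmadillo10 libboost_filesystem1_71_0 \
libboost_iostreams1_71_0 libboost_system1_71_0 libgrpc++1 libflatbuffers1 \
libhdfs3 libnfs13 libserd-0-0 libsmb2 shadow-utils"

# Print each required package that "rpm -q" does not report as installed.
missing_packages() {
    local pkg
    for pkg in $REQUIRED_PACKAGES; do
        rpm -q "$pkg" >/dev/null 2>&1 || echo "$pkg"
    done
}
```

Calling missing_packages with no arguments prints one package name per line. Empty output means that rpm reports every dependency as installed on that server.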

Installing the RPMs via the Supplied TAR File

Follow the steps below if the AnzoGraph servers do not have external internet access.

  1. From a computer that does have internet access, download the dependency tarball, csi-obs-cambridgesemantics-udxcontrib.centos7.tar.xz, from the following Cambridge Semantics Google Cloud Storage location: https://storage.googleapis.com/csi-anzograph/udx/csi-os-contrib/centos7/2023-04/202304181246/csi-obs-cambridgesemantics-udxcontrib.centos7.tar.xz.

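    You can run the following cURL command to download the tarball:

    curl -OL https://storage.googleapis.com/csi-anzograph/udx/csi-os-contrib/centos7/2023-04/202304181246/csi-obs-cambridgesemantics-udxcontrib.centos7.tar.xz

    A checksum file appears to be published at the same URL with .sha512 appended. If it is available, you can download it the same way and use it to verify the tarball.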
  2. Also from the computer that has internet access, download the repomd.xml.key from the following Cambridge Semantics Google Cloud Storage location: https://storage.googleapis.com/csi-rpmmd-pd/CambridgeSemantics:/UDXContrib/CentOS-7_SP5/repodata/repomd.xml.key.

    You can run the following cURL command to download the file:

    curl -OL https://storage.googleapis.com/csi-rpmmd-pd/CambridgeSemantics:/UDXContrib/CentOS-7_SP5/repodata/repomd.xml.key
  3. On each of the AnzoGraph servers, create a directory called /tmp/repo.
  4. Copy csi-obs-cambridgesemantics-udxcontrib.centos7.tar.xz to the /tmp/repo directory on each server.
  5. Then run the following command to unpack the tarball in the /tmp/repo directory:
    tar -xvf csi-obs*.tar.xz

    The files are unpacked into subdirectories under /tmp/repo/dl/centos7/csi-obs-cambridgesemantics-udxcontrib.

  6. Next, copy the repomd.xml.key file to the /tmp/repo/dl/centos7/csi-obs-cambridgesemantics-udxcontrib directory on each of the AnzoGraph servers.
  7. Now, open the csi-obs-cambridgesemantics-udxcontrib.repo file in the <install_path>/examples/centos7/yum.repos.d directory. The contents of the file are shown below:
    [csi-obs-cambridgesemantics-udxcontrib]
    name=Contrib directory for CambridgeSemantics AnzoGraph UDX dependencies
    baseurl=https://storage.googleapis.com/csi-rpmmd-pd/CambridgeSemantics:/UDXContrib/CentOS-7_SP5
    gpgkey=https://storage.googleapis.com/csi-rpmmd-pd/CambridgeSemantics:/UDXContrib/CentOS-7_SP5/repodata/repomd.xml.key
    gpgcheck=1
    enabled=1
  8. Edit the csi-obs-cambridgesemantics-udxcontrib.repo file contents to replace the baseurl and gpgkey values so that they point to the repo files that you unpacked in the /tmp/repo directory. In addition, change the gpgcheck and enabled values from 1 to 0. The contents of the updated file are shown below:
    [csi-obs-cambridgesemantics-udxcontrib]
    name=Contrib directory for CambridgeSemantics AnzoGraph UDX dependencies
    baseurl=file:///tmp/repo/dl/centos7/csi-obs-cambridgesemantics-udxcontrib
    gpgkey=file:///tmp/repo/dl/centos7/csi-obs-cambridgesemantics-udxcontrib/repomd.xml.key
    gpgcheck=0
    enabled=0
  9. Save and close the file.
  10. Copy csi-obs-cambridgesemantics-udxcontrib.repo from <install_path>/examples/centos7/yum.repos.d to the /etc/yum.repos.d directory. For example, the following command copies the file from the default installation path to /etc/yum.repos.d:
    sudo cp /opt/cambridgesemantics/examples/centos7/yum.repos.d/csi-obs-cambridgesemantics-udxcontrib.repo /etc/yum.repos.d
  11. Next, run the following command to enable the repository and install the required packages:
    sudo yum install --enablerepo=csi-obs-cambridgesemantics-udxcontrib libarchive13 libarmadillo10 libboost_filesystem1_71_0 libboost_iostreams1_71_0 libboost_system1_71_0 libgrpc++1 libflatbuffers1 libhdfs3 libnfs13 libserd-0-0 libsmb2 shadow-utils
  12. Repeat the steps above as needed to install the dependencies on all servers in the cluster.
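Because the .repo file must be edited identically on every server, the edit in step 8 can be scripted. The following is a minimal sketch under stated assumptions: the function name localize_repo is hypothetical, the default directory is the unpack location named in step 5, and the file is assumed to contain one baseurl, gpgkey, gpgcheck, and enabled line, each starting at the beginning of a line.

```shell
#!/usr/bin/env bash
# Sketch only: localize_repo is a hypothetical helper.
# Usage: localize_repo <repo_file> [<unpacked_repo_dir>]
localize_repo() {
    local repo_file="$1"
    # Default matches the directory that the tarball unpacks into (step 5).
    local repo_dir="${2:-/tmp/repo/dl/centos7/csi-obs-cambridgesemantics-udxcontrib}"
    local tmp rc

    [ -f "$repo_file" ] || { echo "repo file not found: $repo_file" >&2; return 1; }
    tmp="$(mktemp)" || return 1

    # Point baseurl and gpgkey at the local files; set gpgcheck and enabled to 0.
    sed -e "s|^baseurl=.*|baseurl=file://${repo_dir}|" \
        -e "s|^gpgkey=.*|gpgkey=file://${repo_dir}/repomd.xml.key|" \
        -e "s|^gpgcheck=.*|gpgcheck=0|" \
        -e "s|^enabled=.*|enabled=0|" \
        "$repo_file" > "$tmp" && cat "$tmp" > "$repo_file"
    rc=$?
    rm -f "$tmp"
    return "$rc"
}
```

Run the function against the copy under <install_path> before step 10, and compare the result with the updated contents shown in step 8. Because the repository is left with enabled=0, it is only used when the install command passes --enablerepo, as in step 11.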

Optimize the Linux Kernel Configuration

Root user privileges are required to complete this task.

To streamline the configuration of the operating system for peak AnzoGraph performance, the installer includes a tuned AnzoGraph profile that you can activate. Tuned is a daemon program that uses the udev device monitor to statically and dynamically tune operating system settings based on the specified profile.

For more information about Tuned, see Tuned in the Red Hat Performance Tuning Guide.

It is strongly recommended that you activate the AnzoGraph tuned profile to ensure that AnzoGraph is optimized to support your Anzo workloads. The profile, called azg, is in the <install_path>/examples/tuned-profile directory and consists of two files: tuned.conf and additional-tuneables.sh. For details about the files, see Tuned Profile Reference below.

Activating the Tuned Profile

To activate the azg profile, follow the steps below. Complete these steps on all servers in the cluster:

  1. Copy the azg directory from <install_path>/examples/tuned-profile to the /etc/tuned directory. For example, the following command copies azg from the default installation path to /etc/tuned:
    sudo cp -r /opt/cambridgesemantics/examples/tuned-profile/azg /etc/tuned
  2. Next, run the following command to activate the azg profile:
    sudo tuned-adm profile azg

The host servers are now configured to use the tuned profile that is optimal for AnzoGraph.

To disable tuned profiles, you can run sudo tuned-adm off. After running the command, no tuned profiles will be active.

Tuned Profile Reference

This section describes the tuned AnzoGraph profile files and the kernel configuration changes that they apply.

tuned.conf

The table below describes the Linux kernel configuration settings that are modified by tuned.conf.

| Setting | Description | AZG Profile Change |
|---|---|---|
| vm.dirty_ratio | Specifies the percentage of system memory that can be occupied by "dirty" data before the cache is flushed to disk. Dirty data are pages in memory that have been updated and do not match what is stored on disk. | Reduces vm.dirty_ratio to 2% to increase the frequency with which the system cache is flushed. |
| vm.swappiness | Controls the tendency of the kernel to move processes out of physical memory and onto the swap disk. A value of 0 means the kernel avoids swapping processes out of physical memory for as long as possible. A value of 100 tells the kernel to aggressively swap processes out of physical memory to the swap disk. | Sets vm.swappiness to 30. |
| vm.max_map_count | Sets the limit on the maximum number of memory map areas a process can use. Since AnzoGraph is memory intensive, it may reach the default maximum map count of 65530 and be shut down by the operating system. | Increases vm.max_map_count to 2097152. |
| net.ipv4.tcp_rmem | Controls the size of the receive buffer for TCP connections. It sets the minimum, default, and maximum sizes of the buffer in bytes. | Sets tcp_rmem to "4096 87380 16777216". |
| net.ipv4.tcp_wmem | Controls the size of the send buffer for TCP connections. It sets the minimum, default, and maximum sizes of the buffer in bytes. | Sets tcp_wmem to "4096 16384 16777216". |
| net.ipv4.udp_mem | Controls the amount of memory that can be allocated for the kernel's UDP buffers across all UDP sockets. It sets the minimum, pressure, and maximum thresholds, measured in memory pages rather than bytes. | Sets udp_mem to "3145728 4194304 16777216". |
| transparent_hugepages | Controls whether Transparent Huge Pages (THP) is enabled or disabled system-wide. When THP is enabled system-wide, it can dramatically degrade AnzoGraph performance. | Disables THP by setting transparent_hugepages to never. |
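Once the profile is active (tuned-adm active reports the current profile), the sysctl values in the table can be spot-checked on each server. The sketch below compares a live value against an expected one. The function name check_sysctl is hypothetical, and the expected values you pass in are the ones listed in the table, not values read from the azg profile itself.

```shell
#!/usr/bin/env bash
# Sketch only: check_sysctl is a hypothetical helper.
# Usage: check_sysctl <sysctl_key> "<expected value>"
check_sysctl() {
    local key="$1" expected="$2" actual

    # sysctl separates multi-value settings with tabs; normalize to single spaces.
    actual="$(sysctl -n "$key" 2>/dev/null | tr -s '[:space:]' ' ' | sed 's/ $//')"

    if [ "$actual" = "$expected" ]; then
        echo "OK   $key = $actual"
    else
        echo "DIFF $key = ${actual:-<unreadable>} (expected $expected)"
        return 1
    fi
}
```

For example, check_sysctl vm.swappiness 30 or check_sysctl net.ipv4.tcp_rmem "4096 87380 16777216". A DIFF line on a server usually means the profile was not activated there or another tool has overridden the setting.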

additional-tuneables.sh

The additional-tuneables.sh script is called by tuned.conf and configures the following Linux kernel configuration settings so that they are optimal for AnzoGraph.

| Setting | Description | AZG Profile Change |
|---|---|---|
| overcommit_memory | Controls whether obvious overcommits of the address space are allowed. | Sets overcommit_memory to 0 to ensure that very large overcommits are not allowed but some overcommits can be used to reduce swap usage. |
| overcommit_ratio | Controls the percentage of memory that is allowed to be used for overcommits. | Sets overcommit_ratio to 50%. |
| transparent_hugepage/defrag | Though the tuned profile disables Transparent Huge Pages (THP) system-wide, this setting controls whether huge pages can still be enabled on a per process basis (inside MADV_HUGEPAGE madvise regions). | Sets transparent_hugepage/defrag to madvise so that the kernel only assigns huge pages to individual process memory regions that are specified with the madvise() system call. |
| tcp_timestamps | Controls whether TCP timestamps are enabled or disabled. | Sets tcp_timestamps to 0, which disables TCP timestamps in order to reduce performance spikes related to timestamp generation. |
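The THP settings are exposed as control files in which the kernel marks the active choice with square brackets, for example "always madvise [never]". The sketch below extracts that active choice. The function name thp_setting is hypothetical, and the paths in the usage note are the standard Linux sysfs locations, which are assumed here rather than taken from the profile.

```shell
#!/usr/bin/env bash
# Sketch only: thp_setting is a hypothetical helper.
# Usage: thp_setting <control_file>
# Prints the bracketed (currently active) choice from a THP control file.
thp_setting() {
    sed -n 's/.*\[\([^]]*\)\].*/\1/p' "$1"
}
```

With the azg profile active as described above, thp_setting /sys/kernel/mm/transparent_hugepage/enabled would be expected to print never, and thp_setting /sys/kernel/mm/transparent_hugepage/defrag would be expected to print madvise.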

Configure and Start the AnzoGraph Services

The last step in the post-installation configuration is to implement the AnzoGraph systemd services and start the database. It is important to set up the services to run as the platform service user so that AnzoGraph can access files on the shared file system. In addition, the services are configured to tune user resource limits (ulimits) as well as set $JAVA_HOME so that AnzoGraph can find the OpenJDK installation.

The service files are included in the <install_path>/examples/systemd-services directory. Follow the instructions below to configure and start the services.

  1. Configure the System Management Service
  2. Configure the Database Service on the Leader Server (and Single-Servers)

Configure the System Management Service

The system management daemon, azgmgrd, is a very lightweight program that runs on all AnzoGraph servers and manages communication between the system manager and the database as well as between the nodes in a cluster. Follow the steps below to configure and start the service that runs the azgmgrd process.

  1. Open the azgmgrd.service file in the <install_path>/examples/systemd-services directory. The contents of the file are shown below.

    The following contents are from an installation that used the default installation path, /opt/cambridgesemantics. The contents of your file may differ. Also, note the User=anzograph value below. The value needs to be edited to replace anzograph with the platform service user name.

    [Unit]
    Description=AnzoGraph communication service
    # depends on NetworkManager-wait-online.service enabled
    Wants=network-online.target
    After=network-online.target
    
    [Service]
    Type=forking
    # The PID file is optional, but recommended in the manpage
    # "so that systemd can identify the main process of the daemon"
    #PIDFile=/var/run/azgmgrd.pid
    WorkingDirectory=/opt/cambridgesemantics/anzograph
    StandardOutput=syslog
    StandardError=syslog
    LimitCPU=infinity
    LimitNOFILE=4096
    LimitAS=infinity
    LimitNPROC=infinity
    LimitMEMLOCK=infinity
    LimitLOCKS=infinity
    LimitFSIZE=infinity
    User=anzograph
    UMask=007
    Environment=PATH=$PATH:/opt/cambridgesemantics/anzograph/bin:/opt/cambridgesemantics/anzograph/tools/bin
    Environment=JAVA_HOME=/usr/lib/jvm/jre-11-openjdk
    Environment=UDX_LOGS=/opt/cambridgesemantics/anzograph/internal/logs
    Environment=HYPER_PATH=/opt/cambridgesemantics/anzograph/vendor/com.tableau/hyper/lib/hyper
    ExecStart=/opt/cambridgesemantics/anzograph/bin/azgmgrd /opt/cambridgesemantics/anzograph
    
    CPUAccounting=false
    MemoryAccounting=false
    [Install]
    WantedBy=multi-user.target
    Alias=sbxmgrd.service
  2. In the following line of the file, replace anzograph with the name of the platform service user.
    User=anzograph

    For example:

    User=anzo
  3. Save and close the file.
  4. Copy azgmgrd.service from the <install_path>/examples/systemd-services directory to the /usr/lib/systemd/system directory. For example, the following command copies azgmgrd.service from the default installation path to /usr/lib/systemd/system:
    sudo cp /opt/cambridgesemantics/examples/systemd-services/azgmgrd.service /usr/lib/systemd/system
  5. Run the following commands to start and enable the service:
    sudo systemctl start azgmgrd.service
    sudo systemctl enable azgmgrd.service
  6. Repeat this process on all servers in the cluster.

The azgmgrd daemon must be running to start the database, but it typically does not need to be restarted unless you are upgrading AnzoGraph or the host servers are rebooted. It does not need to be stopped and started each time the database is restarted.
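When the service file has to be edited on many servers, the User= change in step 2 can be applied with sed instead of a text editor. The following is a minimal sketch: the function name set_service_user is hypothetical, and it assumes the file contains a single User= line. The same helper works for the anzograph.service file in the next procedure.

```shell
#!/usr/bin/env bash
# Sketch only: set_service_user is a hypothetical helper.
# Usage: set_service_user <unit_file> <platform_service_user>
set_service_user() {
    local unit_file="$1" user="$2" tmp rc

    [ -f "$unit_file" ] || { echo "unit file not found: $unit_file" >&2; return 1; }
    [ -n "$user" ] || { echo "usage: set_service_user <unit_file> <user>" >&2; return 1; }
    tmp="$(mktemp)" || return 1

    # Rewrite only the User= line; any leading indentation is preserved.
    sed "s|^\([[:space:]]*\)User=.*|\1User=${user}|" "$unit_file" > "$tmp" \
        && cat "$tmp" > "$unit_file"
    rc=$?
    rm -f "$tmp"
    return "$rc"
}
```

Edit the copy under <install_path>/examples/systemd-services first, then copy it to /usr/lib/systemd/system as described in step 4.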

Configure the Database Service on the Leader Server (and Single-Servers)

The anzograph service runs the database process. This service is configured to run after azgmgrd is started. Starting the database is done only on the leader server. The leader connects to the system managers on the compute servers and starts the database across the cluster.

  1. Open the anzograph.service file in the <install_path>/examples/systemd-services directory. The contents of the file are shown below.

    The following contents are from an installation that used the default installation path, /opt/cambridgesemantics. The contents of your file may differ. Also, note the User=anzograph value below. The value needs to be edited to replace anzograph with the platform service user name.

    [Unit]
    Description=AnzoGraph database service
    After=azgmgrd.service
    Wants=azgmgrd.service
    
    [Service]
    Type=oneshot
    # The PID file is optional, but recommended in the manpage
    # "so that systemd can identify the main process of the daemon"
    #PIDFile=/var/run/azg.pid
    WorkingDirectory=/opt/cambridgesemantics/anzograph
    StandardOutput=syslog
    StandardError=syslog
    User=anzograph
    UMask=027
    RemainAfterExit=yes
    Environment=PATH=$PATH:/opt/cambridgesemantics/anzograph/bin:/opt/cambridgesemantics/anzograph/tools/bin
    ExecStart=/opt/cambridgesemantics/anzograph/bin/azgctl -start
    ExecStop=/opt/cambridgesemantics/anzograph/bin/azgctl -stop
    
    [Install]
    WantedBy=multi-user.target
    Alias=gqe.service
  2. In the following line of the file, replace anzograph with the name of the platform service user.
    User=anzograph

    For example:

    User=anzo
  3. Save and close the file.
  4. Copy anzograph.service from the <install_path>/examples/systemd-services directory to the /usr/lib/systemd/system directory. For example, the following command copies anzograph.service from the default installation path to /usr/lib/systemd/system:
    sudo cp /opt/cambridgesemantics/examples/systemd-services/anzograph.service /usr/lib/systemd/system
  5. Run the following commands to start and enable the new service:
    sudo systemctl start anzograph.service
    sudo systemctl enable anzograph.service

Once the services are in place and enabled, AnzoGraph should be running. To stop and start the database from the command line, run the following systemctl commands on the leader node. You do not need to stop and start azgmgrd.

sudo systemctl stop anzograph
sudo systemctl start anzograph 
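To confirm that the services came up, you can ask systemd for their state. The sketch below wraps systemctl is-active. The function name check_services is hypothetical. Because anzograph.service is a oneshot unit with RemainAfterExit=yes, systemd continues to report it as active after azgctl -start has exited successfully.

```shell
#!/usr/bin/env bash
# Sketch only: check_services is a hypothetical helper.
# Usage: check_services <unit> [<unit> ...]
# Returns non-zero if any listed unit is not active.
check_services() {
    local svc rc=0
    for svc in "$@"; do
        if systemctl is-active --quiet "$svc"; then
            echo "$svc: active"
        else
            echo "$svc: not active"
            rc=1
        fi
    done
    return "$rc"
}
```

On the leader server, run check_services azgmgrd.service anzograph.service. On compute servers, only azgmgrd.service is installed, so check that unit alone.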

For instructions on configuring the connection to AnzoGraph in Anzo, see Connecting to AnzoGraph.

See Securing an AnzoGraph 2.5 Environment for recommendations to follow for securing AnzoGraph environments.