Complete the Post-Installation Configuration

After you install AnzoGraph, complete the additional configuration tasks below to ensure that AnzoGraph is optimized to support all of the Anzo functionality and your workloads.

The first two steps are optional. If you have custom database data sources that you plan to use with the AnzoGraph Graph Data Interface (GDI), follow the instructions in Step 1. If you installed the optional C++ extensions, follow the instructions in Step 2. Steps 3 and 4 are required for all AnzoGraph environments.

  1. Deploy Optional Drivers for Accessing Custom Database Sources
  2. Install the Optional C++ Extension Dependencies
  3. Optimize the Linux Kernel Configuration for AnzoGraph
  4. Configure and Start the AnzoGraph Services

Deploy Optional Drivers for Accessing Custom Database Sources

AnzoGraph uses the Graph Data Interface (GDI) Java plugin to connect directly to data sources when you profile a source, ingest data sources via the direct data load workflow, or blend data into a graphmart via manually created queries. The GDI plugin is included in the AnzoGraph installation. Also included in the installation are JDBC drivers for the following databases:

  • Databricks
  • H2
  • IBM DB2
  • Microsoft SQL Server
  • MariaDB
  • Oracle
  • PostgreSQL
  • SAP Sybase (jTDS)
  • Snowflake

To extend the GDI to access custom databases, you can also deploy custom JDBC drivers to AnzoGraph. To add a JDBC driver, copy it to the <install_path>/lib/udx directory on the leader server. The next time the database starts, the leader broadcasts any new .jar files to the compute servers.

The <install_path>/lib/udx directory on the leader node is a user-managed directory rather than an AnzoGraph-managed directory like <install_path>/bin or <install_path>/internal. Users can place JDBC drivers and Java or C++ extensions in the lib/udx directory at any time. Each time the database starts, AnzoGraph scans that directory, saves a copy of its contents to the <install_path>/internal/extensions directory, and then broadcasts the internal/extensions contents from the leader node to the compute nodes. Each restart clears internal/extensions, and AnzoGraph rescans lib/udx to reload internal/extensions with the latest plugins.
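The copy step can be scripted as a minimal sketch, assuming a POSIX shell on the leader server. The function name stage_jdbc_driver is hypothetical, and the destination is passed explicitly because <install_path> varies by installation. As a design choice, the sketch refuses files that are not .jar files and refuses a destination directory that does not already exist, so a mistyped path is reported rather than silently created.

```shell
#!/bin/sh
# Hypothetical helper: stage a custom JDBC driver for the AnzoGraph GDI.
# Usage: stage_jdbc_driver JAR UDX_DIR
#   JAR     path to the driver .jar file
#   UDX_DIR <install_path>/lib/udx on the leader server (must already exist)
stage_jdbc_driver() {
    jar="$1"
    dest="$2"
    # Only .jar files are expected in lib/udx for JDBC drivers.
    case "$jar" in
        *.jar) ;;
        *) echo "error: $jar is not a .jar file" >&2; return 1 ;;
    esac
    if [ ! -f "$jar" ]; then
        echo "error: $jar not found" >&2
        return 1
    fi
    # Do not create the directory: a missing lib/udx usually means a wrong path.
    if [ ! -d "$dest" ]; then
        echo "error: directory $dest does not exist" >&2
        return 1
    fi
    cp "$jar" "$dest/" || return 1
    echo "staged $(basename "$jar") in $dest"
}
```

After staging the driver, restart the database on the leader (sudo systemctl stop anzograph, then sudo systemctl start anzograph) so that AnzoGraph rescans lib/udx and broadcasts the new .jar file to the compute servers.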

Install the Optional C++ Extension Dependencies

Root user privileges are required to complete this task.

If you chose to install the optional C++ packages that extend AnzoGraph's built-in analytics, you must install additional dependencies on all servers in the cluster. The installer provides a .repo file that helps you configure the yum repository and install the following required software packages:

  • libarchive13
  • libarmadillo10
  • libboost_filesystem1_71_0
  • libboost_iostreams1_71_0
  • libboost_system1_71_0
  • libgrpc++1
  • libflatbuffers1
  • libhdfs3
  • libnfs13
  • libserd-0-0
  • libsmb2
  • shadow-utils

This section includes instructions for using the included .repo file to install the C++ dependencies with or without internet access:

Installing the RPMs via the Internet

Follow the steps below if the AnzoGraph servers have external internet access.

  1. Copy the csi-obs-cambridgesemantics-udxcontrib.repo file from the <install_path>/examples/yum.repos.d directory to the /etc/yum.repos.d directory. For example, the following command copies the file from the default installation path to /etc/yum.repos.d:
    sudo cp /opt/cambridgesemantics/examples/yum.repos.d/csi-obs-cambridgesemantics-udxcontrib.repo /etc/yum.repos.d
  2. Next, run the following command to enable the repository and install the required packages:
    sudo yum install --enablerepo=csi-obs-cambridgesemantics-udxcontrib libarchive13 libarmadillo10 libboost_filesystem1_71_0 libboost_iostreams1_71_0 libboost_system1_71_0 libgrpc++1 libflatbuffers1 libhdfs3 libnfs13 libserd-0-0 libsmb2 shadow-utils
  3. Repeat these steps on all servers in the cluster.
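After installing the packages on a server, a quick check can confirm that nothing was skipped. The following is a sketch, assuming an RPM-based host where rpm -q exits with a non-zero status for a package that is not installed; the function name missing_packages and the variable UDX_DEPS are hypothetical.

```shell
#!/bin/sh
# Hypothetical check: list C++ extension dependencies that rpm does not
# report as installed. UDX_DEPS mirrors the package list in this section.
UDX_DEPS="libarchive13 libarmadillo10 libboost_filesystem1_71_0
libboost_iostreams1_71_0 libboost_system1_71_0 libgrpc++1 libflatbuffers1
libhdfs3 libnfs13 libserd-0-0 libsmb2 shadow-utils"

# missing_packages [PKG...]
# Prints each package that is not installed, one per line.
# With no arguments, checks every package in UDX_DEPS.
missing_packages() {
    if [ "$#" -eq 0 ]; then
        # Intentional word splitting of the whitespace-separated list.
        set -- $UDX_DEPS
    fi
    for pkg in "$@"; do
        # rpm -q returns non-zero when the package is not installed
        # (or when rpm itself is unavailable).
        rpm -q "$pkg" >/dev/null 2>&1 || echo "$pkg"
    done
}
```

Empty output from missing_packages means that every listed dependency is installed on that server.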

Installing the RPMs via the Supplied TAR File

Follow the steps below if the AnzoGraph servers do not have external internet access.

  1. From a computer that does have internet access, download the dependency tarball, csi-obs-cambridgesemantics-udxcontrib.centos7.tar.xz, from the following Cambridge Semantics Google Cloud Storage location: https://storage.googleapis.com/csi-anzograph/udx/csi-os-contrib/centos7/2023-04/202304181246/csi-obs-cambridgesemantics-udxcontrib.centos7.tar.xz.

    You can run the following cURL command to download the tarball:

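    curl -OL https://storage.googleapis.com/csi-anzograph/udx/csi-os-contrib/centos7/2023-04/202304181246/csi-obs-cambridgesemantics-udxcontrib.centos7.tar.xz

    To also download the SHA-512 checksum file for the tarball, append .sha512 to the same URL:

    curl -OL https://storage.googleapis.com/csi-anzograph/udx/csi-os-contrib/centos7/2023-04/202304181246/csi-obs-cambridgesemantics-udxcontrib.centos7.tar.xz.sha512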
  2. Also from the computer that has internet access, download the repomd.xml.key from the following Cambridge Semantics Google Cloud Storage location: https://storage.googleapis.com/csi-rpmmd-pd/CambridgeSemantics:/UDXContrib/CentOS-7_SP5/repodata/repomd.xml.key.

    You can run the following cURL command to download the file:

    curl -OL https://storage.googleapis.com/csi-rpmmd-pd/CambridgeSemantics:/UDXContrib/CentOS-7_SP5/repodata/repomd.xml.key
  3. On each of the AnzoGraph servers, create a directory called /tmp/repo.
  4. Copy csi-obs-cambridgesemantics-udxcontrib.centos7.tar.xz to the /tmp/repo directory on each server.
  5. Then, from the /tmp/repo directory, run the following command to unpack the tarball:
    tar -xvf csi-obs*.tar.xz

    The files are unpacked into subdirectories under /tmp/repo/dl/centos7/csi-obs-cambridgesemantics-udxcontrib.

  6. Next, copy the repomd.xml.key file to the /tmp/repo/dl/centos7/csi-obs-cambridgesemantics-udxcontrib directory on each of the AnzoGraph servers.
  7. Now, open the csi-obs-cambridgesemantics-udxcontrib.repo file in the <install_path>/examples/yum.repos.d directory. The contents of the file are shown below:
    [csi-obs-cambridgesemantics-udxcontrib]
    name=Contrib directory for CambridgeSemantics AnzoGraph UDX dependencies
    baseurl=https://storage.googleapis.com/csi-rpmmd-pd/CambridgeSemantics:/UDXContrib/CentOS-7_SP5
    gpgkey=https://storage.googleapis.com/csi-rpmmd-pd/CambridgeSemantics:/UDXContrib/CentOS-7_SP5/repodata/repomd.xml.key
    gpgcheck=1
    enabled=1
  8. Edit the csi-obs-cambridgesemantics-udxcontrib.repo file contents to replace the baseurl and gpgkey values so that they point to the repo files that you unpacked in the /tmp/repo directory. In addition, change the gpgcheck and enabled values from 1 to 0. The contents of the updated file are shown below:
    [csi-obs-cambridgesemantics-udxcontrib]
    name=Contrib directory for CambridgeSemantics AnzoGraph UDX dependencies
    baseurl=file:///tmp/repo/dl/centos7/csi-obs-cambridgesemantics-udxcontrib
    gpgkey=file:///tmp/repo/dl/centos7/csi-obs-cambridgesemantics-udxcontrib/repomd.xml.key
    gpgcheck=0
    enabled=0
  9. Save and close the file.
  10. Copy csi-obs-cambridgesemantics-udxcontrib.repo from <install_path>/examples/yum.repos.d to the /etc/yum.repos.d directory. For example, the following command copies the file from the default installation path to /etc/yum.repos.d:
    sudo cp /opt/cambridgesemantics/examples/yum.repos.d/csi-obs-cambridgesemantics-udxcontrib.repo /etc/yum.repos.d
  11. Next, run the following command to enable the repository and install the required packages:
    sudo yum install --enablerepo=csi-obs-cambridgesemantics-udxcontrib libarchive13 libarmadillo10 libboost_filesystem1_71_0 libboost_iostreams1_71_0 libboost_system1_71_0 libgrpc++1 libflatbuffers1 libhdfs3 libnfs13 libserd-0-0 libsmb2 shadow-utils
  12. Repeat the steps above as needed to install the dependencies on all servers in the cluster.
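Two parts of the offline procedure lend themselves to scripting: checking the downloaded tarball against its .sha512 file, and writing the edited .repo file from step 8. The following is a minimal sketch under two assumptions: the first whitespace-separated field of the .sha512 file is the hex SHA-512 digest, and the tarball was unpacked to the /tmp/repo location used in the steps above. The function names verify_sha512 and write_local_repo_file are hypothetical.

```shell
#!/bin/sh
# Assumption: the tarball was unpacked under /tmp/repo as in the steps above.
REPO_DIR=/tmp/repo/dl/centos7/csi-obs-cambridgesemantics-udxcontrib

# verify_sha512 FILE CHECKSUM_FILE
# Succeeds only if FILE's SHA-512 digest equals the first field of
# CHECKSUM_FILE (assumed format: "<hex digest> [filename]").
verify_sha512() {
    expected=$(awk '{print $1; exit}' "$2") || return 1
    actual=$(sha512sum "$1" | awk '{print $1}')
    [ -n "$expected" ] && [ "$expected" = "$actual" ]
}

# write_local_repo_file DEST [REPO_DIR]
# Writes the step 8 .repo contents, with baseurl and gpgkey pointing at
# the unpacked local copy and gpgcheck/enabled set to 0.
write_local_repo_file() {
    dir="${2:-$REPO_DIR}"
    cat > "$1" <<EOF
[csi-obs-cambridgesemantics-udxcontrib]
name=Contrib directory for CambridgeSemantics AnzoGraph UDX dependencies
baseurl=file://$dir
gpgkey=file://$dir/repomd.xml.key
gpgcheck=0
enabled=0
EOF
}
```

For example, run verify_sha512 on the computer that downloaded the files before copying the tarball to the servers. On each server, write the file to a temporary path with write_local_repo_file, then copy it to /etc/yum.repos.d/csi-obs-cambridgesemantics-udxcontrib.repo with sudo.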

Optimize the Linux Kernel Configuration for AnzoGraph

Root user privileges are required to complete this task.

To streamline the configuration of the operating system for peak AnzoGraph performance, the installer includes a tuned AnzoGraph profile that you can activate. Tuned is a daemon program that uses the udev device monitor to statically and dynamically tune operating system settings based on the specified profile.

For more information about Tuned, see Tuned in the Red Hat Performance Tuning Guide.

It is strongly recommended that you activate the AnzoGraph tuned profile to ensure that AnzoGraph is optimized to support your Anzo workloads. The profile, called azg, is in the <install_path>/examples/tuned-profile directory and consists of two files: tuned.conf and additional-tuneables.sh. For details about the files, see Tuned AnzoGraph Profile Reference below.

Activating the Tuned Profile

To activate the azg profile, follow the steps below. Complete these steps on all servers in the cluster:

  1. Copy the azg directory from <install_path>/examples/tuned-profile to the /etc/tuned directory. For example, the following command copies azg from the default installation path to /etc/tuned:
    sudo cp -r /opt/cambridgesemantics/examples/tuned-profile/azg /etc/tuned
  2. Next, run the following command to activate the azg profile:
    sudo tuned-adm profile azg

The host servers are now configured to use the tuned profile that is optimal for AnzoGraph.
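To confirm the activation on each server, a sketch like the following checks which profile tuned reports as active. It assumes that tuned-adm active prints a line of the form "Current active profile: <name>", as current tuned releases do. The function names are hypothetical.

```shell
#!/bin/sh
# active_profile_name: reads "tuned-adm active" output on stdin and prints
# the profile name (empty if no profile is active).
# Assumption: the output line has the form "Current active profile: <name>".
active_profile_name() {
    sed -n 's/^Current active profile: *//p'
}

# check_azg_profile: succeeds only if azg is the active tuned profile.
check_azg_profile() {
    name=$(tuned-adm active 2>/dev/null | active_profile_name)
    if [ "$name" = "azg" ]; then
        echo "azg profile is active"
    else
        echo "azg profile is NOT active (active: ${name:-none})" >&2
        return 1
    fi
}
```

On tuned versions that support it, sudo tuned-adm verify additionally reports whether the settings of the active profile are currently applied.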

To disable tuned profiles, you can run the following command:

sudo tuned-adm off

After running the command, no tuned profiles will be active.

Tuned AnzoGraph Profile Reference

This section describes the tuned AnzoGraph profile files and the kernel configuration changes that they apply.

tuned.conf

The tuned.conf file increases the number of kernel network buffers to improve network throughput and tunes the values of the following Linux kernel configuration settings:

  • vm.dirty_ratio: This setting specifies the percentage of system memory that can be occupied by "dirty" data before flushing the cache to disk. Dirty data are pages in memory that have been updated and do not match what is stored on disk. The AnzoGraph tuned profile reduces vm.dirty_ratio to 2% to increase the frequency with which the system cache is flushed.
  • vm.swappiness: This setting controls the tendency of the kernel to move processes out of physical memory and onto the swap disk. A value of 0 means the kernel avoids swapping processes out of physical memory for as long as possible. A value of 100 tells the kernel to aggressively swap processes out of physical memory to the swap disk. The AnzoGraph tuned profile sets vm.swappiness to 30.
  • vm.max_map_count: This setting limits the maximum number of memory map areas that a process can use. Because AnzoGraph is memory intensive, it may reach the default maximum map count of 65530 and be shut down by the operating system. The tuned profile increases vm.max_map_count to 2097152.
  • transparent_hugepages: This setting controls whether Transparent Huge Pages (THP) is enabled or disabled system-wide. When THP is enabled system-wide, it can dramatically degrade AnzoGraph performance, so the AnzoGraph tuned profile disables THP by setting transparent_hugepages to never.

additional-tuneables.sh

The additional-tuneables.sh script is called by tuned.conf and configures the following settings so that they are optimal for AnzoGraph:

  • overcommit_memory: This setting controls whether obvious overcommits of the address space are allowed. The profile sets overcommit_memory to 0 (the default value for the kernel), which ensures that very large overcommits are not allowed but some overcommits can be used to reduce swap usage.
  • overcommit_ratio: This setting specifies the percentage of physical RAM that counts toward the kernel's commit limit when strict overcommit accounting (overcommit_memory set to 2) is in effect. The tuned profile sets overcommit_ratio to 50% (the default value for the kernel). Because the profile sets overcommit_memory to 0, this value has no effect unless overcommit_memory is changed.
  • transparent_hugepage/defrag: This setting controls how aggressively the kernel reclaims and compacts memory to make huge pages available. In addition to disabling THP system-wide, the profile sets transparent_hugepage/defrag to madvise so that the kernel performs direct reclaim and compaction only for memory regions that a process has marked with the madvise() system call (MADV_HUGEPAGE).
  • tcp_timestamps: This setting controls whether TCP timestamps are enabled or disabled. The profile sets tcp_timestamps to 0, which disables TCP timestamps in order to reduce performance spikes related to timestamp generation.
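To check that these settings took effect on a server, a sketch like the following compares the live kernel values with the values described in this reference. The procfs and sysfs paths are the standard Linux locations for these settings. The expected values are taken from the descriptions above, and ROOT is a hypothetical override that lets the functions read a test directory instead of the live system. For the THP files, the kernel lists every mode and brackets the active one (for example, "always madvise [never]"), so the sketch extracts the bracketed value.

```shell
#!/bin/sh
# Hypothetical verifier for the azg profile's kernel settings.
# ROOT defaults to "" (the live system); set it only for testing.
ROOT="${ROOT:-}"

# current_value FILE: prints a setting's current value. For THP-style
# files such as "always madvise [never]", prints only the bracketed mode.
current_value() {
    v=$(cat "$ROOT$1") || return 1
    case "$v" in
        *"["*"]"*) v=${v#*\[}; v=${v%%\]*} ;;
    esac
    printf '%s\n' "$v"
}

# check_setting FILE EXPECTED: prints OK or MISMATCH, returns 1 on mismatch.
check_setting() {
    got=$(current_value "$1")
    if [ "$got" = "$2" ]; then
        echo "OK       $1 = $got"
    else
        echo "MISMATCH $1 = $got (expected $2)"
        return 1
    fi
}

# check_azg_kernel_settings: checks every setting listed in this reference.
check_azg_kernel_settings() {
    rc=0
    check_setting /proc/sys/vm/dirty_ratio 2 || rc=1
    check_setting /proc/sys/vm/swappiness 30 || rc=1
    check_setting /proc/sys/vm/max_map_count 2097152 || rc=1
    check_setting /sys/kernel/mm/transparent_hugepage/enabled never || rc=1
    check_setting /proc/sys/vm/overcommit_memory 0 || rc=1
    check_setting /proc/sys/vm/overcommit_ratio 50 || rc=1
    check_setting /sys/kernel/mm/transparent_hugepage/defrag madvise || rc=1
    check_setting /proc/sys/net/ipv4/tcp_timestamps 0 || rc=1
    return "$rc"
}
```

Run check_azg_kernel_settings on each server after activating the profile. Any MISMATCH line identifies a setting that the profile did not apply.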

Configure and Start the AnzoGraph Services

Root user privileges are required to complete this task.

The last step in the post-installation configuration is to set up the AnzoGraph systemd services and start the database. It is important to configure the AnzoGraph services to run as the Anzo service user so that AnzoGraph can access the data that other platform components write to the shared file system. In addition, the services are configured to tune user resource limits (ulimits) for the AnzoGraph process and to set JAVA_HOME so that AnzoGraph can find the OpenJDK installation.

The service files are included in the <install_path>/examples/systemd-services directory. Follow the instructions below to configure and start the AnzoGraph services.

  1. Configure the AnzoGraph System Management Service
  2. Configure the AnzoGraph Database Service on the Leader Server (and Single-Server Installations)

Configure the AnzoGraph System Management Service

The AnzoGraph system management daemon, azgmgrd, is a very lightweight program that runs on all AnzoGraph servers and manages AnzoGraph communication between the system manager and the database as well as between the nodes in a cluster. Follow the steps below to configure and start the service that runs the azgmgrd process.

  1. Open the azgmgrd.service file in the <install_path>/examples/systemd-services directory. The contents of the file are shown below.

    The following contents are from an installation that used the default installation path, /opt/cambridgesemantics. The contents of your file may differ. Also, note the User=anzograph line. You need to edit this value to replace anzograph with the Anzo service user name.

    [Unit]
    Description=AnzoGraph communication service
    # depends on NetworkManager-wait-online.service enabled
    Wants=network-online.target
    After=network-online.target
    
    [Service]
    Type=forking
    # The PID file is optional, but recommended in the manpage
    # "so that systemd can identify the main process of the daemon"
    #PIDFile=/var/run/azgmgrd.pid
    WorkingDirectory=/opt/cambridgesemantics/anzograph
    StandardOutput=syslog
    StandardError=syslog
    LimitCPU=infinity
    LimitNOFILE=4096
    LimitAS=infinity
    LimitNPROC=infinity
    LimitMEMLOCK=infinity
    LimitLOCKS=infinity
    LimitFSIZE=infinity
    User=anzograph
    UMask=007
    Environment=PATH=$PATH:/opt/cambridgesemantics/anzograph/bin:/opt/cambridgesemantics/anzograph/tools/bin
    Environment=JAVA_HOME=/usr/lib/jvm/jre-11-openjdk
    Environment=UDX_LOGS=/opt/cambridgesemantics/anzograph/internal/logs
    Environment=HYPER_PATH=/opt/cambridgesemantics/anzograph/vendor/com.tableau/hyper/lib/hyper
    ExecStart=/opt/cambridgesemantics/anzograph/bin/azgmgrd /opt/cambridgesemantics/anzograph
    
    CPUAccounting=false
    MemoryAccounting=false
    [Install]
    WantedBy=multi-user.target
    Alias=sbxmgrd.service
  2. In the following line of the file, replace anzograph with the name of the Anzo service user.
    User=anzograph

    For example, if the name of the service user is anzo, the line is changed to the following value:

    User=anzo
  3. Save and close the file.
  4. Copy azgmgrd.service from the <install_path>/examples/systemd-services directory to the /usr/lib/systemd/system directory. For example, the following command copies azgmgrd.service from the default installation path to /usr/lib/systemd/system:
    sudo cp /opt/cambridgesemantics/examples/systemd-services/azgmgrd.service /usr/lib/systemd/system
  5. Run the following commands to start and enable the service:
    sudo systemctl start azgmgrd.service
    sudo systemctl enable azgmgrd.service
  6. Repeat this process on all servers in the cluster.
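Steps 2 through 5 can be scripted as a minimal sketch, assuming GNU sed, root privileges, and the default paths shown above. The function names set_service_user and install_unit are hypothetical. Unlike the manual steps, the sketch edits a temporary copy rather than the file in examples/systemd-services, so the shipped example stays unchanged. It also runs systemctl daemon-reload so that systemd picks up the newly copied unit file.

```shell
#!/bin/sh
# set_service_user UNIT_FILE USER: rewrites the User= line in place.
# Assumes GNU sed (-i) and a user name without "/" characters.
set_service_user() {
    sed -i "s/^User=.*/User=$2/" "$1"
}

# install_unit SRC_DIR UNIT USER
# Copies SRC_DIR/UNIT to a temporary file, sets User=, installs it in
# /usr/lib/systemd/system with mode 0644, then starts and enables it.
# Requires root.
install_unit() {
    src_dir="$1"; unit="$2"; svc_user="$3"
    tmpf=$(mktemp) || return 1
    cp "$src_dir/$unit" "$tmpf" &&
        set_service_user "$tmpf" "$svc_user" &&
        install -m 0644 "$tmpf" "/usr/lib/systemd/system/$unit" &&
        systemctl daemon-reload &&
        systemctl start "$unit" &&
        systemctl enable "$unit"
    rc=$?
    rm -f "$tmpf"
    return "$rc"
}
```

For example, with the hypothetical service user anzo, run install_unit /opt/cambridgesemantics/examples/systemd-services azgmgrd.service anzo as root on every server in the cluster.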

The azgmgrd daemon must be running before you can start the database, but it typically does not need to be restarted unless you upgrade AnzoGraph or reboot the host servers. It does not need to be stopped and started each time the database is restarted.

Configure the AnzoGraph Database Service on the Leader Server (and Single-Server Installations)

The AnzoGraph service runs the database process and is configured to start after the system management daemon (azgmgrd). You start the database only on the leader server. The leader connects to the system managers on the compute servers and starts the database across the cluster.

  1. Open the anzograph.service file in the <install_path>/examples/systemd-services directory. The contents of the file are shown below.

    The following contents are from an installation that used the default installation path, /opt/cambridgesemantics. The contents of your file may differ. Also, note the User=anzograph line. You need to edit this value to replace anzograph with the Anzo service user name.

    [Unit]
    Description=AnzoGraph database service
    After=azgmgrd.service
    Wants=azgmgrd.service
    
    [Service]
    Type=oneshot
    # The PID file is optional, but recommended in the manpage
    # "so that systemd can identify the main process of the daemon"
    #PIDFile=/var/run/azg.pid
    WorkingDirectory=/opt/cambridgesemantics/anzograph
    StandardOutput=syslog
    StandardError=syslog
    User=anzograph
    UMask=027
    RemainAfterExit=yes
    Environment=PATH=$PATH:/opt/cambridgesemantics/anzograph/bin:/opt/cambridgesemantics/anzograph/tools/bin
    ExecStart=/opt/cambridgesemantics/anzograph/bin/azgctl -start
    ExecStop=/opt/cambridgesemantics/anzograph/bin/azgctl -stop
    
    [Install]
    WantedBy=multi-user.target
    Alias=gqe.service
  2. In the following line of the file, replace anzograph with the name of the Anzo service user.
    User=anzograph

    For example, if the name of the service user is anzo, the line is changed to the following value:

    User=anzo
  3. Save and close the file.
  4. Copy anzograph.service from the <install_path>/examples/systemd-services directory to the /usr/lib/systemd/system directory. For example, the following command copies anzograph.service from the default installation path to /usr/lib/systemd/system:
    sudo cp /opt/cambridgesemantics/examples/systemd-services/anzograph.service /usr/lib/systemd/system
  5. Run the following commands to start and enable the new service:
    sudo systemctl start anzograph.service
    sudo systemctl enable anzograph.service
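Because both services should run as the Anzo service user, a quick consistency check on the leader can catch a unit file whose User= line was missed. The following is a sketch that assumes the unit files were copied to /usr/lib/systemd/system as described above. The function names unit_user and check_units are hypothetical.

```shell
#!/bin/sh
# unit_user UNIT_FILE: prints the value of the last User= line.
unit_user() {
    sed -n 's/^User=//p' "$1" | tail -n 1
}

# check_units [DIR]: confirms that azgmgrd.service and anzograph.service
# in DIR (default /usr/lib/systemd/system) run as the same user.
check_units() {
    dir="${1:-/usr/lib/systemd/system}"
    u1=$(unit_user "$dir/azgmgrd.service")
    u2=$(unit_user "$dir/anzograph.service")
    if [ -z "$u1" ] || [ "$u1" != "$u2" ]; then
        echo "User= differs or is missing: azgmgrd=$u1 anzograph=$u2" >&2
        return 1
    fi
    # Warn (without failing) if the example value was left unchanged.
    if [ "$u1" = "anzograph" ]; then
        echo "warning: User= is still the example value 'anzograph'" >&2
    fi
    echo "both units run as $u1"
}
```

To confirm that both services are running on the leader, systemctl is-active azgmgrd anzograph prints one state per unit.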

Once the services are in place and enabled, AnzoGraph should be running. To stop or start the database later, run the following systemctl commands on the leader node:

sudo systemctl stop anzograph
sudo systemctl start anzograph

You do not need to stop and start azgmgrd.

For instructions on configuring the connection to AnzoGraph in the Anzo application, see Connecting to AnzoGraph.

For recommendations on securing AnzoGraph environments, see Securing an AnzoGraph Environment.
