Complete the Post-Installation Configuration

Once AnzoGraph is installed, there are additional, critical tasks to complete to ensure that AnzoGraph is configured to support all of the Anzo functionality. In addition, it is important to set up AnzoGraph services to run as the Anzo service user so that AnzoGraph can access the data that other platform components write to the shared file system. Follow the steps below to complete the post-installation configuration.

  1. Deploy the Graph Data Interface Java Plugin
  2. Deploy Optional Drivers for Accessing Database Sources
  3. Configure and Start the AnzoGraph Services

Deploy the Graph Data Interface Java Plugin

The Graph Data Interface (GDI) Java Plugin is a .jar file that is provided by Cambridge Semantics Customer Success. A separate, optional Logging plugin is also provided to enable reporting for GDI usage. Follow the instructions below to deploy the plugins and configure logging.

Java Development Kit version 11 is required for using the GDI. If OpenJDK 11 is not installed, see Install OpenJDK 11 for instructions.

  1. Download the following .jar files provided by Cambridge Semantics. Place the downloaded files on the AnzoGraph leader server:
    • gdi-<version>.jar
    • logging-<version>.jar
  2. Copy the two files to the <install_path>/lib/udx directory on the leader server.
  3. Next, run the following command to change the owner of the files to the anzograph user:
    chown anzograph:anzograph -R <install_path>/lib/udx

    For example:

    chown anzograph:anzograph -R /opt/anzograph/lib/udx
  4. If you want to enable logging for the GDI, create a file called log.config in the <install_path>/lib/udx directory. Then add the following contents to log.config:
    @level=WARN
    @file=/location_to_create_log_file/udx.log
    @file.color=false
    @udx=true
    @stderr=false
    @stderr.color=false
    @stdout=false
    @stdout.color=false
    com.cambridgesemantics.anzo.*=INFO
    com.cambridgesemantics.anzo.datatoolkit.*=TRACE
    com.cambridgesemantics.anzograph.*=INFO
    com.cambridgesemantics.anzograph.datatoolkit.*=TRACE
    org.openanzo.*=ERROR

    For example:

    @level=WARN
    @file=/opt/anzograph/internal/udx.log
    @file.color=false
    @udx=true
    @stderr=false
    @stderr.color=false
    @stdout=false
    @stdout.color=false
    com.cambridgesemantics.anzo.*=INFO
    com.cambridgesemantics.anzo.datatoolkit.*=TRACE
    com.cambridgesemantics.anzograph.*=INFO
    com.cambridgesemantics.anzograph.datatoolkit.*=TRACE
    org.openanzo.*=ERROR
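
Step 4 can also be scripted. The following is a minimal sketch and is not part of the AnzoGraph distribution: write_gdi_log_config is a hypothetical helper that writes the log.config contents shown above, taking the lib/udx directory and the log file location as arguments.

```shell
#!/bin/sh
# Hypothetical helper (not an AnzoGraph tool): writes the log.config
# contents from step 4 into the given lib/udx directory.
#   $1 = path to <install_path>/lib/udx
#   $2 = location of the log file to create
write_gdi_log_config() {
    udx_dir=$1
    log_file=$2
    # The heredoc delimiter is unquoted so that $log_file is expanded.
    cat > "$udx_dir/log.config" <<EOF
@level=WARN
@file=$log_file
@file.color=false
@udx=true
@stderr=false
@stderr.color=false
@stdout=false
@stdout.color=false
com.cambridgesemantics.anzo.*=INFO
com.cambridgesemantics.anzo.datatoolkit.*=TRACE
com.cambridgesemantics.anzograph.*=INFO
com.cambridgesemantics.anzograph.datatoolkit.*=TRACE
org.openanzo.*=ERROR
EOF
}

# Example call (paths taken from the example above):
# write_gdi_log_config /opt/anzograph/lib/udx /opt/anzograph/internal/udx.log
```

If the file is created after the chown command in step 3 was run, the ownership change may need to be repeated so that log.config has the same owner as the plugins.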

Once the AnzoGraph services are configured and the database is started (as described in Configure and Start the AnzoGraph Services below), the new plugins are enabled. On a cluster, the leader broadcasts the .jar files to the compute servers.

The GDI natively supports reading or ingesting data from CSV and TSV, JSON, XML, Parquet, and SAS (SAS Transport XPT and SAS7BDAT) files as well as HTTP/REST endpoints. You can extend the service to access relational databases by adding JDBC drivers to the <install_path>/lib/udx directory. See Deploy Optional Drivers for Accessing Database Sources below for more information.

The <install_path>/lib/udx directory on the leader node is a user-managed directory rather than an AnzoGraph-managed directory like <install_path>/bin or <install_path>/internal. Users can place JDBC drivers and Java or C++ extensions in the lib/udx directory at any time. Each time the database is started, AnzoGraph clears the <install_path>/internal/extensions directory, scans lib/udx, saves a copy of its contents to internal/extensions, and then broadcasts the internal/extensions contents from the leader node to the compute nodes. Each restart therefore reloads internal/extensions with the latest plugins.
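
One way to confirm that a restart picked up everything in lib/udx is to compare the two directories on the leader. The sketch below assumes that the copies in internal/extensions keep the same file names as the originals, which the description above suggests but does not state outright. missing_extensions is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: print the name of each .jar that exists in lib/udx
# but has no same-named copy in internal/extensions.
#   $1 = path to <install_path>/lib/udx
#   $2 = path to <install_path>/internal/extensions
missing_extensions() {
    udx_dir=$1
    ext_dir=$2
    for f in "$udx_dir"/*.jar; do
        [ -e "$f" ] || continue          # glob matched nothing
        name=$(basename "$f")
        [ -e "$ext_dir/$name" ] || echo "$name"
    done
}

# Example call:
# missing_extensions /opt/anzograph/lib/udx /opt/anzograph/internal/extensions
```

An empty result after the database starts would suggest that every .jar file in lib/udx was copied.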

Deploy Optional Drivers for Accessing Database Sources

To extend the Graph Data Interface (GDI) service to access relational databases, JDBC drivers can also be deployed to AnzoGraph. If AnzoGraph will access relational Data Sources for Data Source Profiling or GDI queries, copy the same drivers that you use for Anzo to the <install_path>/lib/udx directory on the AnzoGraph leader server. The leader also broadcasts any driver .jar files to the compute servers when the database is started.
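
Copying the drivers can be scripted along these lines. This is a sketch only: copy_jdbc_drivers is a hypothetical helper, and the source directory is whatever location holds the driver .jar files you already use for Anzo, which this document does not specify.

```shell
#!/bin/sh
# Hypothetical helper: copy every .jar in a source directory into lib/udx
# and print the name of each file copied.
#   $1 = directory holding the JDBC driver .jar files (site-specific)
#   $2 = path to <install_path>/lib/udx on the leader server
copy_jdbc_drivers() {
    src_dir=$1
    udx_dir=$2
    for jar in "$src_dir"/*.jar; do
        [ -e "$jar" ] || continue        # glob matched nothing
        cp "$jar" "$udx_dir"/ && basename "$jar"
    done
}

# Example call (the first argument is a placeholder for your own location):
# copy_jdbc_drivers /path/to/anzo/drivers /opt/anzograph/lib/udx
```

Only files ending in .jar are copied; anything else in the source directory is left alone.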

Configure and Start the AnzoGraph Services

Once the Graph Data Interface client and any other optional drivers are deployed, the last step is to configure the AnzoGraph services and start the database. The initial startup of AnzoGraph involves three processes. Subsequent starts involve one or more of them, depending on the state of AnzoGraph and the servers:

  1. The first process involves the configuration of the Linux kernel and it applies to all servers in the cluster. The default kernel configuration for the following settings is not optimal for AnzoGraph:
    • transparent_hugepage: Transparent Huge Pages (THP) are enabled by default and can degrade AnzoGraph performance. THP should be disabled for AnzoGraph.
    • max_map_count: By default, the maximum number of memory map areas that a process can use is 65530. Since AnzoGraph is memory intensive, it may reach the maximum map count and be shut down by the operating system. AnzoGraph requires a max_map_count value of 2097152.

    At startup, AnzoGraph checks these settings and returns a warning if the values are not suitable. You must either make the kernel changes or configure AnzoGraph to start with a non-optimal configuration. The AnzoGraph deployment includes a script (<install_path>/bin/azg_system_config) that makes the required kernel configuration changes. Running the script requires superuser privileges, and it must be run again each time the host server is rebooted because the kernel configuration reverts to the defaults.

  2. The second process involves the AnzoGraph system management daemon, azgmgrd. This very lightweight program runs on all servers in the cluster and manages AnzoGraph communication between the nodes. It must be running to start the database, but it typically does not need to be restarted unless you are upgrading AnzoGraph or the host servers are rebooted. It does not need to be stopped and started each time the database is restarted.
  3. The third process involves starting the database with the system manager. Starting the database is done only on the leader server. The leader connects to the system managers on the compute servers and starts the database across the cluster.
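
The two kernel settings named in the first process can be inspected by hand before AnzoGraph is started. The sketch below reads the standard Linux locations for these settings. The helper names are hypothetical, and treating any value at or above 2097152 as acceptable is an assumption, since the text above gives only the required value.

```shell
#!/bin/sh
REQUIRED_MAX_MAP_COUNT=2097152   # value required by AnzoGraph (see above)

# Succeeds if THP is disabled, that is, the active choice in the kernel's
# "enabled" file is [never]. The file looks like: always madvise [never]
thp_disabled() {
    thp_file=${1:-/sys/kernel/mm/transparent_hugepage/enabled}
    grep -q '\[never\]' "$thp_file"
}

# Succeeds if vm.max_map_count is at least the required value (assumption:
# larger values are also acceptable).
max_map_count_ok() {
    map_file=${1:-/proc/sys/vm/max_map_count}
    [ "$(cat "$map_file")" -ge "$REQUIRED_MAX_MAP_COUNT" ]
}

# Example:
# thp_disabled     || echo "THP is still enabled"
# max_map_count_ok || echo "max_map_count is below $REQUIRED_MAX_MAP_COUNT"
```

Both helpers accept an optional file argument, which makes it possible to try them against a saved copy of the values instead of the live system.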

To ensure that the right account/permissions are used to perform the three steps above (i.e., the root user makes the kernel changes and the Anzo service account starts the system management daemon and the database) whenever the host server is rebooted, Cambridge Semantics recommends that you configure services to run the AnzoGraph startup steps. This section provides instructions for configuring the three services.

Root user privileges are required to complete the tasks below.
  1. Configure the Linux Kernel Configuration Service
  2. Configure the AnzoGraph System Management Service
  3. Configure the AnzoGraph Database Service

On clusters, configure the first two services, the Linux kernel configuration service and the AnzoGraph system management service, on all servers in the cluster. Configure the AnzoGraph database service only on the leader node. For single-server deployments, configure all three services on the server.

Configure the Linux Kernel Configuration Service

On each server in the cluster, follow the instructions below to set up a service to apply the Linux kernel configuration changes any time the AnzoGraph host server is restarted.

If making the kernel changes is not possible, you can set the os_allow_alternate_vm_config value to true in the AnzoGraph settings file. This setting enables AnzoGraph to start with non-optimal Linux configurations. See Changing AnzoGraph Configuration Settings for instructions.

  1. Run the following command to copy the AnzoGraph system configuration script, azg_system_config, to the root directory:
    # cp <install_path>/bin/azg_system_config /root/

    For example:

    # cp /opt/anzograph/bin/azg_system_config /root/
  2. Run the following command to remove "sudo" from the azg_system_config script:
    # sed -i 's/sudo//g' /root/azg_system_config
  3. Create a file called azg_system_config.service in the /usr/lib/systemd/system directory. For example:
    # vi /usr/lib/systemd/system/azg_system_config.service
  4. Add the following contents to azg_system_config.service:
    [Unit]
    Description=Configure Linux for AnzoGraph
    
    [Service]
    Type=oneshot
    ExecStart=/root/azg_system_config
    
    [Install]
    WantedBy=multi-user.target
    
  5. Save and close the file.
  6. Run the following commands to start and enable the new service:
    # systemctl start azg_system_config.service
    # systemctl enable azg_system_config.service
  7. Repeat this process on all of the compute servers and the leader server.
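
Because step 2 edits the script in place, it can be useful to preview the result first. The sketch below prints what the same sed expression would produce without modifying the file. preview_sudo_removal is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: show the effect of step 2 without changing the file
# (same sed expression, but without the -i flag).
#   $1 = path to a copy of azg_system_config
preview_sudo_removal() {
    sed 's/sudo//g' "$1"
}

# Example: compare the preview with the original script.
# preview_sudo_removal /root/azg_system_config | diff /root/azg_system_config -
```

The expression removes only the word itself, so a line that begins with sudo followed by a command keeps a leading space, which the shell ignores.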

Configure the AnzoGraph System Management Service

On each server in the cluster, follow the instructions below to set up a service that starts the AnzoGraph system management daemon (azgmgrd) as the Anzo service user if the host server is restarted.

  1. Create a file called azgmgrd.service in the /usr/lib/systemd/system directory. For example:
    # vi /usr/lib/systemd/system/azgmgrd.service
  2. Add the following contents to azgmgrd.service, replacing the placeholder values install_path and Anzo_service_user:
    [Unit]
    Description=AnzoGraph communication service
    # depends on NetworkManager-wait-online.service enabled
    Wants=network-online.target
    After=network-online.target
    
    [Service]
    Type=forking
    RemainAfterExit=yes
    Restart=on-failure
    RestartSec=60s
    # The PID file is optional but recommended so that systemd
    # can identify the main process of the daemon
    # PIDFile=/var/run/azgmgrd.pid
    WorkingDirectory=install_path
    StandardOutput=syslog
    StandardError=syslog
    LimitCPU=infinity
    LimitNOFILE=4096
    LimitAS=infinity
    LimitNPROC=infinity
    LimitMEMLOCK=infinity
    LimitLOCKS=infinity
    LimitFSIZE=infinity
    User=Anzo_service_user
    UMask=0022
    Environment=PATH=/sbin:/bin:/usr/sbin:/usr/bin:/install_path/bin:/install_path/tools/bin
    Environment=JAVA_HOME=/usr/lib/jvm/jre-11
    ExecStart=/install_path/bin/azgmgrd /install_path/
    CPUAccounting=false
    MemoryAccounting=false
    
    [Install]
    WantedBy=multi-user.target

    Where install_path is the AnzoGraph installation path and directory and Anzo_service_user is the name of the Anzo service user. For example:

    [Unit]
    Description=AnzoGraph communication service
    # depends on NetworkManager-wait-online.service enabled
    Wants=network-online.target
    After=network-online.target
    
    [Service]
    Type=forking
    RemainAfterExit=yes
    Restart=on-failure
    RestartSec=60s
    # The PID file is optional but recommended so that systemd
    # can identify the main process of the daemon
    # PIDFile=/var/run/azgmgrd.pid
    WorkingDirectory=/opt/anzograph
    StandardOutput=syslog
    StandardError=syslog
    LimitCPU=infinity
    LimitNOFILE=4096
    LimitAS=infinity
    LimitNPROC=infinity
    LimitMEMLOCK=infinity
    LimitLOCKS=infinity
    LimitFSIZE=infinity
    User=anzo
    UMask=0022
    Environment=PATH=/sbin:/bin:/usr/sbin:/usr/bin:/opt/anzograph/bin:/opt/anzograph/tools/bin
    Environment=JAVA_HOME=/usr/lib/jvm/jre-11
    ExecStart=/opt/anzograph/bin/azgmgrd /opt/anzograph/
    CPUAccounting=false
    MemoryAccounting=false
    
    [Install]
    WantedBy=multi-user.target
  3. Save and close the file.
  4. Run the following commands to start and enable the new service:
    # systemctl start azgmgrd.service
    # systemctl enable azgmgrd.service
  5. Repeat this process on all of the compute servers and the leader server.
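
The placeholders in the unit file can also be filled in with sed rather than by hand. This is a sketch under assumptions: fill_unit_placeholders is a hypothetical helper, the template is a file containing the placeholder version of the unit shown in step 2, and the installation path is assumed to be absolute and free of the characters | and &.

```shell
#!/bin/sh
# Hypothetical helper: print a unit file template with its placeholders
# replaced.
#   $1 = template file containing install_path and Anzo_service_user
#   $2 = AnzoGraph installation path, e.g. /opt/anzograph
#   $3 = name of the Anzo service user, e.g. anzo
fill_unit_placeholders() {
    template=$1
    install_path=$2
    service_user=$3
    # The template writes the placeholder both as "install_path" and as
    # "/install_path". Replace the slash-prefixed form first so that an
    # absolute path does not end up with a doubled leading slash.
    sed -e "s|/install_path|$install_path|g" \
        -e "s|install_path|$install_path|g" \
        -e "s|Anzo_service_user|$service_user|g" \
        "$template"
}

# Example (azgmgrd.service.template is a hypothetical file name):
# fill_unit_placeholders azgmgrd.service.template /opt/anzograph anzo \
#     > /usr/lib/systemd/system/azgmgrd.service
```

Generating the file from one template may help keep the unit identical across the leader and compute servers.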

Configure the AnzoGraph Database Service

On the leader server only, follow the instructions below to set up a service that will start AnzoGraph as the Anzo service user. This service is configured to run after the system management daemon is started.

  1. Create a file called anzograph.service in the /usr/lib/systemd/system directory. For example:
    # vi /usr/lib/systemd/system/anzograph.service
  2. Add the following contents to anzograph.service, replacing the placeholder values install_path (the AnzoGraph installation path) and Anzo_service_user (the name of the Anzo service user):
    [Unit]
    Description=AnzoGraph database service
    After=azgmgrd.service
    Wants=azgmgrd.service
    
    [Service]
    Type=oneshot
    RemainAfterExit=yes
    RestartSec=60s
    # The PID file is optional but recommended so that systemd
    # can identify the main process of the daemon
    # PIDFile=/var/run/anzograph.pid
    WorkingDirectory=install_path
    StandardOutput=syslog
    StandardError=syslog
    User=Anzo_service_user
    UMask=0022
    Environment=PATH=/sbin:/bin:/usr/sbin:/usr/bin:/install_path/bin:/install_path/tools/bin
    ExecStart=/install_path/bin/azgctl -start
    ExecStop=/install_path/bin/azgctl -stop
    
    [Install]
    WantedBy=multi-user.target

    For example:

    [Unit]
    Description=AnzoGraph database service
    After=azgmgrd.service
    Wants=azgmgrd.service
    
    [Service]
    Type=oneshot
    RemainAfterExit=yes
    RestartSec=60s
    # The PID file is optional but recommended so that systemd
    # can identify the main process of the daemon
    # PIDFile=/var/run/anzograph.pid
    WorkingDirectory=/opt/anzograph
    StandardOutput=syslog
    StandardError=syslog
    User=anzo
    UMask=0022
    Environment=PATH=/sbin:/bin:/usr/sbin:/usr/bin:/opt/anzograph/bin:/opt/anzograph/tools/bin
    ExecStart=/opt/anzograph/bin/azgctl -start
    ExecStop=/opt/anzograph/bin/azgctl -stop
    
    [Install]
    WantedBy=multi-user.target
  3. Save and close the file.
  4. Run the following commands to start and enable the new service:
    # systemctl start anzograph.service
    # systemctl enable anzograph.service
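
Starting the database after the system management daemon depends on the After= and Wants= lines in the [Unit] section of anzograph.service. A quick check that a unit file declares both might look like the following. unit_runs_after is a hypothetical helper name, and the check matches only the single-unit form of these lines used in this document.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the unit file both orders itself after
# the given unit (After=) and pulls it in at startup (Wants=).
#   $1 = path to the unit file to inspect
#   $2 = name of the unit it should depend on
unit_runs_after() {
    unit_file=$1
    dependency=$2
    # -x matches whole lines only, -F treats the pattern as a fixed string.
    grep -qxF "After=$dependency" "$unit_file" &&
        grep -qxF "Wants=$dependency" "$unit_file"
}

# Example:
# unit_runs_after /usr/lib/systemd/system/anzograph.service azgmgrd.service \
#     || echo "anzograph.service does not depend on azgmgrd.service"
```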

Once the services are in place and enabled, AnzoGraph should be running. Whenever you need to stop or start the database, run the following systemctl commands on the leader node: sudo systemctl stop anzograph and sudo systemctl start anzograph. You do not need to stop and start azgmgrd.

For instructions on configuring the connection to AnzoGraph in the Anzo application, see Connecting to AnzoGraph.
