Installing Anzo Unstructured

This topic provides instructions for deploying an Anzo Distributed Unstructured cluster.

See Anzo Unstructured Requirements for details about server requirements.
  1. Complete the Pre-Installation Configuration
  2. Deploy the Leader Node
  3. Deploy the Worker Nodes
  4. Complete the Post-Installation Configuration

Complete the Pre-Installation Configuration

Configure User Resource Limits

Before installing Anzo Unstructured, Cambridge Semantics recommends that you tune the user resource limits (ulimits) on your Linux distribution to raise the limits for the following resources. Tune the ulimits on every AU host server in the cluster:

  • Increase the limit for the following resources to at least 65535:
    • open files (nofile)
    • max user processes (nproc)
  • Increase the limit for the following resources to infinity:
    • address space (as)
    • CPU time (cpu)
    • file locks (locks)
    • file size (fsize)
    • max locked memory (memlock)

To view the current ulimits, run ulimit -a. To change ulimits permanently, modify the /etc/security/limits.conf file. For more information, see How to set ulimit values in the RHEL support documentation.
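
As an illustration, the following shell sketch prints limits.conf entries that match the recommendations above for a single user. It assumes the Anzo service user is named anzo unless you pass a different name as the first argument, and it uses the - type, which sets the soft and hard limits together (limits.conf spells "infinity" as unlimited). The function name anzo_limits_entries is hypothetical. Review the output before appending it, for example with anzo_limits_entries anzo | sudo tee -a /etc/security/limits.conf. Limits set in limits.conf apply to new login sessions.

```shell
#!/usr/bin/env bash
# Sketch: print /etc/security/limits.conf entries for the recommended ulimits.
# Assumption: the Anzo service user is passed as $1 (defaults to "anzo").
# The "-" type sets the soft and hard limit together.
anzo_limits_entries() {
  local user="${1:-anzo}"
  local entry item value
  for entry in nofile:65535 nproc:65535 as:unlimited cpu:unlimited \
               locks:unlimited fsize:unlimited memlock:unlimited; do
    item="${entry%%:*}"   # text before the colon, e.g. nofile
    value="${entry#*:}"   # text after the colon, e.g. 65535
    printf '%s\t-\t%s\t%s\n' "$user" "$item" "$value"
  done
}
```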

Typically, as part of post-installation configuration, you set up systemd services to start and stop the Leader and Worker processes. When systemd starts a process, however, it applies the limits that are defined in the systemd service rather than the limits in /etc/security/limits.conf. In addition to changing the ulimits in limits.conf, therefore, set the same limits in the Leader and Worker services. The service file contents shown in Complete the Post-Installation Configuration include the recommended ulimit settings.
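
To confirm which limits a running Leader or Worker process actually received, you can read /proc/<pid>/limits on Linux. The hypothetical helper below prints the soft and hard values for one limit, using the limit name exactly as it appears in that file (for example, Max open files or Max processes). Finding the process ID, for example with ps or pgrep, is left to you. When the service file settings are in effect, proc_limit <pid> 'Max open files' should print 65536 for both values.

```shell
#!/usr/bin/env bash
# Sketch: show the soft and hard value of one limit for a running process,
# read from /proc/<pid>/limits (Linux only).
# $1 = process ID
# $2 = limit name exactly as it appears in that file, e.g. "Max open files"
proc_limit() {
  local pid="$1" name="$2"
  awk -v name="$name" '
    index($0, name) == 1 {
      # drop the limit name; the next two fields are the soft and hard limits
      rest = substr($0, length(name) + 1)
      split(rest, f, " ")
      print f[1], f[2]
    }' "/proc/${pid}/limits"
}
```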

Use the Anzo Service User Account when Installing AU

Since the Anzo Unstructured cluster will access the shared file store, it is important to install and run the software with the same service account that runs Anzo. For more information, see Anzo Service Account Requirements.
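
Before starting the installer, a quick check such as the following sketch can confirm that the shell is running as the Anzo service user and that this user can create files on the shared file store. The function name and the directory you pass in are hypothetical, and the write test assumes that AU needs write access to that location; drop that part if your deployment only reads from the share.

```shell
#!/usr/bin/env bash
# Sketch: confirm the current shell runs as the Anzo service user and that
# the user can create a file on the shared file store.
# $1 = expected service user (for example, anzo)
# $2 = a directory on the shared file store (hypothetical; use your own path)
check_service_user() {
  local expected="$1" shared_dir="$2" probe
  if [ "$(id -un)" != "$expected" ]; then
    echo "running as $(id -un), expected $expected" >&2
    return 1
  fi
  # Create and then remove a probe file to prove write access.
  probe="$(mktemp "${shared_dir%/}/.au-write-test.XXXXXX")" || {
    echo "cannot create files in $shared_dir" >&2
    return 1
  }
  rm -f "$probe"
}
```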

Deploy the Leader Node

Follow the instructions below to deploy the Anzo Distributed Unstructured (DU) Leader node.

  1. Make sure that the Leader host server has access to the Anzo shared file system and meets the requirements in Anzo Unstructured Cluster Requirements and Recommendations.
  2. Copy the Anzo DU installation script to the Leader server and then run the following command to make the script executable:
    chmod +x <script_name>
  3. If necessary, run the following command to become the Anzo service user:
    su <name>

    Where <name> is the name of the service user. For example:

    su anzo
  4. Run the following command to start the installation wizard:
    ./<script_name>

    The script unpacks the JRE and then waits for input before starting the installation.

  5. Press Enter to start the installation.
  6. Review the software license agreement. Press Enter to scroll through the terms. At the end of the agreement, type 1 to accept the terms or type 2 to disagree and stop the installation.
  7. At the prompt that asks which components to install, type 1 (Leader) and then press Enter.
  8. Specify the directory in which to install Anzo DU. Press Enter to accept the default installation path or type an alternate path and then press Enter.
  9. The wizard prompts for the IP address of this Leader instance and defaults to the IP address of the server. Press Enter to accept the default value or type a different IP address and then press Enter.
  10. The wizard prompts for any additional Leader node IP addresses. Typically there is one Leader node, and this value is the same IP address that you specified in the previous step. If you set up additional Leader nodes for redundancy, enter a comma-separated list of the alternate nodes. Otherwise, press Enter to accept the default value.
  11. Specify the maximum amount of memory (in MB) that this leader instance can use. The install wizard lists the total RAM available and chooses 1/2 of the total memory as the default value. Adjust the value as needed or accept the default value and then press Enter.
  12. The wizard proceeds to install Anzo DU according to the values that you specified. Proceed to Deploy the Worker Nodes to install the Worker instances.
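
The memory prompt defaults to half of the server's total RAM. The following sketch approximates that arithmetic from the MemTotal line of /proc/meminfo, which Linux reports in kB (1 MB = 1024 kB here). The wizard's own rounding and its notion of total RAM may differ slightly, and the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: approximate the wizard's default memory value, half of total RAM in MB.
# MemTotal in /proc/meminfo is in kB, so divide by 1024 for MB, then by 2.
# printf "%d" truncates any fractional part.
default_du_memory_mb() {
  awk '/^MemTotal:/ { printf "%d\n", $2 / 1024 / 2 }' /proc/meminfo
}
```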

Deploy the Worker Nodes

Follow the instructions below to deploy the Anzo Distributed Unstructured (DU) Worker nodes.

  1. Make sure that the Worker host servers have access to the Anzo shared file system and meet the requirements in Anzo Unstructured Cluster Requirements and Recommendations.
  2. Copy the Anzo DU installation script to each of the Worker servers and then run the following command to make the script executable:
    chmod +x <script_name>
  3. If necessary, run the following command to become the Anzo service user:
    su <name>

    Where <name> is the name of the service user. For example:

    su anzo
  4. Run the following command to start the installation wizard:
    ./<script_name>

    The script unpacks the JRE and then waits for input before starting the installation.

  5. Press Enter to start the installation.
  6. Review the software license agreement. Press Enter to scroll through the terms. At the end of the agreement, type 1 to accept the terms or type 2 to disagree and stop the installation.
  7. At the prompt that asks which components to install, type 2 (Worker) and then press Enter.
  8. Specify the directory in which to install Anzo DU. Press Enter to accept the default installation path or type an alternate path and then press Enter.
  9. The wizard prompts for the IP address to use for this Worker node and defaults to the IP address of the server. Press Enter to accept the default value or type a different IP address and then press Enter.
  10. The wizard prompts you to specify the maximum number of service instances for this Worker node. Each service instance processes one unstructured document at a time. The default value is 2 instances. Press Enter to accept the default or specify another value and then press Enter.
  11. Specify the port to use for this Worker. The wizard defaults to port 2552. Press Enter to accept the default value or type a different port and then press Enter.
  12. The wizard prompts you to enter the IP address of the Leader node. Specify the IP address of the Leader instance that you deployed in Deploy the Leader Node. If you deployed multiple Leader nodes, specify the IP addresses of all Leaders in a comma-separated list. Then press Enter.
  13. Specify the maximum amount of memory (in MB) that this Worker instance can use. The install wizard lists the total RAM available and chooses 1/2 of the total memory as the default value. Adjust the value as needed or accept the default value and then press Enter.

    The wizard proceeds to install Anzo DU according to the values that you specified.

  14. Repeat the steps above for each Worker instance in the cluster.
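
Both the Leader and Worker wizards accept several Leader IP addresses as a comma-separated list. The following sketch checks such a list for typos before you type it at the prompt. It is deliberately strict (IPv4 only, no spaces, no empty entries) and reflects an assumption about what the wizard accepts, not a description of its parser; valid_leader_list is a hypothetical name.

```shell
#!/usr/bin/env bash
# Sketch: validate a comma-separated list of IPv4 addresses such as
# "10.0.0.5" or "10.0.0.5,10.0.0.6". Returns 0 if valid, 1 otherwise.
valid_leader_list() {
  local list="$1" ip octet
  local IFS=','                     # split the list on commas only
  [ -n "$list" ] || return 1
  case "$list" in *,|,*|*,,*) return 1 ;; esac   # no empty entries
  for ip in $list; do
    [[ "$ip" =~ ^([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})$ ]] || return 1
    for octet in "${BASH_REMATCH[@]:1}"; do
      # 10# forces base 10 so values like "08" are not read as octal
      (( 10#$octet <= 255 )) || return 1
    done
  done
}
```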

Once the Leader and all of the Worker nodes are installed, proceed to Complete the Post-Installation Configuration to complete the initial configuration and start the software.

If you upgraded the Anzo Unstructured software, make sure that you restart the Leader and Worker applications. In addition, restart the following two services in Anzo:

  • Anzo Server Akka Cluster Integration
  • Anzo Unstructured Distributed

Complete the Post-Installation Configuration

Once the Anzo Unstructured (AU) cluster is installed, Cambridge Semantics recommends that you set up Leader and Worker services to ensure that AU runs as the Anzo service user and can access the data that other platform components write to the shared file system. Follow the instructions in the steps below to configure the services.

Root user privileges are required to complete these tasks.
  1. Configure and Start the Leader Service
  2. Configure and Start the Worker Service

Configure and Start the Leader Service

Follow the instructions below to create and start the Leader service.

  1. On the Leader server, create a file called anzo-du-leader.service in the /usr/lib/systemd/system directory. For example:
    # vi /usr/lib/systemd/system/anzo-du-leader.service
  2. Add the following contents to anzo-du-leader.service, replacing the placeholder values install_path and service_user_name:
    [Unit]
    Description=Service for Distributed Unstructured Leader
    After=syslog.target network.target local-fs.target remote-fs.target nss-lookup.target
    
    [Service]
    Type=forking
    RemainAfterExit=yes
    LimitCPU=infinity
    LimitNOFILE=65536
    LimitAS=infinity
    LimitNPROC=65536
    LimitMEMLOCK=infinity
    LimitLOCKS=infinity
    LimitFSIZE=infinity
    ExecStart=/install_path/leader start
    ExecStop=/install_path/leader stop
    User=service_user_name
    Group=service_user_name
    
    [Install]
    WantedBy=default.target

    Where install_path is the Anzo DU installation directory and service_user_name is the name of the Anzo service user. For example:

    [Unit]
    Description=Service for Distributed Unstructured Leader
    After=syslog.target network.target local-fs.target remote-fs.target nss-lookup.target
    
    [Service]
    Type=forking
    RemainAfterExit=yes
    LimitCPU=infinity
    LimitNOFILE=65536
    LimitAS=infinity
    LimitNPROC=65536
    LimitMEMLOCK=infinity
    LimitLOCKS=infinity
    LimitFSIZE=infinity
    ExecStart=/opt/AnzoDU/leader start
    ExecStop=/opt/AnzoDU/leader stop
    User=anzo
    Group=anzo
    
    [Install]
    WantedBy=default.target
  3. Save and close the file, and then run the following commands to start and enable the new service:
    # systemctl start anzo-du-leader.service
    # systemctl enable anzo-du-leader.service

Once the service is enabled, the Leader should be running. To stop or start the Leader later, use the following systemctl commands: sudo systemctl stop anzo-du-leader and sudo systemctl start anzo-du-leader.
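
Because the Leader and Worker service files differ only in the role name, installation directory, and user, you could generate them rather than type them. The sketch below prints the same contents as the example service file for either role; du_unit_file is a hypothetical helper, and the trailing comments show one way to install the result. systemctl daemon-reload makes systemd reread unit files, and systemctl enable --now enables and starts a unit in one step.

```shell
#!/usr/bin/env bash
# Sketch: print the recommended unit file for a Leader or Worker.
# $1 = role ("leader" or "worker")
# $2 = Anzo DU installation directory, e.g. /opt/AnzoDU
# $3 = Anzo service user (also used as the group, as in the example)
du_unit_file() {
  local role="$1" install_path="${2%/}" user="$3" title
  case "$role" in
    leader) title=Leader ;;
    worker) title=Worker ;;
    *) echo "role must be leader or worker" >&2; return 1 ;;
  esac
  cat <<EOF
[Unit]
Description=Service for Distributed Unstructured ${title}
After=syslog.target network.target local-fs.target remote-fs.target nss-lookup.target

[Service]
Type=forking
RemainAfterExit=yes
LimitCPU=infinity
LimitNOFILE=65536
LimitAS=infinity
LimitNPROC=65536
LimitMEMLOCK=infinity
LimitLOCKS=infinity
LimitFSIZE=infinity
ExecStart=${install_path}/${role} start
ExecStop=${install_path}/${role} stop
User=${user}
Group=${user}

[Install]
WantedBy=default.target
EOF
}

# Example use on the Leader server, as root:
#   du_unit_file leader /opt/AnzoDU anzo > /usr/lib/systemd/system/anzo-du-leader.service
#   systemctl daemon-reload
#   systemctl enable --now anzo-du-leader.service
```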

Configure and Start the Worker Service

Follow the instructions below to create and start the Worker service. Complete the steps below on each Worker node in the cluster.

  1. Create a file called anzo-du-worker.service in the /usr/lib/systemd/system directory. For example:
    # vi /usr/lib/systemd/system/anzo-du-worker.service
  2. Add the following contents to anzo-du-worker.service, replacing the placeholder values install_path and service_user_name:
    [Unit]
    Description=Service for Distributed Unstructured Worker
    After=syslog.target network.target local-fs.target remote-fs.target nss-lookup.target
    
    [Service]
    Type=forking
    RemainAfterExit=yes
    LimitCPU=infinity
    LimitNOFILE=65536
    LimitAS=infinity
    LimitNPROC=65536
    LimitMEMLOCK=infinity
    LimitLOCKS=infinity
    LimitFSIZE=infinity
    ExecStart=/install_path/worker start
    ExecStop=/install_path/worker stop
    User=service_user_name
    Group=service_user_name
    
    [Install]
    WantedBy=default.target

    Where install_path is the Anzo DU installation directory and service_user_name is the name of the Anzo service user. For example:

    [Unit]
    Description=Service for Distributed Unstructured Worker
    After=syslog.target network.target local-fs.target remote-fs.target nss-lookup.target
    
    [Service]
    Type=forking
    RemainAfterExit=yes
    LimitCPU=infinity
    LimitNOFILE=65536
    LimitAS=infinity
    LimitNPROC=65536
    LimitMEMLOCK=infinity
    LimitLOCKS=infinity
    LimitFSIZE=infinity
    ExecStart=/opt/AnzoDU/worker start
    ExecStop=/opt/AnzoDU/worker stop
    User=anzo
    Group=anzo
    
    [Install]
    WantedBy=default.target
  3. Save and close the file, and then run the following commands to start and enable the new service:
    # systemctl start anzo-du-worker.service
    # systemctl enable anzo-du-worker.service
  4. Repeat the steps above for each Worker server.

Once the service is enabled, the Worker should be running. To stop or start a Worker later, use the following systemctl commands: sudo systemctl stop anzo-du-worker and sudo systemctl start anzo-du-worker.
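
To double-check that a Leader or Worker service file carries every recommended limit, you could compare it against the list in the sketch below. check_unit_limits is a hypothetical helper that matches whole lines exactly, so it reports a directive that is present but written differently (for example, with extra spaces) as missing. After systemd has loaded the unit, systemctl show anzo-du-worker -p LimitNOFILE reports the value that systemd applies.

```shell
#!/usr/bin/env bash
# Sketch: check that a unit file sets every recommended Limit* directive.
# $1 = path to the unit file, e.g. /usr/lib/systemd/system/anzo-du-worker.service
# Prints each missing or different directive to stderr and returns 1 if any.
check_unit_limits() {
  local unit="$1" want missing=0
  for want in LimitCPU=infinity LimitNOFILE=65536 LimitAS=infinity \
              LimitNPROC=65536 LimitMEMLOCK=infinity LimitLOCKS=infinity \
              LimitFSIZE=infinity; do
    # -x: whole line, -F: fixed string, -q: no output
    if ! grep -qxF "$want" "$unit"; then
      echo "missing or different: $want" >&2
      missing=1
    fi
  done
  return "$missing"
}
```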

After deploying an Anzo Unstructured cluster, you do not need to perform additional configuration in Anzo to connect to the cluster. The connection is configured automatically based on the values specified during installation. You can view the Distributed Pipeline options in Server Settings in the Administration application. For more information, see Configure Network Connections to an Anzo Distributed Unstructured Cluster.

Any time the AU Leader instance is restarted, the following two services must be restarted in Anzo:

  • Anzo Server Akka Cluster Integration
  • Anzo Unstructured Distributed

To restart a service:

  1. In the Administration application, expand the Servers menu and click Advanced Configuration.
  2. On the Advanced Configuration screen, click the I understand and accept the risk button to view the Anzo bundles.
  3. In the Search field at the top of the screen, start typing the name of the service that you want to restart. When the service appears in the list onscreen, click the service name to view the details.
  4. At the top of the screen, click Stop Bundle. Then click Start Bundle when the start option becomes available.