Installing Distributed Unstructured

Follow the instructions below to review the prerequisites and then deploy and configure a Distributed Unstructured (DU) cluster.

  1. Review the Prerequisites
  2. Deploy the Leader Node
  3. Deploy the Worker Nodes
  4. Configure and Start the DU Services

Review the Prerequisites

Before installing DU, double-check that the following requirements are met. If any of the items are incomplete, complete those tasks before starting the installation.

Deploy the Leader Node

Follow the instructions below to deploy the DU leader node. The leader software is typically installed on the Anzo host server since it is a lightweight program that distributes requests to the worker instances for processing.

  1. Copy the DU installation script to the leader server and then run the following command to make the script executable:
    chmod +x <script_name>
  2. Before you run the script, make sure that you are logged in as the service user that has been designated to install and run all platform software. If necessary, you can run the following command to become the appropriate user:
    su <name>

    Where <name> is the name of the platform service user. For example:

    su anzo
  3. Run the following command to start the installation wizard:
    ./<script_name>

    The script unpacks the JRE and then waits for input before starting the installation.

  4. Press Enter to start the installation. The software license agreement is presented.
  5. Review the software license agreement. Press Enter to scroll through the terms. At the end of the agreement, type 1 to accept the terms or type 2 to disagree and stop the installation.

    When the agreement is accepted, the installer prompts you to specify the components to install:

    Which components should be installed?
    1: Leader [*1]
    2: Worker [*2]
    (To show the description of a component, please enter one of *1, *2)
    Please enter a comma-separated list of the selected values or [Enter] for the default selection:
    [1,2]
  6. At the components prompt, type 1 (Leader) and then press Enter.

    The installer prompts you to specify the installation path:

    Where should the Anzo Unstructured be installed?
    [/opt/AnzoDU]
  7. Specify the directory in which to install Anzo DU. Press Enter to accept the default installation path or type an alternate path and then press Enter.

    Next, the installer prompts for the hostname of this leader instance. It defaults to the IP address of the server:

    Set the hostname for this node.
    Enter the HostName/Address for this node.
    Hostname/Address
    [10.100.0.11]
  8. Press Enter to accept the default value. If necessary, type a different hostname or IP address and then press Enter.

    The installer then prompts for any additional leader node hostnames. Typically there is one leader node, so this value is the same IP address that you specified in the previous step.

    Configure leader hostnames
    Please enter the hostnames or addresses for the leader nodes. Each entry
    comma separated.
    [10.100.0.11]
  9. If you set up additional leader nodes for redundancy, enter a comma-separated list of the IP addresses for the alternate nodes. Otherwise, accept the default value and press Enter.

    Next, the installer prompts you to specify the maximum amount of memory that this leader instance can use. The installer lists the total RAM available and chooses 1/2 of the total memory as the default value.

    Choose the maximum memory that the node can use
    Please enter the maximum amount of RAM memory that the node may use.
    The minimum amount currently supported is 1024 MB. 29995 MB is available.
    Maximum Memory in MB
    [14998]
  10. Specify the maximum amount of memory (in MB) that this leader instance can use. Press Enter to accept the default value or specify an alternate value and then press Enter.
  11. The installation of the Anzo DU leader software begins and is configured according to the values that you specified. Proceed to Deploy the Worker Nodes to install the worker instances.
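The default offered at the memory prompt can be worked out in advance. In the sample prompt, 29995 MB is available and the proposed default is 14998 MB, which is consistent with half of the available RAM rounded up. The shell sketch below reproduces that arithmetic and checks a planned value against the 1024 MB minimum that the prompt reports. The rounding rule is inferred from the sample output, and default_max_memory_mb and valid_max_memory_mb are hypothetical helpers, not part of the installer. The same prompt appears when you install a worker.

```shell
# Sketch only: reproduce the "Maximum Memory in MB" default, assuming the
# installer proposes half of the available RAM rounded up
# (29995 MB available -> 14998 MB). Function names are hypothetical.
MIN_MEMORY_MB=1024   # minimum reported by the installer prompt

# Print the assumed default for a given amount of available RAM in MB.
default_max_memory_mb() {
  echo $(( ($1 + 1) / 2 ))   # integer ceiling of available / 2
}

# Succeed only if $1 is a whole number between the minimum and the
# available RAM given in $2.
valid_max_memory_mb() {
  case $1 in
    ''|*[!0-9]*) return 1 ;;
  esac
  [ "$1" -ge "$MIN_MEMORY_MB" ] && [ "$1" -le "$2" ]
}

default_max_memory_mb 29995   # prints 14998
valid_max_memory_mb 14998 29995 && echo "14998 MB is within range"
```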

Deploy the Worker Nodes

Follow the instructions below to deploy the DU worker nodes.

  1. Copy the Anzo DU installation script to each of the worker servers and then run the following command to make the script executable:
    chmod +x <script_name>
  2. Before you run the script, make sure that you are logged in as the service user that has been designated to install and run all platform software. If necessary, you can run the following command to become the appropriate user:
    su <name>

    Where <name> is the name of the platform service user. For example:

    su anzo
  3. Run the following command to start the installation wizard:
    ./<script_name>

    The script unpacks the JRE and then waits for input before starting the installation.

  4. Press Enter to start the installation. The software license agreement is presented.
  5. Review the software license agreement. Press Enter to scroll through the terms. At the end of the agreement, type 1 to accept the terms or type 2 to disagree and stop the installation.

    When the agreement is accepted, the installer prompts you to specify the components to install:

    Which components should be installed?
    1: Leader [*1]
    2: Worker [*2]
    (To show the description of a component, please enter one of *1, *2)
    Please enter a comma-separated list of the selected values or [Enter] for the default selection:
    [1,2]
  6. At the components prompt, type 2 (Worker) and then press Enter.

    The installer prompts you to specify the installation path:

    Where should the Anzo Unstructured be installed?
    [/opt/AnzoDU]
  7. Specify the directory in which to install Anzo DU. Press Enter to accept the default installation path or type an alternate path and then press Enter.

    Next, the installer prompts for the hostname of this worker instance. It defaults to the IP address of the server:

    Set the hostname for this node.
    Enter the HostName/Address for this node.
    Hostname/Address
    [10.100.0.12]
  8. Press Enter to accept the default value. If necessary, type a different hostname or IP address and then press Enter.

    The installer then prompts you to specify the maximum number of service instances for this worker node. Each service instance processes one unstructured document at a time. The default value is 4 instances.

    Choose the maximum number of service instances and worker port
    Please enter the maximum number of service instances to use.
    The minimum amount currently supported is 1.
    Maximum Service Instances
    [4]
  9. Press Enter to accept the default number of maximum service instances or specify another value and then press Enter.

    The installer now prompts you to specify the port to use for this worker. The default port is 2552.

    Worker Port
    [2552]
  10. Specify the port to use for this worker. Press Enter to accept the default value or type a different port and then press Enter.

    Next, the installer prompts you to specify the hostname(s) of the leader node(s).

    Configure leader hostnames
    Please enter the hostnames or addresses for the leader nodes. Each entry
    comma separated.
    []
  11. Specify the IP address of the leader instance that you deployed in Deploy the Leader Node, and then press Enter. If you deployed multiple leader nodes, specify each leader's IP address in a comma-separated list.

    The installer now prompts you to specify the maximum amount of memory that this worker instance can use. The installer lists the total RAM available and chooses 1/2 of the total memory as the default value.

    Choose the maximum memory that the node can use
    Please enter the maximum amount of RAM memory that the node may use.
    The minimum amount currently supported is 1024 MB. 29995 MB is available.
    Maximum Memory in MB
    [14998]
  12. Specify the maximum amount of memory (in MB) that this worker instance can use. Press Enter to accept the default value or specify an alternate value and then press Enter.

    The installation of the Anzo DU worker software begins and is configured according to the values that you specified.

  13. Repeat the steps above for each worker instance in the cluster.
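Because each service instance processes one unstructured document at a time, the number of documents that the cluster can work on concurrently is, to a first approximation, the sum of the Maximum Service Instances values across all workers. The sketch below does that arithmetic for capacity planning. cluster_concurrency is a hypothetical helper, and the estimate ignores other limits such as the memory assigned to each worker.

```shell
# Sketch only: estimate how many documents a DU cluster can process at the
# same time, assuming each service instance handles one document at a time.
# cluster_concurrency is a hypothetical helper; pass one
# "Maximum Service Instances" value per worker node.
cluster_concurrency() {
  total=0
  for instances in "$@"; do
    # Reject non-numeric values and values below the installer minimum of 1.
    case $instances in
      ''|*[!0-9]*) echo "not a number: $instances" >&2; return 1 ;;
    esac
    if [ "$instances" -lt 1 ]; then
      echo "minimum is 1 instance: $instances" >&2
      return 1
    fi
    total=$((total + instances))
  done
  echo "$total"
}

# Example: two workers at the default of 4 instances and one worker set to 2.
cluster_concurrency 4 4 2   # prints 10
```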

Once the leader and all of the worker nodes are installed, proceed to Configure and Start the DU Services.

If you upgraded the DU software, make sure that you restart the leader and worker applications.

In addition, restart the Anzo Server Akka Cluster Integration and Anzo Unstructured Distributed services. To restart these services:

  1. In the Administration application, expand the Servers menu and click Advanced Configuration.
  2. On the Advanced Configuration screen, click the I understand and accept the risk button to view the Anzo bundles.
  3. In the Search field at the top of the screen, start typing the name of the service that you want to restart. When the service appears in the list onscreen, click the service name to view the details.
  4. At the top of the screen, click Stop Bundle. Then click Start Bundle when the start option becomes available.

Configure and Start the DU Services

Once the cluster is installed, Cambridge Semantics recommends that you configure the leader and workers as systemd services. Running them as services ensures that the applications run as the platform service user and can access the data that other platform components write to the shared file system. Follow the instructions below to configure the services.

Root user privileges are required to complete these tasks.

  1. Configure and Start the Leader Service
  2. Configure and Start the Worker Service

Configure and Start the Leader Service

Follow the instructions below to create and start the leader service.

  1. On the leader server, create a file called anzo-du-leader.service in the /usr/lib/systemd/system directory. For example:
    # vi /usr/lib/systemd/system/anzo-du-leader.service
  2. Add the following contents to anzo-du-leader.service. Placeholder values are shown in angle brackets (< >):
    [Unit]
    Description=Service for Distributed Unstructured Leader
    After=syslog.target network.target local-fs.target remote-fs.target nss-lookup.target
    
    [Service]
    Type=forking
    RemainAfterExit=yes
    LimitCPU=infinity
    LimitNOFILE=65536
    LimitAS=infinity
    LimitNPROC=65536
    LimitMEMLOCK=infinity
    LimitLOCKS=infinity
    LimitFSIZE=infinity
    ExecStart=/<install_path>/AnzoDU/leader start
    ExecStop=/<install_path>/AnzoDU/leader stop
    User=<service_user_name>
    Group=<service_user_group>
    
    [Install]
    WantedBy=default.target

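    Where <install_path> is the parent directory of the AnzoDU installation directory (opt when you accept the default installation path, /opt/AnzoDU), and <service_user_name> and <service_user_group> are the name and group of the platform service user account. For example: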

    [Unit]
    Description=Service for Distributed Unstructured Leader
    After=syslog.target network.target local-fs.target remote-fs.target nss-lookup.target
    
    [Service]
    Type=forking
    RemainAfterExit=yes
    LimitCPU=infinity
    LimitNOFILE=65536
    LimitAS=infinity
    LimitNPROC=65536
    LimitMEMLOCK=infinity
    LimitLOCKS=infinity
    LimitFSIZE=infinity
    ExecStart=/opt/AnzoDU/leader start
    ExecStop=/opt/AnzoDU/leader stop
    User=anzo
    Group=anzo
     
    [Install]
    WantedBy=default.target
  3. Save and close the file, and then run the following commands to start and enable the new service:
    # systemctl start anzo-du-leader.service
    # systemctl enable anzo-du-leader.service

After you start and enable the service, the leader is running. From then on, use the following systemctl commands to stop and start the leader:

    sudo systemctl stop anzo-du-leader
    sudo systemctl start anzo-du-leader
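If you prefer not to edit the unit file by hand, you can generate it from a script. The sketch below prints a unit file with the same contents as the leader template above. render_du_unit is a hypothetical helper, and unlike the <install_path> placeholder, its second argument is the full AnzoDU directory (for example, /opt/AnzoDU). Review the output before writing it to /usr/lib/systemd/system.

```shell
# Sketch only: print a DU systemd unit file with the placeholders filled in.
# render_du_unit is a hypothetical helper; the unit contents mirror the
# template in this section. Arguments: role (leader|worker), full AnzoDU
# directory, service user, service group.
render_du_unit() {
  role=$1 du_dir=$2 svc_user=$3 svc_group=$4
  case $role in
    leader) desc="Leader" ;;
    worker) desc="Worker" ;;
    *) echo "role must be leader or worker" >&2; return 1 ;;
  esac
  cat <<EOF
[Unit]
Description=Service for Distributed Unstructured $desc
After=syslog.target network.target local-fs.target remote-fs.target nss-lookup.target

[Service]
Type=forking
RemainAfterExit=yes
LimitCPU=infinity
LimitNOFILE=65536
LimitAS=infinity
LimitNPROC=65536
LimitMEMLOCK=infinity
LimitLOCKS=infinity
LimitFSIZE=infinity
ExecStart=$du_dir/$role start
ExecStop=$du_dir/$role stop
User=$svc_user
Group=$svc_group

[Install]
WantedBy=default.target
EOF
}

# Example (as root on the leader server):
# render_du_unit leader /opt/AnzoDU anzo anzo > /usr/lib/systemd/system/anzo-du-leader.service
```

Passing worker as the first argument produces the matching anzo-du-worker.service contents for the worker nodes.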

Configure and Start the Worker Service

Follow the instructions below to create and start the worker service. Complete the steps below on each worker node in the cluster.

  1. Create a file called anzo-du-worker.service in the /usr/lib/systemd/system directory. For example:
    # vi /usr/lib/systemd/system/anzo-du-worker.service
  2. Add the following contents to anzo-du-worker.service. Placeholder values are shown in angle brackets (< >):
    [Unit]
    Description=Service for Distributed Unstructured Worker
    After=syslog.target network.target local-fs.target remote-fs.target nss-lookup.target
    
    [Service]
    Type=forking
    RemainAfterExit=yes
    LimitCPU=infinity
    LimitNOFILE=65536
    LimitAS=infinity
    LimitNPROC=65536
    LimitMEMLOCK=infinity
    LimitLOCKS=infinity
    LimitFSIZE=infinity
    ExecStart=/<install_path>/AnzoDU/worker start
    ExecStop=/<install_path>/AnzoDU/worker stop
    User=<service_user_name>
    Group=<service_user_group>
    
    [Install]
    WantedBy=default.target

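    Where <install_path> is the parent directory of the AnzoDU installation directory (opt when you accept the default installation path, /opt/AnzoDU), and <service_user_name> and <service_user_group> are the name and group of the platform service user account. For example: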

    [Unit]
    Description=Service for Distributed Unstructured Worker
    After=syslog.target network.target local-fs.target remote-fs.target nss-lookup.target
    
    [Service]
    Type=forking
    RemainAfterExit=yes
    LimitCPU=infinity
    LimitNOFILE=65536
    LimitAS=infinity
    LimitNPROC=65536
    LimitMEMLOCK=infinity
    LimitLOCKS=infinity
    LimitFSIZE=infinity
    ExecStart=/opt/AnzoDU/worker start
    ExecStop=/opt/AnzoDU/worker stop
    User=anzo
    Group=anzo
    
    [Install]
    WantedBy=default.target
  3. Save and close the file, and then run the following commands to start and enable the new service:
    # systemctl start anzo-du-worker.service
    # systemctl enable anzo-du-worker.service
  4. Repeat the steps above for each worker server.

After you start and enable the service, the worker is running. From then on, use the following systemctl commands to stop and start a worker:

    sudo systemctl stop anzo-du-worker
    sudo systemctl start anzo-du-worker
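A unit file that still contains a placeholder such as <service_user_name> is unlikely to start correctly, and with several worker servers it is easy to miss one. The sketch below flags any remaining <name>-style placeholders in a unit file before you start the service. check_unit_placeholders is a hypothetical helper, not a DU or systemd tool.

```shell
# Sketch only: refuse to use a DU unit file that still contains template
# placeholders such as <install_path> or <service_user_name>.
# check_unit_placeholders is a hypothetical helper, not part of Anzo DU.
check_unit_placeholders() {
  if [ ! -r "$1" ]; then
    echo "cannot read $1" >&2
    return 1
  fi
  # grep prints any offending lines with their line numbers.
  if grep -n '<[A-Za-z_][A-Za-z_]*>' "$1"; then
    echo "unresolved placeholders in $1" >&2
    return 1
  fi
  return 0
}

# Example (as root on a worker node):
# check_unit_placeholders /usr/lib/systemd/system/anzo-du-worker.service &&
#   systemctl start anzo-du-worker.service
```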

Any time the DU leader instance is restarted, the Anzo Server Akka Cluster Integration and Anzo Unstructured Distributed services must be restarted in Anzo. To restart a service:

  1. In the Administration application, expand the Servers menu and click Advanced Configuration.
  2. On the Advanced Configuration screen, click the I understand and accept the risk button to view the Anzo bundles.
  3. In the Search field at the top of the screen, start typing the name of the service that you want to restart. When the service appears in the list onscreen, click the service name to view the details.
  4. At the top of the screen, click Stop Bundle. Then click Start Bundle when the start option becomes available.

For next steps in setting up the unstructured environment, see Installing Elasticsearch. For instructions on connecting the DU cluster to Anzo, see Connecting to a Distributed Unstructured Cluster.