Installing and Configuring Elasticsearch

This topic provides instructions for deploying Elasticsearch for use in the Anzo Unstructured environment.

Important: Elasticsearch cannot be run as the root user and must have read and write access to the Anzo file store. Install and run Elasticsearch as the same user and group that runs Anzo. Otherwise, unstructured pipelines fail with permissions errors.
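
The two constraints above can be checked before installation. The following is a minimal sketch, not part of the Anzo or Elasticsearch tooling. The function name check_es_account is hypothetical, and the path passed to it is whatever directory your Anzo file store uses.

```shell
# Hypothetical pre-flight check (not an Anzo or Elasticsearch utility).
# $1 = numeric user ID to test, $2 = path to the Anzo shared file store.
# The access tests apply to the user running the script, so run it as the Anzo user.
check_es_account() {
    if [ "$1" -eq 0 ]; then
        echo "Elasticsearch cannot be run as the root user" >&2
        return 1
    fi
    if [ ! -d "$2" ] || [ ! -r "$2" ] || [ ! -w "$2" ]; then
        echo "No read and write access to $2" >&2
        return 1
    fi
    echo "OK: not root, and $2 is readable and writable by the current user"
}
```

For example, as the Anzo user, run check_es_account "$(id -u)" /opt/anzoshare, where /opt/anzoshare stands in for your own file store path.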

  1. Make sure that the Elasticsearch host server has access to the Anzo shared file system and meets the requirements in Elasticsearch Requirements and Recommendations.
  2. Become the Anzo user before proceeding. If necessary, create the user on the server. For more information, see Create an Anzo User and Installation Directory.
  3. Download Elasticsearch version 7.1.1 from the Elasticsearch Past Releases website. Version 7.1.1 Docker images are also available from the Docker @ Elastic website. Follow the Elasticsearch documentation to install the software.
  4. Configure Elasticsearch to save snapshots to the Anzo shared file system.
    • For a mounted file system, such as NFS, uncomment the Path setting, path.repo, in elasticsearch_install_dir/config/elasticsearch.yml and specify the path and directory for the mounted file system:
      path.repo: /path/directory

      For example:

      path.repo: /opt/anzoshare
    • For S3, see S3 Repository Plugin in the Elasticsearch documentation for information about installing the S3 repository plugin. Then see Client Settings for instructions on configuring the S3 client.
    • For HDFS, see Hadoop HDFS Repository Plugin in the Elasticsearch documentation for information about installing the HDFS repository plugin. Then see Hadoop Security for information about configuring Kerberos authentication.
  5. Configure the amount of memory that Elasticsearch can use. By default, Elasticsearch is configured to use a maximum heap size of 1 GB. Cambridge Semantics recommends that you increase the amount to 50% of the memory that is available on the server. To change the configuration, open the elasticsearch_install_dir/config/jvm.options file in an editor. At the top of the file, modify the Xms and Xmx values to replace the 1 with the new value. For example:
    # Xms represents the initial size of total heap space
    # Xmx represents the maximum size of total heap space
    
    -Xms15g
    -Xmx15g
  6. If you want to secure the Elasticsearch instance, follow the instructions in Configuring security in Elasticsearch in the Elasticsearch documentation.

    Important: If you set up SSL authentication with a trusted certificate, make sure that you add the certificate to the Anzo trust store. For instructions, see Uploading a Certificate to the Anzo Trust Store.

  7. When the configuration is complete, run the following command to start Elasticsearch:
    elasticsearch_install_dir/bin/elasticsearch

    For more information about starting Elasticsearch, see Starting Elasticsearch in the Elasticsearch documentation. For information about configuring Elasticsearch to start automatically as the Anzo user, see Configuring an Elasticsearch Service below.
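
The heap size recommendation in step 5 (50% of the server's memory) can be worked out with a short script. This is a sketch under stated assumptions: heap_options is a hypothetical helper, it assumes a Linux host that exposes /proc/meminfo, and it rounds down to whole gigabytes. A server reporting 30 GB of memory yields the 15g values used in the example above.

```shell
# Hypothetical helper: print jvm.options heap lines for half of the given memory.
# $1 = total memory in kB, as reported in the MemTotal line of /proc/meminfo.
heap_options() {
    heap_gb=$(( $1 / 2 / 1024 / 1024 ))   # 50% of memory, rounded down to whole GB
    if [ "$heap_gb" -lt 1 ]; then
        heap_gb=1                          # never go below the 1 GB default
    fi
    printf '%s\n' "-Xms${heap_gb}g" "-Xmx${heap_gb}g"
}

# On Linux, read the server's total memory and print the suggested values.
if [ -r /proc/meminfo ]; then
    heap_options "$(awk '/^MemTotal:/ { print $2 }' /proc/meminfo)"
fi
```

The script only prints the two lines. Copy them into jvm.options by hand so that the rest of the file is left untouched.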

Once this Elasticsearch instance is configured and running, follow the instructions in Configuring an Elasticsearch Connection to connect Anzo to this instance.
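
Before connecting Anzo, it can help to confirm that the instance answers HTTP requests. The sketch below uses the Elasticsearch cluster health API. The helper names es_health_url and es_health are hypothetical, and the sketch assumes plain HTTP on the default port 9200 with no security configured. A secured instance would need https and credentials instead.

```shell
# Hypothetical helper: build the cluster health URL for an Elasticsearch host.
# $1 = host name, $2 = HTTP port (9200 is the Elasticsearch default).
es_health_url() {
    echo "http://$1:${2:-9200}/_cluster/health?pretty"
}

# Query the instance; curl -f returns a non-zero status on HTTP errors.
es_health() {
    curl -sf "$(es_health_url "$1" "$2")"
}
```

For example, es_health localhost prints the cluster health document when Elasticsearch is running on the local server, and returns a non-zero status when it is not reachable.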

Configuring an Elasticsearch Service

Cambridge Semantics recommends that you configure an Elasticsearch service for starting Elasticsearch automatically as the Anzo service user. Follow the instructions below to implement the service.

Note: Root user privileges are required to complete this task.

  1. Create a file called es.service in the /usr/lib/systemd/system directory. For example:
    # vi /usr/lib/systemd/system/es.service
  2. Add the following contents to es.service:
    [Unit]
    Description=elasticsearch
    Wants=network-online.target
    After=network-online.target
    [Service]
    Type=simple
    ExecStart=/sbin/runuser -l Anzo_user /install_path/elasticsearch-7.1.1/bin/elasticsearch
    [Install]
    WantedBy=multi-user.target

    Where install_path is the directory that contains the elasticsearch-7.1.1 directory and Anzo_user is the Anzo service user. For example:

    [Unit]
    Description=elasticsearch
    Wants=network-online.target
    After=network-online.target
    [Service]
    Type=simple
    ExecStart=/sbin/runuser -l anzo /opt/elasticsearch-7.1.1/bin/elasticsearch
    [Install]
    WantedBy=multi-user.target
  3. Save and close the file, and then run the following commands to enable and start the new service and confirm that it is running:
    # systemctl enable es.service
    # systemctl start es.service
    # systemctl status es.service

    Once the service is in place, Elasticsearch should be stopped and started via systemctl. For example, systemctl stop es and systemctl start es.
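
Because Elasticsearch cannot be run as the root user, it can be worth confirming which account the unit file hands off to. The following is a minimal sketch that assumes the ExecStart line uses runuser -l as in the example above. The helper names unit_run_user and check_unit_user are hypothetical.

```shell
# Hypothetical helper: print the user named after "runuser -l" in a unit file's ExecStart line.
# $1 = path to the unit file.
unit_run_user() {
    awk '/^[[:space:]]*ExecStart=/ {
        for (i = 1; i < NF; i++) if ($i == "-l") { print $(i + 1); exit }
    }' "$1"
}

# Fail if the unit would start Elasticsearch as root or if no user is found.
check_unit_user() {
    user=$(unit_run_user "$1")
    if [ -z "$user" ] || [ "$user" = "root" ]; then
        echo "es.service must start Elasticsearch as the Anzo service user" >&2
        return 1
    fi
    echo "Elasticsearch will start as: $user"
}
```

For example, check_unit_user /usr/lib/systemd/system/es.service reports the account named in the unit file created in step 1.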
