Installing and Configuring Elasticsearch

This topic provides instructions for deploying Elasticsearch for use in the Anzo Unstructured environment.

Elasticsearch cannot be run as the root user and must have read and write access to the Anzo file store. Therefore, install and run Elasticsearch as the Anzo service user; otherwise, unstructured pipelines will fail with permissions errors. For more information, see Anzo Service Account Requirements.

  1. Make sure that the Elasticsearch host server has access to the Anzo shared file system and meets the requirements in Elasticsearch Requirements and Recommendations.
  2. Become the Anzo service user before proceeding. If necessary, create the user on the server. For more information, see Make Sure the Anzo Service User Account is Created.
  3. Download Elasticsearch version 7.10.2 from the Elasticsearch Past Releases website. Docker images are also available from the Docker @ Elastic website. Follow the Elasticsearch documentation to install the software.
    Elasticsearch versions 7.11 and 7.12 are not supported at this time.
  4. Configure Elasticsearch to save snapshots to the Anzo shared file system.
    • For a mounted file system, such as NFS, uncomment the Path setting, path.repo, in <elasticsearch_install_path>/config/elasticsearch.yml and specify the path and directory for the mounted file system:
      path.repo: /<path>/<directory>

      For example:

      path.repo: /opt/anzoshare
    • For S3, see S3 Repository Plugin in the Elasticsearch documentation for information about installing the S3 repository plugin. Then see Client Settings for instructions on configuring the S3 client.
    • For HDFS, see Hadoop HDFS Repository Plugin in the Elasticsearch documentation for information about installing the HDFS repository plugin. Then see Hadoop Security for information about configuring Kerberos authentication.
  5. Configure the amount of memory that Elasticsearch can use. By default, Elasticsearch is configured with a maximum heap size of 1 GB. Cambridge Semantics recommends that you increase the heap size to 50% of the memory that is available on the server. To change the configuration, open the <elasticsearch_install_path>/config/jvm.options file in an editor. At the top of the file, replace the 1 in the Xms and Xmx values with the new value, and keep the two values equal. For example:
    # Xms represents the initial size of total heap space
    # Xmx represents the maximum size of total heap space
    
    -Xms15g
    -Xmx15g
  6. If you want to secure the Elasticsearch instance, follow the instructions in Configuring security in Elasticsearch in the Elasticsearch documentation.

    If you set up SSL authentication with a trusted certificate, make sure that you add the certificate to the Anzo trust store. For instructions, see Adding a Certificate to the Trust Store.

  7. When the configuration is complete, run the following command to start Elasticsearch:
    <elasticsearch_install_path>/bin/elasticsearch

    For more information about starting Elasticsearch, see Starting Elasticsearch in the Elasticsearch documentation. For information about configuring Elasticsearch to start automatically as the Anzo user, see Configuring an Elasticsearch Service below.
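
Before starting Elasticsearch, it can help to confirm the file access requirement stated at the beginning of this topic. The following is a minimal shell sketch and not part of Anzo or Elasticsearch: check_repo_path is a hypothetical helper name, and /opt/anzoshare is the example path.repo value from step 4. It only checks that the directory exists and that the current user can read and write it.

```shell
#!/bin/sh
# Hypothetical pre-flight helper (not part of Anzo or Elasticsearch).
# Succeeds only if the given directory exists and the current user
# can both read from it and write to it.
check_repo_path() {
    repo_dir=$1
    if [ ! -d "$repo_dir" ]; then
        echo "missing: $repo_dir"
        return 1
    fi
    if [ ! -r "$repo_dir" ] || [ ! -w "$repo_dir" ]; then
        echo "no read/write access: $repo_dir"
        return 1
    fi
    echo "ok: $repo_dir"
}

# Example call, run as the Anzo service user (path assumed from step 4):
#   check_repo_path /opt/anzoshare
```

Run the check as the Anzo service user rather than as root, because root typically passes the access tests regardless of the directory permissions.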

Once this Elasticsearch instance is configured and running, follow the instructions in Connecting to Elasticsearch to connect Anzo to this instance.
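
The 50% heap recommendation in step 5 above can be worked out from the total memory on the server. The sketch below assumes a Linux host where /proc/meminfo is readable. The name heap_flags_from_kb is a hypothetical helper, and rounding down to whole gigabytes is a choice made for this example, not a documented requirement.

```shell
#!/bin/sh
# Hypothetical helper: print -Xms/-Xmx lines sized to half of a
# memory total given in kB (the unit used by /proc/meminfo).
heap_flags_from_kb() {
    total_kb=$1
    # 1 GB = 1048576 kB; integer division rounds down.
    heap_gb=$(( total_kb / 2 / 1048576 ))
    # Do not go below the 1 GB default heap size.
    if [ "$heap_gb" -lt 1 ]; then
        heap_gb=1
    fi
    printf '%s\n%s\n' "-Xms${heap_gb}g" "-Xmx${heap_gb}g"
}

# On Linux, read the total memory of this server and print the flags.
if [ -r /proc/meminfo ]; then
    total_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
    heap_flags_from_kb "$total_kb"
fi
```

For an input of 33554432 kB (32 GB), the helper prints -Xms16g and -Xmx16g. The printed lines are meant to be copied into jvm.options by hand; the sketch does not edit the file.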

Configuring an Elasticsearch Service

Cambridge Semantics recommends that you configure a systemd service that starts Elasticsearch automatically as the Anzo service user. Follow the instructions below to implement the service.

Root user privileges are required to complete this task.
  1. Create a file called es.service in the /usr/lib/systemd/system directory. For example:
    # vi /usr/lib/systemd/system/es.service
  2. Add the following contents to es.service:
    [Unit]
    Description=elasticsearch
    Wants=network-online.target
    After=network-online.target
    [Service]
    Type=oneshot
    ExecStart=/sbin/runuser -l Anzo_service_user /install_path/elasticsearch-7.10.2/bin/elasticsearch
    [Install]
    WantedBy=multi-user.target

    Where Anzo_service_user is the name of the Anzo service user, and install_path is the path to Elasticsearch. For example:

    [Unit]
    Description=elasticsearch
    Wants=network-online.target
    After=network-online.target
    [Service]
    Type=oneshot
    ExecStart=/sbin/runuser -l anzo /opt/elasticsearch-7.10.2/bin/elasticsearch
    [Install]
    WantedBy=multi-user.target
  3. Save and close the file, and then run the following commands to enable and start the new service and check its status:
    # systemctl enable es.service
    # systemctl start es.service
    # systemctl status es.service

    Once the service is in place, stop and start Elasticsearch with systemctl. For example, systemctl stop es.service and systemctl start es.service.
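
Because the unit file in step 2 varies only in the service user name and the Elasticsearch installation path, it can be generated from those two values. The following shell sketch is illustrative only: render_es_unit is a hypothetical helper name, and the unit contents are copied from the template above without changes.

```shell
#!/bin/sh
# Hypothetical helper: print the es.service unit for a given
# service user ($1) and Elasticsearch installation directory ($2).
render_es_unit() {
    service_user=$1
    es_home=$2
    cat <<EOF
[Unit]
Description=elasticsearch
Wants=network-online.target
After=network-online.target
[Service]
Type=oneshot
ExecStart=/sbin/runuser -l ${service_user} ${es_home}/bin/elasticsearch
[Install]
WantedBy=multi-user.target
EOF
}

# Example call, run as root (values taken from the example above):
#   render_es_unit anzo /opt/elasticsearch-7.10.2 > /usr/lib/systemd/system/es.service
```

Review the generated file before enabling the service, since the helper does not check that the user or the path exists.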
