Installing Elasticsearch

Follow the instructions below to review the prerequisites and install Elasticsearch for use in the Anzo platform.

  1. Review the Prerequisites
  2. Install Elasticsearch
  3. Configure the Elasticsearch Service

Review the Prerequisites

Before installing Elasticsearch, double-check that the following requirements are met. If any of the items are incomplete, complete those tasks before starting the installation.

Install Elasticsearch

  1. Download a supported Elasticsearch version from the Elasticsearch Past Releases website. Docker images are also available from the Docker @ Elastic website.

    Anzo supports Elasticsearch versions 7.10.2 through 7.17.3.

  2. Before you install Elasticsearch, make sure that you are logged in as the service user that has been designated to install and run all platform software. If necessary, you can run the following command to become the appropriate user:
    su <name>

    Where <name> is the name of the platform service user. For example:

    su anzo
  3. Follow the appropriate version of the Elasticsearch Guide to install and configure the software.
  4. As part of the Elasticsearch configuration, Elastic recommends that you modify the following Linux kernel configuration settings:
    • vm.swappiness: Controls the tendency of the kernel to move processes out of physical memory and onto the swap disk. Elastic recommends that you set this value to 1.
    • vm.max_map_count: Sets the limit on the maximum number of memory map areas a process can use. Elastic recommends that you set this value to 262144.

    You have two options for configuring the values:

    1. You can update the /etc/sysctl.conf file to include the following contents:
      # For more information, see sysctl.conf(5) and sysctl.d(5).
      vm.swappiness = 1
      vm.max_map_count = 262144

      With this method, the changes take effect after you reboot the system. To apply them without a reboot, run sysctl -p as root to reload the file.

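    2. You can run the following sysctl commands as root to apply the settings immediately. Settings applied this way last only until the next reboot unless they are also added to /etc/sysctl.conf:
      # sysctl -w vm.swappiness=1
      # sysctl -w vm.max_map_count=262144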
  5. Next, configure Elasticsearch to save snapshots to the Anzo shared file system.
    • For a mounted file system, such as NFS, uncomment the Path setting, path.repo (or path.data in some versions), in <elasticsearch_install_path>/config/elasticsearch.yml and specify the path and directory for the mounted file system:
      path.repo: /<path>/<directory>

      For example:

      path.repo: /opt/anzoshare
    • For S3, see S3 Repository Plugin in the Elasticsearch documentation for information about installing the S3 repository plugin. Then see Client Settings for instructions on configuring the S3 client.
    • For HDFS, see Hadoop HDFS Repository Plugin in the Elasticsearch documentation for information about installing the HDFS repository plugin. Then see Hadoop Security for information about configuring Kerberos authentication.
  6. Configure the amount of memory that Elasticsearch can use. By default, Elasticsearch is configured to use a maximum heap size of 1 GB. Cambridge Semantics recommends that you increase the heap size to 50% of the memory that is available on the server. To change the configuration, open the <elasticsearch_install_path>/config/jvm.options file in an editor. At the top of the file, replace the 1 in the Xms and Xmx values with the new size in gigabytes. For example:
    # Xms represents the initial size of total heap space
    # Xmx represents the maximum size of total heap space
    
    -Xms15g
    -Xmx15g
  7. If you want to secure the Elasticsearch instance, follow the instructions in Configuring security in Elasticsearch in the Elasticsearch documentation.

    If you set up SSL authentication with a trusted certificate, make sure that you add the certificate to the Anzo trust store. For instructions, see Adding a Certificate to the Trust Store.

  8. When the configuration is complete, see Configure the Elasticsearch Service below for instructions on configuring Elasticsearch to start automatically as the Anzo user.
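
The kernel settings from step 4 can be checked before Elasticsearch is started. The following is a minimal sketch, not part of the Anzo or Elastic tooling: the function name check_kernel_setting and its argument order are hypothetical, and the sketch assumes that vm.swappiness should equal the recommended value exactly while vm.max_map_count may be equal to or higher than it.

```shell
# Hypothetical helper: compare a kernel setting against a recommended value.
# Arguments: <setting name> <comparison: eq|ge> <recommended> <actual>
check_kernel_setting() {
    local name="$1" op="$2" want="$3" have="$4"
    case "$op" in
        eq|ge) ;;
        *) echo "unknown comparison: $op" >&2; return 2 ;;
    esac
    # "-eq" and "-ge" are the integer comparison operators of test(1).
    if [ "$have" "-$op" "$want" ]; then
        echo "OK   $name = $have"
    else
        echo "WARN $name = $have (recommended: $want)"
        return 1
    fi
}

# Example calls on a Linux host (sysctl -n prints only the current value):
# check_kernel_setting vm.swappiness eq 1 "$(sysctl -n vm.swappiness)"
# check_kernel_setting vm.max_map_count ge 262144 "$(sysctl -n vm.max_map_count)"
```

The current value is passed in as an argument rather than read inside the function so that the comparison can be exercised without access to the target host.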

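The 50% heap recommendation from step 6 can be turned into jvm.options values with a short calculation. This is an illustrative sketch only: the function name heap_flags_for is hypothetical, the result is rounded down to whole gigabytes, and the fallback to 1 GB on very small hosts is an assumption based on the default heap size mentioned above.

```shell
# Hypothetical helper: print heap flags sized to half of the given memory.
# Argument: total memory in kB (the unit that /proc/meminfo reports).
heap_flags_for() {
    local total_kb="$1"
    # Half of total memory, converted from kB to whole GB (rounded down).
    local heap_gb=$(( total_kb / 2 / 1024 / 1024 ))
    # Assumption: never go below the 1 GB default heap size.
    [ "$heap_gb" -ge 1 ] || heap_gb=1
    printf '%s\n' "-Xms${heap_gb}g" "-Xmx${heap_gb}g"
}

# Example on a Linux host: read MemTotal and print the suggested flags.
# heap_flags_for "$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)"
```

For an input of 31457280 kB (30 GB), the function prints -Xms15g and -Xmx15g, which matches the example values in step 6. Because the kernel usually reports slightly less than the installed memory, the rounded-down result on a real host may be 1 GB lower than the nominal half.
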
Configure the Elasticsearch Service

Cambridge Semantics recommends that you configure an Elasticsearch service for starting Elasticsearch automatically as the platform service user. Follow the instructions below to implement the service.

Root user privileges are required to complete this task.

  1. Create a file called elasticsearch.service in the /usr/lib/systemd/system directory. For example:
    # vi /usr/lib/systemd/system/elasticsearch.service
  2. Add the following contents to elasticsearch.service. The text below includes placeholder <install_path>, <service_user_name>, and <service_user_group> values.
    [Unit]
    Description=Elasticsearch
    Documentation=https://www.elastic.co
    Wants=network-online.target
    After=network-online.target
    
    [Service]
    Type=forking
    RuntimeDirectory=elasticsearch
    # Use the following setting to specify an alternate Java JVM if not using the 
    # embedded JVM in elasticsearch/jdk.
    # Environment=ES_JAVA_HOME=<java_install_path>
    Environment=ES_HOME=<install_path>
    Environment=ES_PATH_CONF=<install_path>/config
    
    User=<service_user_name>
    Group=<service_user_group>
     
    ExecStart=<install_path>/bin/elasticsearch --daemonize
    
    # Specifies the maximum file descriptor number that can be opened by this process
    LimitNOFILE=65535
    
    # Specifies the maximum number of processes
    LimitNPROC=4096
    
    # Specifies the maximum size of virtual memory
    LimitAS=infinity
    
    # Specifies the maximum file size
    LimitFSIZE=infinity
    
    # Max Locked Memory
    LimitMEMLOCK=infinity
    
    # Disable timeout logic and wait until process is stopped
    TimeoutStopSec=0
    
    # SIGTERM signal is used to stop the Java process
    KillSignal=SIGTERM
    
    # Send the signal only to the JVM rather than its control group
    KillMode=process
    
    # Java process is never killed
    SendSIGKILL=no
    
    # When a JVM receives a SIGTERM signal it exits with code 143
    SuccessExitStatus=143
    
    # Allow a slow startup before the systemd notifier module kicks in to extend the timeout
    TimeoutStartSec=75
    
    [Install]
    WantedBy=multi-user.target

    Where <install_path> is the installation path and <service_user_name> and <service_user_group> are the name and group for the platform service user account. For example:

    [Unit]
    Description=Elasticsearch
    Documentation=https://www.elastic.co
    Wants=network-online.target
    After=network-online.target
    
    [Service]
    Type=forking
    RuntimeDirectory=elasticsearch
    # Use the following setting to specify an alternate Java JVM if not using the 
    # embedded JVM in elasticsearch/jdk.
    # Environment=ES_JAVA_HOME=<java_install_path>
    Environment=ES_HOME=/opt/elasticsearch
    Environment=ES_PATH_CONF=/opt/elasticsearch/config
    
    User=anzo
    Group=anzo
    
    ExecStart=/opt/elasticsearch/bin/elasticsearch --daemonize
    
    # Specifies the maximum file descriptor number that can be opened by this process
    LimitNOFILE=65535
    
    # Specifies the maximum number of processes
    LimitNPROC=4096
    
    # Specifies the maximum size of virtual memory
    LimitAS=infinity
    
    # Specifies the maximum file size
    LimitFSIZE=infinity
    
    # Max Locked Memory
    LimitMEMLOCK=infinity
    
    # Disable timeout logic and wait until process is stopped
    TimeoutStopSec=0
    
    # SIGTERM signal is used to stop the Java process
    KillSignal=SIGTERM
    
    # Send the signal only to the JVM rather than its control group
    KillMode=process
    
    # Java process is never killed
    SendSIGKILL=no
    
    # When a JVM receives a SIGTERM signal it exits with code 143
    SuccessExitStatus=143
    
    # Allow a slow startup before the systemd notifier module kicks in to extend the timeout
    TimeoutStartSec=75
    
    [Install]
    WantedBy=multi-user.target
  3. Save and close the file, and then run the following commands to enable the new service, start it, and check its status:
    # systemctl enable elasticsearch.service
    # systemctl start elasticsearch.service
    # systemctl status elasticsearch.service

    Once the service is in place, use systemctl to stop and start Elasticsearch. For example, systemctl stop elasticsearch and systemctl start elasticsearch.
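
The placeholder substitution described in step 2 can also be scripted instead of edited by hand. The sketch below is one possible approach and not a Cambridge Semantics tool: the function name render_unit and the template file are hypothetical, and the sed expressions assume that none of the supplied values contain the | or & characters.

```shell
# Hypothetical helper: fill in the unit file placeholders from step 2.
# Arguments: <template file> <install path> <service user> <service group>
render_unit() {
    local template="$1" install_path="$2" user="$3" group="$4"
    # "|" is used as the sed delimiter because install paths contain "/".
    sed -e "s|<install_path>|${install_path}|g" \
        -e "s|<service_user_name>|${user}|g" \
        -e "s|<service_user_group>|${group}|g" \
        "$template"
}

# Example (run as root), assuming the template text was saved to
# ./elasticsearch.service.template:
# render_unit ./elasticsearch.service.template /opt/elasticsearch anzo anzo \
#     > /usr/lib/systemd/system/elasticsearch.service
```

The commented <java_install_path> line is left untouched, so an alternate JVM still has to be configured manually if one is needed.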

Once this Elasticsearch instance is configured and running, follow the instructions in Connecting to Elasticsearch to connect Anzo to this instance.
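
Before connecting Anzo, it can be useful to confirm that the instance is actually answering requests, because the service may take some time to finish starting. The following is a minimal sketch under stated assumptions: the function name wait_until_ready is hypothetical, and the example probe assumes that Elasticsearch listens on its default HTTP port 9200 on the local host without TLS or authentication.

```shell
# Hypothetical helper: retry a probe command until it succeeds.
# Arguments: <max attempts> <delay in seconds> <probe command...>
wait_until_ready() {
    local attempts="$1" delay="$2"
    shift 2
    local i=1
    while [ "$i" -le "$attempts" ]; do
        # The probe's own output is discarded; only its exit status matters.
        if "$@" >/dev/null 2>&1; then
            echo "ready after $i attempt(s)"
            return 0
        fi
        sleep "$delay"
        i=$(( i + 1 ))
    done
    echo "not ready after $attempts attempt(s)" >&2
    return 1
}

# Example: poll the local instance every 2 seconds, up to 30 times.
# wait_until_ready 30 2 curl -fsS http://localhost:9200
```

If security was enabled in step 7, the probe command needs the matching scheme, credentials, and certificate options.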