Installing and Configuring Elasticsearch

This topic provides instructions for deploying Elasticsearch for use in the Anzo platform.

Elasticsearch cannot be run as the root user and must have read and write access to the Anzo file store. Install and run Elasticsearch as the Anzo service user; otherwise, unstructured pipelines will fail with permissions errors. For more information, see Anzo Service Account Requirements.

Make sure that the Elasticsearch host server has access to the Anzo shared file system and meets the requirements in Elasticsearch Requirements and Recommendations.
Become the Anzo service user before proceeding. If necessary, create the user on the server. For more information, see Make Sure the Anzo Service User Account is Created.
Download a supported Elasticsearch version from the Elasticsearch Past Releases website. Docker images are also available from the Docker @ Elastic website.
Anzo supports Elasticsearch versions 7.10.2 through 7.17.3.
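As a quick sanity check before installing, the supported range can be tested in the shell. The following is a minimal sketch, not part of the Anzo or Elasticsearch tooling: the function name version_in_range is hypothetical, and it assumes a sort command that supports the -V (version sort) option, as GNU coreutils does.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the given version is within the
# range that this topic lists as supported (7.10.2 through 7.17.3).
# Assumes `sort -V` (version sort) is available, as in GNU coreutils.
version_in_range() {
  v="$1"
  min="7.10.2"
  max="7.17.3"
  # The smallest of (min, v) must be min, and the largest of (v, max) must be max.
  lowest=$(printf '%s\n' "$min" "$v" | sort -V | head -n 1)
  highest=$(printf '%s\n' "$v" "$max" | sort -V | tail -n 1)
  [ "$lowest" = "$min" ] && [ "$highest" = "$max" ]
}

# Example: check the version you plan to download.
if version_in_range "7.17.3"; then
  echo "supported"
else
  echo "not supported"
fi
```

To check an existing installation, the version string can be taken from the output of <elasticsearch_install_path>/bin/elasticsearch --version.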
Follow the appropriate version of the Elasticsearch Guide to install and configure the software.
As part of the Elasticsearch configuration, Elastic recommends that you modify the following Linux kernel configuration settings:
- vm.swappiness: Controls the tendency of the kernel to move processes out of physical memory and onto the swap disk. Elastic recommends that you set this value to 1.
- vm.max_map_count: Sets the limit on the maximum number of memory map areas a process can use. Elastic recommends that you set this value to 262144.
You have two options for configuring the values:
1. You can update the /etc/sysctl.conf file to include the following contents:
```
# For more information, see sysctl.conf(5) and sysctl.d(5).
vm.swappiness = 1
vm.max_map_count = 262144
```
  With this method, the changes take effect the next time the system reboots. To apply them immediately, you can also run sysctl -p as root, which reloads /etc/sysctl.conf.
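2. You can run the following sysctl commands as root to apply the settings to the running system. Values set this way take effect immediately but do not persist across reboots unless they are also added to /etc/sysctl.conf:
```
# sysctl -w vm.swappiness=1
# sysctl -w vm.max_map_count=262144
```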
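Whichever option you choose, it can be useful to confirm the values that the kernel is actually using. The sketch below is one way to do that, assuming a Linux host where kernel parameters are exposed under /proc/sys. The function name check_setting and the PROC_SYS override variable are hypothetical and exist only for this illustration.

```shell
#!/bin/sh
# Directory where kernel parameters are exposed. Overridable so the
# function can be exercised against a test directory.
PROC_SYS="${PROC_SYS:-/proc/sys}"

# Hypothetical helper: compare a kernel parameter with a recommended value.
# Returns 0 if they match, 1 if they differ, 2 if the value cannot be read.
check_setting() {
  key="$1"
  want="$2"
  # vm.swappiness -> $PROC_SYS/vm/swappiness
  file="$PROC_SYS/$(printf '%s' "$key" | tr . /)"
  if [ ! -r "$file" ]; then
    echo "$key: cannot read $file"
    return 2
  fi
  have=$(cat "$file")
  if [ "$have" = "$want" ]; then
    echo "$key = $have (matches recommendation)"
  else
    echo "$key = $have (recommended: $want)"
    return 1
  fi
}

check_setting vm.swappiness 1 || true
check_setting vm.max_map_count 262144 || true
```

The same values can also be read with sysctl -n vm.swappiness and sysctl -n vm.max_map_count.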
Next, configure Elasticsearch to save snapshots to the Anzo shared file system.
- For a mounted file system, such as NFS, uncomment the Path setting, path.repo (or path.data in some versions), in <elasticsearch_install_path>/config/elasticsearch.yml and specify the path and directory for the mounted file system:
```
path.repo: /<path>/<directory>
```
  For example:
```
path.repo: /opt/anzoshare
```
- For S3, see S3 Repository Plugin in the Elasticsearch documentation for information about installing the S3 repository plugin. Then see Client Settings for instructions on configuring the S3 client.
- For HDFS, see Hadoop HDFS Repository Plugin in the Elasticsearch documentation for information about installing the HDFS repository plugin. Then see Hadoop Security for information about configuring Kerberos authentication.
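For the mounted file system case, a simple pre-flight check can catch permissions problems before Elasticsearch starts. The following is a minimal sketch that assumes it is run as the Anzo service user. The function name check_repo_path is hypothetical, and /opt/anzoshare is just the example path used in this topic.

```shell
#!/bin/sh
# Hypothetical helper: verify that a snapshot repository directory exists
# and is readable and writable by the user running this script.
check_repo_path() {
  dir="$1"
  if [ ! -d "$dir" ]; then
    echo "$dir: not a directory"
    return 1
  fi
  if [ ! -r "$dir" ] || [ ! -w "$dir" ]; then
    echo "$dir: not readable and writable by $(id -un)"
    return 1
  fi
  echo "$dir: ok"
}

# Example: check the path configured as path.repo.
check_repo_path /opt/anzoshare || true
```

Running the check as the Anzo service user, rather than root, matters because root typically passes the read and write tests regardless of the directory permissions.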
Configure the amount of memory that Elasticsearch can use. By default, Elasticsearch is configured to use a maximum heap size of 1 GB. Cambridge Semantics recommends that you increase the amount to 50% of the memory that is available on the server. To change the configuration, open the <elasticsearch_install_path>/config/jvm.options file in an editor. At the top of the file, modify the Xms and Xmx values to replace the 1 with the new value. For example:
```
# Xms represents the initial size of total heap space
# Xmx represents the maximum size of total heap space

-Xms15g
-Xmx15g
```
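To arrive at a starting value for the heap size, the 50% recommendation can be computed from the total memory that the server reports. The sketch below assumes a Linux host with /proc/meminfo. The function name heap_gb_from_kb is hypothetical, and the result is rounded down to whole gigabytes. Treat the output as a suggestion to review rather than a definitive setting; Elastic's own guidance also advises keeping the heap below the JVM compressed object pointer threshold (roughly 30 GB), which this sketch does not enforce.

```shell
#!/bin/sh
# Hypothetical helper: given total memory in kB (the MemTotal value in
# /proc/meminfo), print half of it in whole GB, with a minimum of 1.
heap_gb_from_kb() {
  total_kb="$1"
  gb=$(( total_kb / 2 / 1048576 ))
  if [ "$gb" -lt 1 ]; then
    gb=1
  fi
  echo "$gb"
}

# Print suggested jvm.options lines for this server.
if [ -r /proc/meminfo ]; then
  mem_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
  heap=$(heap_gb_from_kb "$mem_kb")
  printf '%s\n' "-Xms${heap}g" "-Xmx${heap}g"
fi
```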
If you want to secure the Elasticsearch instance, follow the instructions in Configuring security in Elasticsearch in the Elasticsearch documentation.
If you set up SSL authentication with a trusted certificate, make sure that you add the certificate to the Anzo trust store. For instructions, see Adding a Certificate to the Anzo Trust Store.
When the configuration is complete, see Configuring an Elasticsearch Service below for instructions on configuring Elasticsearch to start automatically as the Anzo user.

Configuring an Elasticsearch Service

Cambridge Semantics recommends that you configure an Elasticsearch service for starting Elasticsearch automatically as the Anzo service user. Follow the instructions below to implement the service.

Root user privileges are required to complete this task.

Create a file called elasticsearch.service in the /usr/lib/systemd/system directory. For example:
```
# vi /usr/lib/systemd/system/elasticsearch.service
```

Add the following contents to elasticsearch.service. The text below includes placeholder <elasticsearch_install_path>, <anzo_service_user>, and <anzo_service_group> values. Replace the placeholders with the appropriate values for your Elasticsearch installation location as well as the user and group name for your Anzo service user account.

```
[Unit]
Description=Elasticsearch
Documentation=https://www.elastic.co
Wants=network-online.target
After=network-online.target

[Service]
Type=forking
RuntimeDirectory=elasticsearch
# Use the following setting to specify an alternate Java JVM if not using the
# embedded JVM in elasticsearch/jdk.
# Environment=ES_JAVA_HOME=<java_install_path>
Environment=ES_HOME=<elasticsearch_install_path>
Environment=ES_PATH_CONF=<elasticsearch_install_path>/config

User=<anzo_service_user>
Group=<anzo_service_group>

ExecStart=<elasticsearch_install_path>/bin/elasticsearch --daemonize

# Specifies the maximum file descriptor number that can be opened by this process
LimitNOFILE=65535

# Specifies the maximum number of processes
LimitNPROC=4096

# Specifies the maximum size of virtual memory
LimitAS=infinity

# Specifies the maximum file size
LimitFSIZE=infinity

# Max Locked Memory
LimitMEMLOCK=infinity

# Disable timeout logic and wait until process is stopped
TimeoutStopSec=0

# SIGTERM signal is used to stop the Java process
KillSignal=SIGTERM

# Send the signal only to the JVM rather than its control group
KillMode=process

# Java process is never killed
SendSIGKILL=no

# When a JVM receives a SIGTERM signal it exits with code 143
SuccessExitStatus=143

# Allow a slow startup before the systemd notifier module kicks in to extend the timeout
TimeoutStartSec=75

[Install]
WantedBy=multi-user.target
```
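If you keep the text above as a template file, the placeholders can be filled in with sed instead of by hand. The following is a minimal sketch: the function name fill_unit_template and the template file name are hypothetical, and it assumes that the substituted values do not contain the | or & characters, which have special meaning in the sed expressions used here.

```shell
#!/bin/sh
# Hypothetical helper: read a unit file template on stdin and replace the
# three placeholders used in this topic.
#   $1 = Elasticsearch install path
#   $2 = Anzo service user
#   $3 = Anzo service group
fill_unit_template() {
  sed -e "s|<elasticsearch_install_path>|$1|g" \
      -e "s|<anzo_service_user>|$2|g" \
      -e "s|<anzo_service_group>|$3|g"
}

# Example with two template lines supplied inline.
printf '%s\n' \
  'Environment=ES_HOME=<elasticsearch_install_path>' \
  'User=<anzo_service_user>' |
  fill_unit_template /opt/elasticsearch anzo anzo
```

With a saved template, the same function could be used as fill_unit_template /opt/elasticsearch anzo anzo < elasticsearch.service.template, with the output reviewed before it is copied to /usr/lib/systemd/system/elasticsearch.service as root.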

The following example shows a completed elasticsearch.service file:

```
[Unit]
Description=Elasticsearch
Documentation=https://www.elastic.co
Wants=network-online.target
After=network-online.target

[Service]
Type=forking
RuntimeDirectory=elasticsearch
# Use the following setting to specify an alternate Java JVM if not using the
# embedded JVM in elasticsearch/jdk.
# Environment=ES_JAVA_HOME=<java_install_path>
Environment=ES_HOME=/opt/elasticsearch
Environment=ES_PATH_CONF=/opt/elasticsearch/config

User=anzo
Group=anzo

ExecStart=/opt/elasticsearch/bin/elasticsearch --daemonize

# Specifies the maximum file descriptor number that can be opened by this process
LimitNOFILE=65535

# Specifies the maximum number of processes
LimitNPROC=4096

# Specifies the maximum size of virtual memory
LimitAS=infinity

# Specifies the maximum file size
LimitFSIZE=infinity

# Max Locked Memory
LimitMEMLOCK=infinity

# Disable timeout logic and wait until process is stopped
TimeoutStopSec=0

# SIGTERM signal is used to stop the Java process
KillSignal=SIGTERM

# Send the signal only to the JVM rather than its control group
KillMode=process

# Java process is never killed
SendSIGKILL=no

# When a JVM receives a SIGTERM signal it exits with code 143
SuccessExitStatus=143

# Allow a slow startup before the systemd notifier module kicks in to extend the timeout
TimeoutStartSec=75

[Install]
WantedBy=multi-user.target
```

Save and close the file, and then run the following commands to reload the systemd configuration, enable and start the new service, and check its status:
```
# systemctl daemon-reload
# systemctl enable elasticsearch.service
# systemctl start elasticsearch.service
# systemctl status elasticsearch.service
```
Once the service is in place, stop and start Elasticsearch with systemctl, for example, systemctl stop elasticsearch and systemctl start elasticsearch.
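After the service starts, you may want to confirm that Elasticsearch is answering requests. The sketch below queries the cluster health API with curl and extracts the status field. It assumes the default HTTP port 9200 on localhost with security not enabled; for a secured instance, the URL scheme and credentials would need to be adjusted. The function names health_status and check_health and the ES_URL variable are hypothetical.

```shell
#!/bin/sh
# Base URL of the Elasticsearch instance. Assumption: default port on
# the local host, no TLS or authentication.
ES_URL="${ES_URL:-http://localhost:9200}"

# Hypothetical helper: read a _cluster/health JSON response on stdin and
# print the value of its "status" field (green, yellow, or red).
health_status() {
  sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([a-z]*\)".*/\1/p' | head -n 1
}

# Hypothetical helper: query the cluster health API and print the status.
# Prints nothing if the instance cannot be reached.
check_health() {
  curl -s --max-time 5 "$ES_URL/_cluster/health" | health_status
}
```

After sourcing these functions, running check_health on the Elasticsearch host prints the current cluster status. A single-node instance commonly reports yellow when indices have replica shards that cannot be assigned.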

Once this Elasticsearch instance is configured and running, follow the instructions in Connecting to Elasticsearch to connect Anzo to this instance.