Elasticsearch Requirements
Distributed Unstructured uses the Elasticsearch engine to build an index for each unstructured pipeline and to run searches on the data after it is onboarded. When choosing an Elasticsearch host server, consider the following information:
- Generating the index is a lightweight operation compared to document search operations. If you have a light unstructured data workload and do not perform text searches on large amounts of data, installing Elasticsearch on the Anzo host server might be sufficient.
- If you onboard a large number of unstructured documents and plan to perform text searches across a large amount of data, Cambridge Semantics recommends that you install Elasticsearch on a dedicated server.
The table below lists the Elasticsearch server requirements and recommendations:
Component | Recommendation | Description |
---|---|---|
Elasticsearch Version | Version 7.17.13 | Versions 7.10.2 – 7.17.13 are supported. |
Java | Java 11 | Elasticsearch requires Java 11 or later. The installation package includes an embedded JDK. |
CPU | 8+ cores | For performance, Elasticsearch benefits from multiple CPU cores for parallel processing. |
RAM | 64+ GB | Elasticsearch relies heavily on Java heap space for storing and processing data. |
Disk Space | 100+ GB | Elasticsearch creates temporary files and logs on the local disk when processing data. |
Ports | 9200-9300 | By default, Elasticsearch listens for HTTP requests on a port in the range 9200-9300 (http.port). If port 9200 is not available when Elasticsearch is started, Elasticsearch tries 9201 and so on until it finds an available port. The Anzo server and the AnzoGraph leader server must be able to reach Elasticsearch on the HTTP port that it ends up using. |
Shared File System | Mounted NFS | The shared file system must be accessible from each Elasticsearch host server. See Platform Shared File Storage Requirements. |
Service User Account | Enterprise-level account | It is important to install and run Elasticsearch (and all other platform components) as the same service user. See Platform Service User Account Requirements. |
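
Because Elasticsearch may bind to any free port in the 9200-9300 range, it can help to confirm from the Anzo server (and the AnzoGraph leader server) which port actually answers and whether the running version falls within the supported range. The following is a minimal sketch, not part of the product: the helper names `parse_version`, `is_supported`, and `find_http_port` are hypothetical, and it assumes Elasticsearch is reachable over plain HTTP without authentication. A secured cluster would need credentials and HTTPS, which this sketch does not handle.

```python
import json
import urllib.request
from urllib.error import URLError

# Supported version range, taken from the requirements table.
MIN_SUPPORTED = (7, 10, 2)
MAX_SUPPORTED = (7, 17, 13)


def parse_version(text):
    """Turn a version string such as '7.17.13' into a tuple of ints.

    Any suffix after a hyphen (for example '-SNAPSHOT') is ignored.
    """
    core = text.split("-")[0]
    return tuple(int(part) for part in core.split("."))


def is_supported(version_text):
    """Return True if the version lies within the supported range (inclusive)."""
    return MIN_SUPPORTED <= parse_version(version_text) <= MAX_SUPPORTED


def find_http_port(host, ports=range(9200, 9301), timeout=2.0):
    """Probe each port in order and return (port, version) for the first one
    whose root endpoint answers with an Elasticsearch-style JSON document.

    Returns None if no port in the range responds that way.
    Assumption: unauthenticated HTTP access to the root endpoint.
    """
    for port in ports:
        url = f"http://{host}:{port}/"
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                info = json.load(resp)
        except (URLError, OSError, ValueError):
            # Port closed, unreachable, rejected, or not returning JSON.
            continue
        version = info.get("version") if isinstance(info, dict) else None
        number = version.get("number") if isinstance(version, dict) else None
        if number:
            return port, number
    return None
```

For example, running `find_http_port("es-host.example.com")` (a placeholder host name) on the Anzo server returns the port and version string if Elasticsearch is reachable, and the version can then be passed to `is_supported`. Unreachable hosts can make the scan slow, since each port waits for the timeout before moving on.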
For instructions on installing Elasticsearch, see Installing Elasticsearch.