Anzo Unstructured Requirements and Recommendations
The Anzo Unstructured (AU) infrastructure is highly customizable and scalable. The number, size, and configuration of the servers in the environment depend on your unstructured data size, pipeline workload, and performance expectations. This topic provides guidance on determining the infrastructure to deploy as well as the requirements for each of the AU components. For an introduction to the AU architecture and pipeline process, see Anzo Unstructured Architecture and Process Overview.
AU requires two programs that are installed separately from Anzo:
- An Anzo Unstructured cluster for processing the incoming data. See Anzo Unstructured Cluster Requirements and Recommendations.
- Elasticsearch for indexing and searching unstructured document contents. See Elasticsearch Requirements and Recommendations.
Anzo Unstructured Cluster Requirements and Recommendations
An Anzo Unstructured (AU) cluster consists of one leader instance and one or more worker instances. Cambridge Semantics provides an installation script for installing the AU software. In an AU cluster:
- The leader instance is a lightweight program and is typically installed on the Anzo host server.
- The worker instances require significant resources to process the unstructured documents and are typically installed on dedicated servers.
Consider the size of your unstructured data workload when deploying worker host servers. Each worker instance can have multiple server instances to process documents. The table below lists the requirements for Anzo Unstructured worker servers:
Component | Requirement |
---|---|
Operating System | RHEL/CentOS 7+ |
CPU | 4+ cores |
RAM | 16+ GB |
Disk Space | 10+ GB |
File System | The Anzo file store (shared file system) must be accessible from each AU server in the cluster. For more information about file stores, see Connecting to a File Store. |
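Before running the installation script, it can help to confirm that a candidate worker host meets the minimums in the table. The following is a minimal sketch, not a tool provided by Cambridge Semantics: the names `HostResources`, `detect_resources`, and `check_worker` are hypothetical, RAM and disk are measured in GiB, and "Disk Space" is assumed to mean free space on the file system where AU will be installed. It uses only the Python standard library and avoids syntax newer than Python 3.6, the default `python3` on RHEL/CentOS 7.

```python
import os
import shutil
from typing import List, NamedTuple

# Minimums copied from the worker requirements table.
MIN_CPU_CORES = 4
MIN_RAM_GB = 16
MIN_DISK_GB = 10

GIB = 1024 ** 3


class HostResources(NamedTuple):
    cpu_cores: int
    ram_gb: float
    free_disk_gb: float


def detect_resources(path: str = "/") -> HostResources:
    """Measure logical CPUs, total physical RAM, and free disk space at `path` (Linux)."""
    cores = os.cpu_count() or 0
    # Total physical memory = page size * number of physical pages.
    ram_bytes = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    # Assumption: "Disk Space" refers to free space where AU is installed.
    free_bytes = shutil.disk_usage(path).free
    return HostResources(cores, ram_bytes / GIB, free_bytes / GIB)


def check_worker(res: HostResources) -> List[str]:
    """Return one message per requirement the host falls short of; empty means all pass."""
    problems = []
    if res.cpu_cores < MIN_CPU_CORES:
        problems.append(f"CPU: {res.cpu_cores} cores, need {MIN_CPU_CORES}+")
    if res.ram_gb < MIN_RAM_GB:
        problems.append(f"RAM: {res.ram_gb:.1f} GB, need {MIN_RAM_GB}+")
    if res.free_disk_gb < MIN_DISK_GB:
        problems.append(f"Disk: {res.free_disk_gb:.1f} GB free, need {MIN_DISK_GB}+")
    return problems
```

On the candidate host, call `check_worker(detect_resources("/opt"))`, where `/opt` stands in for your actual install path, and print any messages it returns. Note that `os.cpu_count()` reports logical CPUs, which can overstate physical cores on hyperthreaded hardware, and that the kernel reserves some memory, so a nominal 16 GB machine may report slightly less. Treat a near miss as a prompt to double-check rather than a hard failure. The sketch does not verify access to the Anzo file store.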
For instructions on installing Anzo Unstructured, see Deploying an Anzo Unstructured Cluster.
Elasticsearch Requirements and Recommendations
Anzo Unstructured uses the Elasticsearch engine to build an index after an unstructured pipeline runs and to run searches on unstructured data that is onboarded to Anzo. When choosing an Elasticsearch host server, consider the following information:
- Generating the index is a lightweight operation compared to document search operations. If you have a light unstructured data workload and do not perform text searches on large amounts of data, installing an Elasticsearch engine on the Anzo host server might be sufficient.
- If you onboard a large number of unstructured documents and plan to perform text searches across a large amount of data, Cambridge Semantics recommends that you install Elasticsearch on a dedicated server.
The table below lists the Elasticsearch server requirements:
Component | Requirement |
---|---|
Elasticsearch Version | 7.1.1 |
CPU | 8+ cores |
RAM | 64+ GB |
Disk Space | 100+ GB |
File System | The Anzo file store (shared file system) must be accessible from each Elasticsearch server. For more information about file stores, see Connecting to a File Store. |
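Because the table names a specific Elasticsearch version, it may be worth confirming which version a running node reports. Elasticsearch answers `GET /` (port 9200 by default) with a JSON document whose `version.number` field holds the version string. The sketch below is hypothetical: the helper names are illustrative, it assumes the node is reachable over plain HTTP without authentication, and it treats the table's `7.1.1` as an exact match rather than a minimum.

```python
import json
import urllib.request

# Version listed in the Elasticsearch requirements table.
REQUIRED_VERSION = "7.1.1"


def fetch_es_info(base_url: str = "http://localhost:9200", timeout: float = 5.0) -> dict:
    """Call the Elasticsearch root endpoint (GET /) and return the parsed JSON.

    Assumes plain HTTP with no authentication; adjust for TLS or credentials.
    """
    with urllib.request.urlopen(base_url.rstrip("/") + "/", timeout=timeout) as resp:
        return json.load(resp)


def read_es_version(info: dict) -> str:
    """Extract the version string from the root-endpoint response."""
    return info["version"]["number"]


def version_matches(info: dict, required: str = REQUIRED_VERSION) -> bool:
    """True if the node reports exactly the required version."""
    return read_es_version(info) == required
```

For example, `version_matches(fetch_es_info("http://es-host:9200"))`, where `es-host` is a placeholder for your Elasticsearch server, returns `False` if the node reports anything other than 7.1.1. Comparing strings exactly keeps the check strict, so a later 7.x release is flagged rather than silently accepted.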
For instructions on installing Elasticsearch, see Installing and Configuring Elasticsearch.