Compute Resource Planning

This section provides guidance on choosing the instance types for the nodes in your node pools.

Operator Nodes

The operator pods are very small: each operator requires 0.5 CPU. The table below lists the recommended instance types and sizes for a single operator. If you plan to co-locate operators on a single instance, increase the CPU accordingly. For example, an instance with 4 CPUs can run up to 7 operators (3.5 CPUs for the operator pods and 0.5 CPU for the auxiliary service).

CSP    Suggested Instance Type  vCPU  RAM       Disk
AWS    m5.large                 2     8 GiB     50 GB
GCP    n1-standard-1            1     3.75 GiB  50 GB
Azure  Standard_DS2_v2          2     7 GiB     50 GB
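
The co-location arithmetic described above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the sketch assumes that the 0.5 CPU for the auxiliary service is reserved once per instance, as in the 4-CPU example.

```python
import math

OPERATOR_CPU = 0.5   # CPU required by each operator pod (from the text above)
AUXILIARY_CPU = 0.5  # CPU assumed to be reserved once per instance for the auxiliary service


def max_operators(instance_vcpus: float) -> int:
    """Return how many operator pods fit on one instance (hypothetical helper)."""
    available = instance_vcpus - AUXILIARY_CPU
    if available < OPERATOR_CPU:
        return 0
    return math.floor(available / OPERATOR_CPU)


def min_vcpus(operator_count: int) -> float:
    """Return the CPU needed to co-locate the given number of operators."""
    return operator_count * OPERATOR_CPU + AUXILIARY_CPU


print(max_operators(4))  # 7, matching the example in the text
print(min_vcpus(7))      # 4.0
```

Under the same assumption, the 1-vCPU and 2-vCPU instances in the table would hold 1 and 3 operators, respectively.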

For Amazon EKS deployments, the Suggested Instance Type for Operator nodes is also recommended for nodes in the Common node group. The Common group runs K8s service pods, such as Cluster Autoscalers and Load Balancers, which are very small and require few resources.

AnzoGraph Nodes

Because AnzoGraph is a high-performance, in-memory database, RAM is generally the most critical resource to consider when determining the overall size and number of nodes for AnzoGraph environments. To estimate the total memory requirement, take the size of the data that you plan to load and multiply it by 3 or 4. The additional memory is needed because query processing and intermediate results can temporarily consume a very large amount of memory. For more information about AnzoGraph sizing guidelines, see Sizing Guidelines for In-Memory Storage.
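
As a rough illustration of the multiplier rule above, the following Python sketch returns the low and high memory estimates for a given data size. The function name and the use of GiB as the unit are assumptions made for the example, not part of the product.

```python
from typing import Tuple

LOW_MULTIPLIER = 3   # lower bound of the "3 or 4" rule stated above
HIGH_MULTIPLIER = 4  # upper bound of the same rule


def anzograph_memory_range_gib(data_size_gib: float) -> Tuple[float, float]:
    """Estimate total AnzoGraph memory (GiB) as 3x to 4x the data size."""
    if data_size_gib < 0:
        raise ValueError("data size must be non-negative")
    return (data_size_gib * LOW_MULTIPLIER, data_size_gib * HIGH_MULTIPLIER)


low, high = anzograph_memory_range_gib(50)
print(low, high)  # 150 200
```

For example, 50 GiB of data to load suggests planning for roughly 150 GiB to 200 GiB of total memory across the AnzoGraph nodes.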

Cambridge Semantics recommends running only one AnzoGraph pod per node. This differs from Anzo Unstructured, for example, where leader and worker pods can be co-located on the same node. The table below shows a range of cloud instances that are well suited to running AnzoGraph pods.

CSP    Suggested Instance Range        vCPU Range  RAM Range         Disk
AWS    m5.2xlarge – m5.16xlarge        8 – 64      32 GiB – 256 GiB  100 GB
GCP    n1-standard-8 – n1-standard-64  8 – 64      30 GiB – 240 GiB  100 GB
Azure  DSv2 and DSv3 series            8 – 64      28 GiB – 256 GiB  100 GB
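
Because only one AnzoGraph pod runs per node, the total memory requirement translates directly into a node count for a chosen instance size. The Python sketch below illustrates this for the GCP range in the table. The helper name is hypothetical, and the sketch makes the simplifying assumption that all of an instance's RAM is available to AnzoGraph; in practice the operating system and Kubernetes components reserve some of it, so treat the result as a lower bound.

```python
import math

# RAM (GiB) of the n1-standard machine types within the range shown in the table.
GCP_INSTANCE_RAM_GIB = {
    "n1-standard-8": 30,
    "n1-standard-16": 60,
    "n1-standard-32": 120,
    "n1-standard-64": 240,
}


def nodes_needed(total_memory_gib: float, instance_type: str) -> int:
    """Return the minimum node count, assuming one AnzoGraph pod per node
    and (optimistically) that the pod can use all of the instance's RAM."""
    ram_gib = GCP_INSTANCE_RAM_GIB[instance_type]
    return math.ceil(total_memory_gib / ram_gib)


# Compare cluster shapes for a 200 GiB total memory requirement.
for name in GCP_INSTANCE_RAM_GIB:
    print(name, nodes_needed(200, name))
```

For a 200 GiB requirement, this gives 7, 4, 2, and 1 nodes for the four machine types, which shows the trade-off between a few large nodes and many small ones.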

Dynamic Nodes

Nodes in the Dynamic node pool need to be sized to run Anzo Agent pods. An Anzo Agent is a scaled-down version of the Anzo server that coordinates the sending of documents to the Anzo Unstructured (AU) worker nodes. Anzo Agent pods require more resources than AU leader and worker, Elasticsearch, and Spark pods. Each unstructured pipeline deploys a single Anzo Agent pod, and that pod needs enough resources to coordinate the pipeline. Anzo Agent pods are typically deployed one pod per node, while AU worker, Elasticsearch, and Spark pods typically run several to a node. The table below lists the recommended instance types and sizes for running the Anzo Agent pods. The recommended instances are also sufficient for running multiple AU, Elasticsearch, and Spark pods.

CSP    Suggested Instance Type  vCPU  RAM     Disk
AWS    m5.2xlarge               8     32 GiB  100 GB
GCP    n1-standard-8            8     30 GiB  100 GB
Azure  Standard_D8s_v3          8     32 GiB  100 GB

For instructions on setting up the K8s infrastructure, see Deploying the K8s Infrastructure.
