Compute Resource Planning
This section provides guidance on choosing the instance types for the nodes in your node pools.
Operator Nodes
The operator pods are very small: each operator requires 0.5 CPU. The table below lists the recommended instance types and sizes for a node that runs a single operator. If you plan to co-locate multiple operators on a single instance, increase the CPU accordingly. For example, an instance with 4 CPUs can run up to 7 operators (3.5 CPU for the operator pods and 0.5 CPU for the auxiliary service).
CSP | Suggested Instance Type | vCPU | RAM | Disk |
---|---|---|---|---|
AWS | m5.large | 2 | 8 GiB | 50 GB |
GCP | n1-standard-1 | 1 | 3.75 GiB | 50 GB |
Azure | Standard_D1_v2 | 1 | 3.5 GiB | 50 GB |
For Amazon EKS deployments, the Suggested Instance Type for Operator nodes is also recommended for nodes in the Common node group. The Common group runs K8s service pods, such as Cluster Autoscalers and Load Balancers, which are very small and require few resources.
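The co-location arithmetic described for operator nodes can be sketched as a small helper. This is only an illustration: the function name and constants are hypothetical, and it assumes that the 0.5 CPU reserved for the auxiliary service in the 4-CPU example applies unchanged to other instance sizes, which the guidance above does not state explicitly.

```python
import math

OPERATOR_CPU = 0.5   # CPU required per operator pod (from the guidance above)
AUXILIARY_CPU = 0.5  # CPU reserved for the auxiliary service (assumed constant per instance)


def max_operators(instance_cpu: float) -> int:
    """Estimate how many operator pods fit on one instance with the given CPU count."""
    available = instance_cpu - AUXILIARY_CPU
    if available <= 0:
        return 0
    return math.floor(available / OPERATOR_CPU)


if __name__ == "__main__":
    # Reproduces the 4-CPU example from the text: 3.5 CPU left for operators.
    print(max_operators(4))
```

Under that assumption, a 4-CPU instance yields 7 operators, matching the example in the text. Treat the result as an upper bound for planning rather than a guarantee, since actual scheduling also depends on memory and on any other pods placed on the node.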
AnzoGraph Nodes
Because AnzoGraph is a high-performance, in-memory database, RAM is generally the most critical resource to consider when determining the overall size and number of nodes to use for AnzoGraph environments. To estimate the total memory requirement, take the size of the data that you plan to load and multiply it by 3 or 4. The extra headroom matters because query processing and intermediate results can temporarily consume a very large amount of memory. For more information about AnzoGraph sizing guidelines, see Sizing Guidelines for In-Memory Storage.
Unlike Anzo Unstructured, where leader and worker pods can be co-located on the same node, Cambridge Semantics recommends running only one AnzoGraph pod per node. The table below shows the range of cloud instances that are well suited to running AnzoGraph pods.
CSP | Suggested Instance Range | vCPU Range | RAM Range | Disk |
---|---|---|---|---|
AWS | m5.2xlarge – m5.16xlarge | 8 – 64 | 32 GiB – 256 GiB | 100 GB |
GCP | n1-standard-8 – n1-standard-64 | 8 – 64 | 30 GiB – 240 GiB | 100 GB |
Azure | Dv2 and Dv3 series | 8 – 64 | 28 GiB – 256 GiB | 100 GB |
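As a rough illustration of the "multiply by 3 or 4" guideline, the sketch below turns a data size into a memory range and a node count. The function names are hypothetical, and the node count assumes that the total memory requirement can simply be divided across nodes with one AnzoGraph pod each and that all of a node's RAM is available to AnzoGraph. Neither assumption comes from this guide, so consult the sizing guidelines before relying on the numbers.

```python
import math
from typing import Tuple


def anzograph_memory_gib(data_gib: float) -> Tuple[float, float]:
    """Return the (low, high) total memory estimate: 3x and 4x the data size."""
    return data_gib * 3, data_gib * 4


def nodes_needed(total_memory_gib: float, node_ram_gib: float) -> int:
    """Estimate the node count, assuming memory divides evenly across nodes (an assumption)."""
    return math.ceil(total_memory_gib / node_ram_gib)


if __name__ == "__main__":
    low, high = anzograph_memory_gib(100)  # hypothetical 100 GiB data set
    print(low, high)
    print(nodes_needed(low, 256), nodes_needed(high, 256))  # nodes with 256 GiB RAM each
```

For a hypothetical 100 GiB data set, the estimate is 300 GiB to 400 GiB of total memory, which under these assumptions fits on two nodes with 256 GiB of RAM each.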
Dynamic Nodes
Nodes in the Dynamic node pool need to be sized to run Anzo Agent pods. An Anzo Agent is a scaled-down version of the Anzo server that coordinates the sending of documents to the Anzo Unstructured (AU) worker nodes. Each unstructured pipeline deploys a single Anzo Agent pod, and that pod needs enough resources to coordinate the pipeline. For this reason, Anzo Agent pods require more resources than AU leader and worker, Elasticsearch, and Spark pods. Anzo Agent pods are typically deployed as one pod per node, while the AU worker, Elasticsearch, and Spark nodes run multiple pods per node. The table below lists the recommended instance types and sizes for running the Anzo Agent pods. The recommended instances are also sufficient for running multiple AU, Elasticsearch, and Spark pods.
CSP | Suggested Instance Type | vCPU | RAM | Disk |
---|---|---|---|---|
AWS | m5.2xlarge | 8 | 32 GiB | 100 GB |
GCP | n1-standard-8 | 8 | 30 GiB | 100 GB |
Azure | Standard_D8_v3 | 8 | 32 GiB | 100 GB |
For instructions on setting up the K8s infrastructure, see Deploying the K8s Infrastructure.