Requirements Overview

This topic highlights the principal requirements to be aware of when planning and provisioning an environment that includes all of the Anzo platform components.

Cambridge Semantics recommends that you create separate development, staging, and production environments, and potentially a user acceptance testing environment as well. Separating environments is essential for promoting organized development, safeguarding data, and minimizing disruptions in data processing solutions.

The diagram below shows a high-level overview of the platform requirements. The table below the image describes the elements that are pictured and includes references to the detailed requirements for each component.

| Component | Details |
| --- | --- |
| Anzo Network | For security, deploy the platform component instances in the same network and set up firewall rules to allow connections only to trusted data sources and services. For details about the ports that must be open for inbound and outbound connections over the network, see Firewall Requirements in Anzo Server Requirements. |
| User Account | For integration between components and appropriate ownership of installation directories and shared files, use the same service user account when installing and running all of the platform software. For security, the account should not have root privileges. If your platform will include container image deployments, such as with the dynamic Kubernetes infrastructure or single-node testing with another container engine, the account's user ID and group ID must both be set to 1000. For specifics about the account requirements, see Platform Service User Account Requirements. |
| AnzoGraph | AnzoGraph is a massively parallel processing (MPP) graph OLAP engine. To provide the highest performance possible, AnzoGraph stores all data and performs all analytic operations entirely in memory, making RAM the most important resource to consider when provisioning the host server or servers. AnzoGraph can be installed on a single server or on multiple servers in a cluster. In a cluster, you designate one server as the leader. The leader is the server that connects to Anzo and, if it is included in your platform, to Elasticsearch. When loading data from databases and HTTP endpoints, the AnzoGraph servers also need to be able to connect directly to those sources. The connection to Anzo is made on ports 5700 and 5600. Port 5700 is the gRPC port for all user-initiated SPARQL requests, and port 5600 is the system management port for system-level requests such as stopping or starting AnzoGraph from Anzo. For additional details on AnzoGraph requirements, see AnzoGraph Requirements. |
| Elasticsearch | Elasticsearch is required when using the Distributed Unstructured (DU) component. DU uses the Elasticsearch engine to build an index for each unstructured pipeline and to run text searches on the knowledge graph after it is created. Without DU, Elasticsearch is optional. It can also be used with structured sources to generate an index for data layers in graphmarts (see Creating an Elasticsearch Index from a Graphmart for more information). Elasticsearch connects to Anzo, the DU worker nodes, and the AnzoGraph leader node on ports 9200-9300. For more details on the Elasticsearch requirements, see Elasticsearch Requirements. |
| Distributed Unstructured | The Distributed Unstructured (DU) component must be deployed to process and transform unstructured data. The DU cluster consists of one leader instance and one or more worker instances. The leader instance is a lightweight program that is typically installed on the Anzo host server. The worker instances require significantly more resources, CPU in particular, to process unstructured documents in parallel, so they are typically installed on dedicated servers. The worker instances communicate with the leader instance on port 2551 by default. For more details on the DU requirements, see Distributed Unstructured Requirements. |
| NFS | Though Anzo can connect to and read files from various types of long-term storage systems, it is critical to deploy a file system that consistently offers good read and write support and can be shared by all of the components in the platform. Cambridge Semantics strongly recommends that you deploy an NFS and mount it in the same location on all component host servers. For details on the requirements, see Platform Shared File Storage Requirements. |
| Anzo Server | The Anzo Server is the administrative layer that organizes and provides access control over all of the platform assets. It connects data sources and components and provides the Anzo application, Administration, and Hi-Res Analytics user interfaces as well as APIs and endpoints for accessing data from third-party applications. For more details about the requirements, see Anzo Server Requirements. |
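
The port and account requirements in the table above can be spot-checked before installation. The following Python sketch is illustrative only and is not part of the Anzo tooling: the function names, the component keys in `REQUIRED_PORTS`, and the choice to probe 9200 and 9300 as the two ends of the documented Elasticsearch range are all assumptions made for this example. A successful TCP connection shows only that a port is reachable from the machine running the check, so run it from the host that initiates the connection.

```python
import socket

# Ports taken from the table above. 9200 and 9300 are the two ends of the
# documented Elasticsearch range (an assumption; adjust for your deployment).
# The component keys are hypothetical labels chosen for this sketch.
REQUIRED_PORTS = {
    "anzograph": [5600, 5700],
    "elasticsearch": [9200, 9300],
    "du_leader": [2551],
}
CONTAINER_ID = 1000


def account_problems(uid, gid, containers=False):
    """Return a list of problems with the service account's numeric IDs."""
    problems = []
    if uid == 0:
        # UID 0 is root. This does not detect sudo rights or other privileges.
        problems.append("service account must not be root")
    if containers and (uid != CONTAINER_ID or gid != CONTAINER_ID):
        problems.append("container deployments need UID and GID 1000")
    return problems


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


def unreachable(hosts, probe=port_open):
    """Return (component, host, port) for each required port the probe cannot reach.

    hosts maps a key of REQUIRED_PORTS to the hostname to test.
    """
    failures = []
    for component, host in hosts.items():
        for port in REQUIRED_PORTS.get(component, []):
            if not probe(host, port):
                failures.append((component, host, port))
    return failures
```

On a Linux host, passing `os.getuid()` and `os.getgid()` to `account_problems` with `containers=True` checks the current user against the container requirement, and calling `unreachable` with a mapping such as `{"anzograph": "<leader hostname>"}` lists any required ports that could not be reached. The `probe` parameter exists so the reachability logic can be exercised without a live network.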