Anzo Requirements
This page provides important guidelines to follow when choosing the hardware and software for Anzo host servers.
- Hardware Requirements
- Software Requirements
- Firewall Requirements
- File Storage Requirements
- Standalone Spark Server Requirements
Hardware Requirements
The following guidelines apply to individual Anzo servers within production and development environments. Your Cambridge Semantics Customer Success manager can help you identify an overall Anzo and AnzoGraph deployment configuration that is appropriate for your solution and use cases.
Production Environments
Component | Minimum | Recommended | Description |
---|---|---|---|
RAM | 64 GB | 128+ GB | The Anzo system data source is a disk-based graph store (called a Journal or Volume). When the system source is queried, Anzo swaps the data from disk to memory on demand. Choosing a host server with more RAM increases the performance of system queries because the OS can store the journal data in its file cache, avoiding the need for Anzo to swap data from disk to memory. In addition, RAM is required to hold intermediate results for join queries. |
Disk Space: Anzo Install Path | 100 GB | 500+ GB | The Anzo server installation disk needs to have enough space to store the Anzo system data source, Anzo log files, any plugins, and the Anzo client. In addition, if the local Sparkler compiler and Spark ETL engine are used on the Anzo server, consider that the disk size also needs to be sufficient for hosting all of the job-related .jar files. |
Disk Space: Shared File System | 500 GB | 1+ TB | The shared file system stores all of the RDF data and ETL files that are shared between Anzo and all AnzoGraph, Anzo Unstructured, Spark, and Elasticsearch servers. For more information, see File Storage Requirements below. |
vCPU | 16 | 32 | Once you provision sufficient RAM, performance depends on CPU capabilities. Keep in mind that you are provisioning for both a production database and a busy application server. A greater number of cores and high clock speed can make a dramatic difference in performance when there are many concurrent Anzo users. |
Architecture | 64-bit | 64-bit | Anzo is supported only on 64-bit architecture. |
Development Environments
Component | Minimum | Recommended | Description |
---|---|---|---|
RAM | 32 GB | 64+ GB | These RAM guidelines assume that the development environment is intended to host smaller data volumes than the production environment and support one or two Anzo users at a time. For development environments with large data volumes and multiple concurrent users, increase the RAM amount. |
Disk Space: Anzo Install Path | 100 GB | 500+ GB | The Anzo server installation disk needs to have enough space to store the Anzo system data source, Anzo log files, any plugins, and the Anzo client. In addition, if the local Sparkler compiler and Spark ETL engine are used on the Anzo server, consider that the disk size also needs to be sufficient for hosting all of the job-related .jar files. |
Disk Space: Shared File System | 500 GB | 1+ TB | Typically the development environment mounts the same shared file system as the production environment. |
vCPU | 8 | 16 | Like the RAM guidelines, these vCPU guidelines assume that the development environment is intended to host smaller data volumes than the production environment and support one or two Anzo users at a time. For development environments with large data volumes and multiple concurrent users, increase the number of vCPUs. |
Architecture | 64-bit | 64-bit | Anzo is supported only on 64-bit architecture. |
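
The minimums in the two tables above can be checked mechanically before installation. The following is a minimal sketch, not an official Cambridge Semantics tool: the function names `check_host` and `probe_local_host` are hypothetical, the thresholds are copied from the tables, and the probe assumes a Linux host where `os.sysconf` reports physical memory.

```python
import os
import platform
import shutil

# Minimum values copied from the production and development tables above.
MINIMUMS = {
    "production": {"ram_gb": 64, "install_disk_gb": 100, "vcpu": 16},
    "development": {"ram_gb": 32, "install_disk_gb": 100, "vcpu": 8},
}


def check_host(ram_gb, install_disk_gb, vcpu, is_64bit, environment="production"):
    """Return a list of shortfalls; an empty list means the minimums are met."""
    minimum = MINIMUMS[environment]
    problems = []
    if ram_gb < minimum["ram_gb"]:
        problems.append(f"RAM: {ram_gb:.0f} GB is below the {minimum['ram_gb']} GB minimum")
    if install_disk_gb < minimum["install_disk_gb"]:
        problems.append(
            f"Install disk: {install_disk_gb:.0f} GB is below the "
            f"{minimum['install_disk_gb']} GB minimum"
        )
    if vcpu < minimum["vcpu"]:
        problems.append(f"vCPU: {vcpu} is below the minimum of {minimum['vcpu']}")
    if not is_64bit:
        problems.append("Architecture: host is not 64-bit")
    return problems


def probe_local_host(install_path="/opt/Anzo"):
    """Measure the local host (assumes Linux; os.sysconf names vary by platform)."""
    ram_bytes = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    # If Anzo is not installed yet, measure the nearest existing parent directory.
    path = install_path
    while not os.path.exists(path):
        path = os.path.dirname(path) or "/"
    return {
        "ram_gb": ram_bytes / 1024**3,
        "install_disk_gb": shutil.disk_usage(path).total / 1024**3,
        "vcpu": os.cpu_count() or 0,
        # Heuristic: machine names such as x86_64 and aarch64 end in "64".
        "is_64bit": platform.machine().endswith("64"),
    }
```

On a candidate server, `check_host(**probe_local_host(), environment="production")` returns the list of shortfalls. The operating system usually reports slightly less physical memory than the nominal size, so a host sold as 64 GB may be flagged; treat a small RAM shortfall as a prompt to check the hardware specification rather than as a failure.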
Software Requirements
This section lists the software requirements for Anzo servers and client workstations. It also includes important service account information and lists the supported single sign-on providers.
Avoid running any other software, including anti-virus software, on the same server as Anzo. Additional software may be run in a development environment, with the expectation of lowered Anzo performance, but Cambridge Semantics strongly recommends that you do not run additional software on the Anzo server in a production environment.
Component | Minimum | Recommended | Guidelines |
---|---|---|---|
Operating System (Anzo Server) | RHEL/CentOS 6 | RHEL/CentOS 7.9 | Cambridge Semantics recommends that you tune the ulimits for your Linux distribution to increase the limits for certain resources. See Configure User Resource Limits for more information. |
Microsoft Excel (Client Workstation) | Excel 2003 | Excel 2007+ | The Anzo for Office data integration mapping tool plugin requires Microsoft Excel. |
Web Browser (Client Workstation) | Firefox 62+, Chrome 74+, Safari 12+, Chromium-Based | Chrome 90+ | Use the latest versions of web browsers, especially if you are using a Chromium-based browser, as some older versions will not work with the Anzo user interface components. |
Enterprise-Level Anzo Service User Account | N/A | N/A | It is important to work with your IT organization to create an Anzo service user account at the enterprise level. The service user account needs to be associated with a central directory server (LDAP) so that it is available across Anzo environments and is managed in accordance with the permissions policies of your company. For more information, see Anzo Service Account Requirements below. |
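
Before tuning ulimits, it can help to record what the service account currently gets. The sketch below only reads the current limits using Python's standard `resource` module. It does not state target values, which are covered in Configure User Resource Limits. The choice of the open-file and process limits is an assumption for illustration, and `current_limits` and `describe` are hypothetical helper names.

```python
import resource

# Assumption: open files and processes are the limits of interest here.
LIMITS = {
    "open files (nofile)": resource.RLIMIT_NOFILE,
    "processes (nproc)": resource.RLIMIT_NPROC,
}


def describe(value):
    """Render a limit value, showing RLIM_INFINITY as 'unlimited'."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


def current_limits():
    """Return {label: (soft, hard)} for the process running this script."""
    return {label: resource.getrlimit(limit) for label, limit in LIMITS.items()}


def report():
    """Return one line per limit, formatted for a console or log."""
    return [
        f"{label}: soft={describe(soft)} hard={describe(hard)}"
        for label, (soft, hard) in current_limits().items()
    ]
```

Run it as the Anzo service user, for example with `print("\n".join(report()))`, because limits are applied per user session.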
Anzo Service Account Requirements
For consistent and appropriate access management across current and future Anzo environments, it is important for the IT organization to create an enterprise-level, LDAP-managed Anzo service user account. The service account should be used when installing and running Anzo and all of the components in the platform, such as AnzoGraph, Spark, Elasticsearch, and Anzo Unstructured clusters. The service account should not have root user privileges but does need the following access:
- The account must have read and write permissions for the Anzo component installation directories. The default Anzo server installation directory is /opt/Anzo.
- The account must have read and write access to the shared file store, such as the NFS mount location, where all Anzo components will read and write files during the data onboarding processes. For more information about the shared file system requirements, see Deploying the Shared File System.
Set the Anzo account User ID (UID) and Group ID (GID) to 1000. For integration between Anzo applications, it is important that the owner of files that are written to the shared file store is UID 1000, especially if you are considering Kubernetes-based deployments of Anzo applications.
- The account must have a home directory on the Anzo host server.
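
The account requirements listed above can be expressed as a short check. This is a sketch under stated assumptions: `account_problems` and `check_current_account` are hypothetical names, the script is meant to be run as the service account itself (because `os.access` tests the permissions of the calling user), and the directories to test are supplied by the caller.

```python
import os
import pwd


def account_problems(uid, gid, home_dir, required_dirs):
    """Return a list of requirement violations for the given account details."""
    problems = []
    if uid == 0:
        problems.append("account is root (UID 0); use a non-root service account")
    if uid != 1000 or gid != 1000:
        problems.append(f"UID/GID is {uid}/{gid}; the guidance is 1000/1000")
    if not os.path.isdir(home_dir):
        problems.append(f"home directory {home_dir} does not exist")
    for directory in required_dirs:
        # os.access checks read and write permission for the calling user.
        if not os.access(directory, os.R_OK | os.W_OK):
            problems.append(f"no read/write access to {directory}")
    return problems


def check_current_account(required_dirs):
    """Look up the user running this script and check it against the list."""
    entry = pwd.getpwuid(os.getuid())
    return account_problems(entry.pw_uid, entry.pw_gid, entry.pw_dir, required_dirs)
```

For example, `check_current_account(["/opt/Anzo", "/mnt/anzo-share"])` checks the default installation directory and a shared mount, where `/mnt/anzo-share` is a placeholder for your own mount location.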
Supported Single Sign-On Providers
Anzo supports the following single sign-on (SSO) protocols:
- Basic SSO
- Facebook OAuth
- JSON Web Tokens (JWT)
- Kerberos
- OpenID Connect (OIDC)
- Security Assertion Markup Language (SAML)
- Spring Security OAuth2
For information about configuring SSO access, see Connecting to an SSO Provider.
Firewall Requirements
The table below lists the TCP ports to open on the Anzo host.
Port | Description | Access Needed... |
---|---|---|
61616 | Anzo port used by the software development kit (SDK) and command line interface (CLI) | |
61617 | Anzo SSL port used by the SDK and CLI | |
8022 | Anzo SSH service port | |
8945 | Anzo Administration service port | |
8946 | Anzo Administration service SSL port | |
80 | Application HTTP port | |
443 | Application HTTPS port | |
3389 | LDAP port | |
9393 (optional) | Optional Java Management Extensions (JMX) port. Enable this port if you want to connect to Anzo from a JMX client. | |
9394 (optional) | Optional JMX SSL port. Enable this port if you want to make a secure connection to Anzo from a JMX client. | |
5700 | The Anzo protocol (gRPC) port for secure communication between AnzoGraph and Anzo. For more information about the communication between Anzo and AnzoGraph, see Firewall Requirements in AnzoGraph Server Requirements. | |
5600 | AnzoGraph's SSL system management port | |
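
Once the firewall rules are in place, a plain TCP connection attempt from a client machine shows which ports are reachable. The sketch below uses only the standard `socket` module. The port numbers come from the table above, while the names `ANZO_PORTS`, `port_is_open`, and `check_ports` are hypothetical.

```python
import socket

# Ports and descriptions taken from the table above.
ANZO_PORTS = {
    61616: "SDK and CLI",
    61617: "SDK and CLI (SSL)",
    8022: "SSH service",
    8945: "Administration service",
    8946: "Administration service (SSL)",
    80: "Application HTTP",
    443: "Application HTTPS",
    3389: "LDAP",
    9393: "JMX (optional)",
    9394: "JMX SSL (optional)",
    5700: "Anzo protocol (gRPC)",
    5600: "AnzoGraph SSL system management",
}


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def check_ports(host, ports=ANZO_PORTS):
    """Return {port: reachable} for every port in the mapping."""
    return {port: port_is_open(host, port) for port in ports}
```

A `False` result means only that the connection failed. The cause may be a firewall rule, or simply that the service behind the port is not running, so run the check while Anzo is up.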
File Storage Requirements
Anzo needs to have read and write access to a file storage system that can be shared between Anzo and all AnzoGraph, Anzo Unstructured, ETL Engine, and Elasticsearch servers. The supported storage systems are NFS, Hadoop Distributed File System (HDFS), File Transfer Protocol (FTP or FTPS) systems, Google Cloud Platform (GCP) storage, and Amazon Simple Storage Service (S3). In almost all cases, organizations create an NFS share and mount it on all of the servers in the Anzo environment. Mounted network file systems offer the best support and performance for reading and writing files.
For details and guidance on choosing the file system, see Deploying the Shared File System.
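
For a mounted file system such as NFS, a write-and-read-back round trip from each server is a simple way to confirm the access described above. This is a minimal sketch, and `can_read_write` is a hypothetical helper name. It applies only to storage that is mounted as a local path, not to HDFS, FTP, GCP, or S3 locations.

```python
import os
import tempfile


def can_read_write(directory):
    """Write a temporary file in `directory`, read it back, and remove it."""
    payload = b"anzo shared file system check"
    try:
        # The file is deleted automatically when the with-block exits.
        with tempfile.NamedTemporaryFile(dir=directory, prefix="rw-check-") as handle:
            handle.write(payload)
            handle.flush()
            handle.seek(0)
            return handle.read() == payload
    except OSError:
        # Missing directory, permission denied, read-only mount, and so on.
        return False
```

Run it as the Anzo service user on each server that mounts the share, for example `can_read_write("/mnt/anzo-share")`, where the path is a placeholder for your own mount point.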
Standalone Spark Server Requirements
Anzo includes an embedded Spark ETL engine to integrate data from various sources. Depending on your server configuration, the embedded engine might not be sufficient for ingesting very large amounts of data. To support ingestion of large data sets, you can install standalone ingestion servers. The table below lists the recommended configuration for standalone Spark servers.
Component | Recommendation |
---|---|
Available RAM | 100+ GB |
Disk Space | 200+ GB |
vCPU | 16+ |