Anzo Requirements
This page provides important guidelines to follow when choosing the hardware and software for servers that host Anzo.
- Hardware Requirements
- Software Requirements
- Firewall Requirements
- File Storage Requirements
- Standalone Ingestion Server Requirements
For information about Anzo Unstructured architecture requirements, see Anzo Unstructured Requirements and Recommendations.
Hardware Requirements
The production hardware requirements that Cambridge Semantics lists below are above average and are intended as a guideline. These specifications are similar to what Cambridge Semantics currently provisions as its standard hosted environment. Larger production data sets running interactive queries may require significantly more powerful hardware and more RAM. Keep in mind that you are installing both a high-performance graph database server and a fully featured application server. Provision production server hardware accordingly to avoid performance issues.
The table below provides a summary of the recommended hardware for production servers and the minimum requirements for test servers.
Component | Minimum | Recommended | Guidelines |
---|---|---|---|
Available RAM | 8 GB | 32 GB or more | Anzo needs enough RAM to map the database files into memory and run Anzo processes. If you produce large queries with joins across datasets, significant RAM is needed to hold intermediate results in memory. Note: The Java VisualVM program included with Anzo enables you to determine whether a server is memory-bound. |
Disk space and type | 10 GB (Anzo Server), 100 GB (Data) | 100 GB (Anzo Server), 1+ TB (Data) | See File Storage Requirements below. |
CPU | 4 cores, 2.2 GHz | 8 cores, 3+ GHz | Once you provision sufficient RAM and a high-performing I/O subsystem, performance depends on raw CPU capabilities. Keep in mind that you are provisioning for both a production database and a busy application server. Always use multi-core CPUs. A greater number of cores and high clock speed can make a dramatic difference in the performance of interactive queries. |
Architecture | 64-bit | 64-bit | Cambridge Semantics only supports the 64-bit versions of the server for production use. |
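As a rough illustration, the minimum RAM and CPU values in the table can be checked on a Linux host with a short shell script. This is a sketch and not a Cambridge Semantics tool: the function names report and kb_to_gb are hypothetical, and the script assumes that /proc/meminfo and either nproc or getconf are available.

```shell
#!/bin/sh
# Sketch: compare this host with the "Minimum" column of the hardware table.
# Assumption: Linux host; report and kb_to_gb are illustrative helper names.

MIN_RAM_GB=8   # Available RAM, minimum
MIN_CORES=4    # CPU cores, minimum

# report NAME ACTUAL MINIMUM: print OK or LOW for an integer comparison.
report() {
  if [ "$2" -ge "$3" ]; then
    echo "OK $1: $2 (minimum $3)"
  else
    echo "LOW $1: $2 (minimum $3)"
  fi
}

# kb_to_gb KB: convert kB to GB, rounded to the nearest whole number,
# because the kernel reports slightly less than the installed RAM.
kb_to_gb() {
  echo $(( ($1 + 524288) / 1048576 ))
}

if [ -r /proc/meminfo ]; then
  ram_kb=$(awk '/^MemTotal:/ { print $2 }' /proc/meminfo)
  report "RAM (GB)" "$(kb_to_gb "$ram_kb")" "$MIN_RAM_GB"
fi

# Fall back to getconf where nproc is not installed.
cores=$(nproc 2>/dev/null) || cores=$(getconf _NPROCESSORS_ONLN)
report "CPU cores" "$cores" "$MIN_CORES"

# A 64-bit host reports a value such as x86_64 or aarch64.
echo "Architecture: $(uname -m)"
```

To check a production server instead of a test server, set MIN_RAM_GB and MIN_CORES to the values in the Recommended column.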
Software Requirements
This section lists the software requirements for Anzo servers as well as user resource tuning recommendations and supported single sign-on providers.
Component | Minimum | Recommended | Guidelines |
---|---|---|---|
Operating System | RHEL/CentOS 6, Windows 2008 | RHEL/CentOS 7, Windows 10 | See Tuning User Resource Limitations (ulimits) below for information about setting ulimits on UNIX and Linux operating systems. |
Microsoft Excel | Excel 2003 | Excel 2007+ | |
Web Browser | Firefox 62+, Chrome 74+, Safari 12+ | Chrome | |
Tuning User Resource Limitations (ulimits)
Cambridge Semantics recommends that you tune the ulimits for your Linux distribution to increase the limits for certain resources. The list below describes the recommendations:
- Increase the open files limit to at least 4096.
- Increase the limit for the following resources to unlimited:
- cpu time
- file locks
- file size
- max memory size
- max user processes
- virtual memory
To view the current ulimits, run ulimit -a. For example, the default ulimits for a CentOS 7.5 operating system are shown below:
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 79607
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 4096
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
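The recommendations listed earlier can be compared with the current limits using a short script. The following is a minimal sketch, assuming a bash shell (the -u and -x options are not available in every shell). The helper names current, check_min, and check_unlimited are hypothetical.

```shell
# Sketch: compare the current soft limits with the recommended values.
# Assumption: bash; current, check_min, check_unlimited are illustrative names.

# current FLAG: print the current value for a ulimit flag, or "unknown"
# if this shell does not support the flag.
current() {
  v=$(ulimit "$1" 2>/dev/null) || v=unknown
  echo "${v:-unknown}"
}

# check_min NAME CURRENT MINIMUM: "unlimited" always satisfies a minimum.
check_min() {
  if [ "$2" = "unlimited" ] || [ "$2" -ge "$3" ]; then
    echo "OK $1: $2"
  else
    echo "LOW $1: $2 (recommended at least $3)"
  fi
}

# check_unlimited NAME CURRENT: only "unlimited" passes.
check_unlimited() {
  if [ "$2" = "unlimited" ]; then
    echo "OK $1: unlimited"
  else
    echo "LOW $1: $2 (recommended unlimited)"
  fi
}

check_min       "open files"         "$(current -n)" 4096
check_unlimited "cpu time"           "$(current -t)"
check_unlimited "file locks"         "$(current -x)"
check_unlimited "file size"          "$(current -f)"
check_unlimited "max memory size"    "$(current -m)"
check_unlimited "max user processes" "$(current -u)"
check_unlimited "virtual memory"     "$(current -v)"
```

Run the script as the user account that starts Anzo, because limits are applied per user session.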
To change the value for a resource, run the following command:
ulimit -option new_value
For example, the following command changes the open files value to 4096:
ulimit -n 4096
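Values set with ulimit apply only to the current shell session and the processes started from it. On Linux distributions that use the pam_limits module, limits are commonly made persistent with entries in /etc/security/limits.conf. The sketch below prints candidate entries for review rather than editing the file. It assumes that Anzo runs under an account named anzo, which is a hypothetical name, and that Anzo is started from a login session of that account. The function name limits_conf_lines is also hypothetical.

```shell
# Sketch: print limits.conf entries that match the recommendations.
# Assumptions: pam_limits is in use; "anzo" is a placeholder account name.

# limits_conf_lines USER: one entry per recommended resource.
# The "-" type sets both the soft and the hard limit.
limits_conf_lines() {
  user=$1
  printf '%s - nofile 4096\n' "$user"
  # cpu = cpu time, locks = file locks, fsize = file size,
  # rss = max memory size, nproc = max user processes, as = virtual memory
  for item in cpu locks fsize rss nproc as; do
    printf '%s - %s unlimited\n' "$user" "$item"
  done
}

limits_conf_lines anzo
```

Review the printed lines before adding them to /etc/security/limits.conf, and log in again as that user for the new limits to take effect.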
Supported Single Sign-On Providers
Anzo supports the following single sign-on (SSO) protocols:
- Basic SSO
- Facebook OAuth
- JSON Web Tokens (JWT)
- Kerberos
- OpenID Connect (OIDC)
- Security Assertion Markup Language (SAML)
- Spring Security OAuth2
Firewall Requirements
The table below lists the TCP ports to open on the Anzo host.
Port | Description | Access Needed... |
---|---|---|
61616 | Anzo port used by the software development kit (SDK) and command line interface (CLI) | |
61617 | Anzo SSL port used by the SDK and CLI | |
8022 | Anzo SSH service port | |
8945 | Anzo Administration service port | |
8946 | Anzo Administration service SSL port | |
80 | Application HTTP port | |
443 | Application HTTPS port | |
3389 | LDAP port | |
9393 (optional) | Optional Java Management Extensions (JMX) port. Enable this port if you want to connect to Anzo from a JMX client. | |
9394 (optional) | Optional JMX SSL port. Enable this port if you want to make a secure connection to Anzo from a JMX client. | |
5700 | The Anzo protocol (gRPC) port for secure communication between AnzoGraph and Anzo. For more information about the communication between Anzo and AnzoGraph, see Firewall Requirements in AnzoGraph Server Requirements. | |
5600 | AnzoGraph's SSL system management port | |
8100 | This port is used when Anzo loads many statements in parallel, such as when loading a large data model from Anzo to AnzoGraph | |
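On hosts that use firewalld, which is the default firewall service on RHEL/CentOS 7, the ports in the table could be opened with firewall-cmd. The sketch below only prints the commands so that they can be reviewed before anyone runs them with root privileges. The function name firewall_cmds and the variable names are hypothetical, and hosts that use iptables or another firewall need different commands.

```shell
# Sketch: print firewall-cmd commands for the TCP ports in the table.
# Assumption: firewalld is the active firewall; nothing is executed here.

REQUIRED_PORTS="61616 61617 8022 8945 8946 80 443 3389 5700 5600 8100"
OPTIONAL_JMX_PORTS="9393 9394"   # add only if JMX clients connect to Anzo

# firewall_cmds PORT...: print one --add-port command per port,
# followed by the reload that activates the permanent rules.
firewall_cmds() {
  for port in "$@"; do
    echo "firewall-cmd --permanent --add-port=${port}/tcp"
  done
  echo "firewall-cmd --reload"
}

# The variable is left unquoted on purpose so that it splits into ports.
firewall_cmds $REQUIRED_PORTS
```

To include the optional JMX ports, call firewall_cmds $REQUIRED_PORTS $OPTIONAL_JMX_PORTS instead.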
File Storage Requirements
Anzo supports reading from and writing to storage systems such as mounted NFS file systems, Hadoop Distributed File System (HDFS), File Transfer Protocol (FTP or FTPS) systems, Google Cloud Platform (GCP) storage, and Amazon Simple Storage Service (S3).
Set up a storage system that is accessible by both Anzo and AnzoGraph. Depending on your infrastructure and use case, you might need to have enough storage space available for storing source data files, RDF load files, ETL job files, and other supporting files.
For more information about connecting to file storage, see Connecting to a File Store.
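For a file store that is mounted on the host, such as NFS, the available space can be checked before loading data. The following is a minimal sketch that assumes a POSIX df command. The function names free_gb and has_space are hypothetical, and the 100 GB threshold is taken from the minimum data disk size in the hardware table.

```shell
# Sketch: check free space on the file system that holds a given path.
# Assumption: POSIX df; free_gb and has_space are illustrative names.

# free_gb PATH: free space in whole GB, rounded down.
# df -Pk prints one header line and one data line; column 4 is available kB.
free_gb() {
  df -Pk "$1" | awk 'NR == 2 { print int($4 / 1024 / 1024) }'
}

# has_space PATH MIN_GB: succeed if at least MIN_GB is free under PATH.
has_space() {
  [ "$(free_gb "$1")" -ge "$2" ]
}

# The current directory stands in for the mount point of the shared store.
if has_space . 100; then
  echo "At least 100 GB free"
else
  echo "Less than 100 GB free: $(free_gb .) GB available"
fi
```

Replace the current directory with the mount point that both Anzo and AnzoGraph use.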
Standalone Ingestion Server Requirements
Anzo includes an embedded Spark ETL engine to integrate data from various sources. Depending on your server configuration, the embedded engine might not be sufficient for ingesting very large amounts of data. To support ingestion of large data sets, you can install standalone ingestion servers. The table below lists the recommended configuration for standalone data ingestion servers.
Component | Recommendation |
---|---|
Available RAM | 100+ GB |
Disk Space | 200+ GB |
CPU | 16+ cores |