Deploying the Shared File System
Anzo and all of its platform applications must be able to access files on a shared file system. Anzo, AnzoGraph, Anzo Unstructured, Spark, and Elasticsearch servers need to share storage so that they can read and/or write the source data ingestion files, RDF load files, ETL job files, Elasticsearch indexes, and other supporting files.
Anzo supports file connections to Network File Systems (NFS), Hadoop Distributed File Systems (HDFS), File Transfer Protocol (FTP or FTPS) systems, Google Cloud Platform (GCP) storage, and Amazon Simple Cloud Storage Service (S3). However, some object stores, such as Amazon S3, are well-suited to long-term storage but do not offer POSIX support, and other storage systems, such as FTP, often have poor file transfer performance.
For the best read and write performance, Cambridge Semantics strongly recommends that you deploy an NFS and then mount it to each of the AnzoGraph, Anzo Unstructured, Elasticsearch, and Spark servers that make up the Anzo platform.
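For example, a minimal NFS version 4 client mount on one of the servers might look like the following sketch. The NFS server hostname and the /opt/anzoshare mount point are placeholder values, and the /global/nfs/data export path matches the example later in this section; substitute the values for your environment.
# Create the mount point and mount the shared NFS export (NFS version 4)
sudo mkdir -p /opt/anzoshare
sudo mount -t nfs4 nfs-server.example.com:/global/nfs/data /opt/anzoshare
# Optionally, add an /etc/fstab entry so the share is mounted at boot
nfs-server.example.com:/global/nfs/data  /opt/anzoshare  nfs4  rw,hard,noatime  0  0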
If you plan to set up Kubernetes (K8s) integration for dynamic deployments of Anzo components, an NFS is required. Other file and object stores are not supported for K8s deployments at this time.
NFS Guidelines
This section describes the key recommendations to follow when creating an NFS for the Anzo platform:
- Use NFS Version 4 or later.
- Provision SSD disk types for the best performance.
- When determining the size of the NFS, consider your workload and use cases. There needs to be enough storage space available for any source data files, ETL job files, generated RDF data files, Elasticsearch indexes, and any other files that you plan to store on the NFS. In addition, consider that cloud-based NFS servers often have better performance if you over-provision resources. When using a cloud-based VM for your NFS, it can be beneficial to provision more CPU, disk space, and RAM than required to store your artifacts.
- For integration between Anzo applications, the Anzo service account must have read and write access to the NFS. In addition, it is important to set the Anzo account User ID (UID) and Group ID (GID) to 1000 so that the owner of files that are written to the shared file store is UID 1000. For more information about the user account requirements, see Anzo Service Account Requirements.
If you are unable to map the Anzo service account UID and GID to 1000, you can modify anonuid and anongid in the NFS server export table to map all requests to 1000. To do so, add the following line to /etc/exports on the NFS server:
<mount_point> *(insecure,rw,sync,no_root_squash) x.x.x.x(rw,all_squash,anonuid=1000,anongid=1000)
For example:
/global/nfs/data *(insecure,rw,sync,no_root_squash) x.x.x.x(rw,all_squash,anonuid=1000,anongid=1000)
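After saving the change to /etc/exports, you can apply and verify it with the standard NFS utilities, as sketched below. The anzo user name is only an example for the Anzo service account; substitute the account used in your environment.
# Re-export everything listed in /etc/exports without restarting the NFS service
sudo exportfs -ra
# Confirm that the export and its all_squash/anonuid options are active
sudo exportfs -v
# On an Anzo server, confirm that the service account (anzo in this example) uses UID and GID 1000
id anzo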