Platform Shared File Storage Requirements

The Graph Studio Server and all other platform host servers need read and write access to a shared file storage system. Though users can connect to and import files from various types of long-term storage systems, such as Hadoop Distributed File System (HDFS), File Transfer Protocol (FTP or FTPS) systems, Google Cloud Platform (GCP) storage, Azure Cloud Storage, and Amazon Simple Storage Service (S3), those systems may not offer POSIX support or fast file transfer performance.

For the best read and write performance and to ensure interoperability between Graph Studio components, deploy a Network File System (NFS) and mount it at the same path on every host server in the platform. Compared to the other storage options, a mounted network file system offers the best support and performance for reading and writing files.
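As a rough illustration of how the "same location on every host" requirement could be checked, the following Python sketch parses mount information in the Linux /proc/mounts format and reports whether a given path is an NFS mount with read and write access. It is not part of Graph Studio. The mount point /opt/shared, the server name nfs01, and all of the function names are hypothetical and were chosen only for this example.

```python
from typing import List, NamedTuple, Optional, Tuple


class MountEntry(NamedTuple):
    device: str
    mount_point: str
    fs_type: str
    options: Tuple[str, ...]


def parse_mounts(mounts_text: str) -> List[MountEntry]:
    """Parse text in the /proc/mounts format into MountEntry records."""
    entries = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        device, mount_point, fs_type, options = fields[:4]
        entries.append(
            MountEntry(device, mount_point, fs_type, tuple(options.split(",")))
        )
    return entries


def find_nfs_mount(mounts_text: str, mount_point: str) -> Optional[MountEntry]:
    """Return the NFS entry mounted exactly at mount_point, or None."""
    wanted = mount_point.rstrip("/") or "/"
    for entry in parse_mounts(mounts_text):
        # Linux reports NFS mounts with the type "nfs" or "nfs4".
        if entry.mount_point == wanted and entry.fs_type in ("nfs", "nfs4"):
            return entry
    return None


def is_read_write(entry: MountEntry) -> bool:
    """True if the mount options include "rw"."""
    return "rw" in entry.options


# Hypothetical sample data; on a real host, read the text from /proc/mounts.
SAMPLE_MOUNTS = """\
nfs01:/export/shared /opt/shared nfs4 rw,relatime,vers=4.1 0 0
/dev/sda1 / ext4 rw,relatime 0 0
"""

entry = find_nfs_mount(SAMPLE_MOUNTS, "/opt/shared/")
print(entry.device if entry else "not an NFS mount")
```

On a Linux host, the text could be read with open("/proc/mounts").read(). Running the same check on each host and comparing the reported device and mount point is one way to confirm that every server mounts the same export at the same path.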

If you plan to set up Kubernetes (K8s) integration for dynamic deployments of components, an NFS is required. Other file and object stores are not supported for K8s deployments.

Though users can connect to source files that are stored on systems such as Hadoop Distributed File System (HDFS), File Transfer Protocol (FTP or FTPS) systems, Google Cloud Platform (GCP) storage, and Amazon Simple Storage Service (S3), Graph Studio uploads a copy of those files to a graphmart staging area on the configured NFS so that all of the platform components can use them. The NFS Guidelines section below describes the NFS requirements.

NFS Guidelines

This section describes the key recommendations to follow when creating an NFS for the Graph Studio platform:

  • Use NFS Version 4 or later.
  • Provision SSD disk types for the best performance.
  • To ensure that components integrate correctly and that files have consistent ownership, create the NFS with the same service user account that the other components use. For more information, see Platform Service User Account Requirements.
  • When determining the size of the NFS, consider your workload and use cases. There needs to be enough storage space available for any source data files, exported RDF files, Elasticsearch indexes, and any other files that you plan to store on the NFS.

    Cloud-based NFS servers often have better performance if you over-provision resources. When using a cloud-based VM for the NFS, it may be beneficial to provision more CPU, disk space, and RAM than required.
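The version and sizing guidelines above can be sketched as a pair of small checks. The Python example below is only an illustration under stated assumptions: the function names, the size categories, and the 25 percent headroom factor are hypothetical and are not Graph Studio recommendations. It reads the NFS major version from Linux mount options (vers= or nfsvers=) and compares the free space on a path against a planned storage estimate.

```python
import math
import shutil
from typing import Dict, Optional, Sequence


def nfs_major_version(fs_type: str, options: Sequence[str]) -> Optional[int]:
    """Return the NFS major version from mount options, or None if unknown."""
    for opt in options:
        for key in ("vers=", "nfsvers="):
            if opt.startswith(key):
                # "vers=4.1" -> "4.1" -> 4
                return int(opt[len(key):].split(".")[0])
    # Linux reports NFSv4 mounts with the file system type "nfs4".
    return 4 if fs_type == "nfs4" else None


def required_bytes(planned_sizes: Dict[str, int], headroom: float = 0.25) -> int:
    """Sum the planned sizes and add headroom (an assumed factor, 25% here)."""
    return math.ceil(sum(planned_sizes.values()) * (1 + headroom))


def has_enough_space(path: str, planned_sizes: Dict[str, int],
                     headroom: float = 0.25) -> bool:
    """Compare the free space on the file system holding path to the estimate."""
    return shutil.disk_usage(path).free >= required_bytes(planned_sizes, headroom)


# Hypothetical planning figures, in bytes.
GIB = 1024 ** 3
planned = {
    "source_data_files": 100 * GIB,
    "exported_rdf_files": 200 * GIB,
    "elasticsearch_indexes": 100 * GIB,
}

print(nfs_major_version("nfs4", ["rw", "vers=4.1"]) >= 4)
print(required_bytes(planned) // GIB, "GiB needed including headroom")
```

To check an actual deployment, has_enough_space could be called with the NFS mount point and your own estimates. The headroom value is a placeholder to adjust for your workload, particularly on cloud-based servers where over-provisioning may improve performance.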