FAQ

This topic provides answers to frequently asked questions and includes references to more detailed information. The questions are categorized by subject:

Container Images FAQ

This section includes answers to questions related to Graph Lakehouse container images.

Why are there three Graph Lakehouse container images?

To offer versatility for different types of environments and deployment preferences, Altair provides three Graph Lakehouse container images. The list below describes each image and its purpose:

Graph Lakehouse FAQ

This section includes answers to questions related to Graph Lakehouse usage.

How do I determine what size cluster to deploy?

For guidance on determining the number of instances to include in your cluster and choosing the most suitable instance type, see the best practice Sizing Guidelines for In-Memory Storage.

Are there best practices around performance benchmarking with Graph Lakehouse?

For best practices and in-depth information about benchmarking with Graph Lakehouse, see the AnzoGraph DB Benchmarking Guide.

How do I deploy Graph Lakehouse SPARQL endpoints?

Graph Lakehouse supports the standard W3C SPARQL 1.1 Protocol (SPARQL endpoint) and SPARQL 1.1 Graph Store HTTP Protocol (RDF Graph Store endpoint). Both endpoints are enabled by default, and both can be accessed through the front end (user interface) or the back end (database).

  • If you have the front end client deployed, the endpoints are enabled and can be accessed by applications that have access to the front end server. User authentication is required to access endpoints through the front end.
  • Back end endpoints are also enabled by default but are controlled by the enable_sparql_protocol configuration setting. If SPARQL protocol is disabled for your deployment, the database endpoints will not be accessible. See How do I enable SPARQL HTTP protocol? for instructions on enabling SPARQL protocol. The back end endpoints do not support user authentication at this time.

For more information about the endpoints, see Access the SPARQL and RDF Endpoints.
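
For example, a query can be posted to the back end SPARQL endpoint with curl. This is a minimal sketch that assumes the default port (7070) and a /sparql endpoint path; replace <host> with your Graph Lakehouse host name:

curl -X POST http://<host>:7070/sparql \
  -H "Content-Type: application/sparql-query" \
  -d "SELECT * WHERE { ?s ?p ?o } LIMIT 10"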

How do I use the SPARQL and RDF Graph Store endpoints?

Graph Lakehouse endpoints conform to the W3C SPARQL 1.1 standards and can be accessed like other standard SPARQL endpoints. For usage information and details about the Graph Lakehouse endpoints, see Access the SPARQL and RDF Endpoints.

For W3C specifications on SPARQL endpoints, see SPARQL 1.1 Protocol. For RDF graph store specifications, see SPARQL 1.1 Graph Store HTTP Protocol.
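
As an illustration of the Graph Store protocol, the following sketch adds a Turtle file to a named graph with an HTTP POST. The /rdf-graph-store path, graph URI, and file name are assumptions for this example; see Access the SPARQL and RDF Endpoints for the actual endpoint URL:

curl -X POST "http://<host>:7070/rdf-graph-store?graph=http://example.org/sales" \
  -H "Content-Type: text/turtle" \
  --data-binary @sales.ttl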

How do I enable SPARQL HTTP protocol?

If SPARQL HTTP protocol is disabled for your deployment and you want to enable it so that you can use the Graph Lakehouse CLI or post queries to the SPARQL HTTP port (7070 by default), follow the instructions below.

  1. Stop Graph Lakehouse. You can stop the database from the Admin user interface, or see Start and Stop Graph Lakehouse for information about alternate methods.
  2. Open the Graph Lakehouse settings file /install_path/config/settings.conf in an editor.

    If Graph Lakehouse is deployed with Docker, see How do I access the Graph Lakehouse file system with Docker? For example:

    vi /opt/anzograph/config/settings.conf
  3. Find the enable_sparql_protocol setting. If the setting is listed, change the value from "false" to "true". If the setting is not listed, add a new line to the end of the file and enter the following value:
    enable_sparql_protocol=true
  4. Save and close the file, and then restart Graph Lakehouse.

Once Graph Lakehouse starts, SPARQL HTTP protocol is enabled on the sparql_protocol_port (7070 by default) and you can use the CLI (AZGI) or send requests through the back end (database) SPARQL endpoints. For more information about Graph Lakehouse endpoints, see Access the SPARQL and RDF Endpoints.
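
To confirm that the protocol is enabled, you can send a trivial query to the SPARQL port. This sketch assumes you are on the Graph Lakehouse host and are using the default port:

curl http://localhost:7070/sparql --data-urlencode "query=SELECT (1 AS ?ok) WHERE {}"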

How do I reset the admin password?

To reset the Graph Lakehouse admin password, SSH to the Graph Lakehouse host server (the leader node if this is a cluster) and run the following command. This command runs the azgpasswd utility in the anzograph/bin directory and updates the password (passwd) file in the anzograph/config directory:

<install_path>/bin/azgpasswd <install_path>/config/passwd -u admin -p <password>

For example, the following command resets the password to "Passw0rd1":

/opt/csi/anzograph/bin/azgpasswd /opt/csi/anzograph/config/passwd -u admin -p Passw0rd1

Do not use special characters such as $ or * in passwords; those characters have special meaning in bash.

How do I access the Graph Lakehouse file system with Docker?

Run the following Docker command to access the Graph Lakehouse file system, the /opt/anzograph directory:

sudo docker exec -it <container_name> /bin/bash

Where <container_name> is the name of the Graph Lakehouse container whose file system you want to access. For example:

sudo docker exec -it anzograph /bin/bash

How do I copy load files from the host to the Graph Lakehouse file system in Docker?

  1. Run the following Docker command to access the Graph Lakehouse file system, the /opt/anzograph directory:
    sudo docker exec -it <container_name> /bin/bash

    Where <container_name> is the name of the Graph Lakehouse container whose file system you want to access. For example:

    sudo docker exec -it anzograph /bin/bash
  2. Determine where on the file system you would like to place the load files and create a new directory if necessary. If you plan to load a directory of files, remember to include the file type in the directory name. See RDF Load File Requirements for more information. For example:
    mkdir /opt/anzograph/load-files.ttl
  3. Type exit to exit the container.
  4. Run the following Docker command to copy files from the host server to a location in the Graph Lakehouse container.
    sudo docker cp /<path>/<filename> <container_name>:/<path>/<directory>

    For example:

    sudo docker cp /home/user/sales.ttl anzograph:/opt/anzograph/load-files.ttl/

    Or, to copy a directory to the container (docker cp copies directories recursively by default, so no extra flag is needed):

    sudo docker cp /<path>/<directory> <container_name>:/<path>

    For example:

    sudo docker cp /home/user/load-files.ttl anzograph:/opt/anzograph/

How do I customize a Helm-managed Graph Lakehouse deployment?

To customize a Helm-managed deployment, modify the Graph Lakehouse Helm chart, values.yaml, and then deploy Graph Lakehouse using that chart.

  • The values.yaml file is in the HELM_HOME directory. To view the location of HELM_HOME, you can run helm home.
  • To download the latest version of the Helm chart from csi-helm/anzograph, you can run helm repo update.
  • You can edit values.yaml directly or make a copy and edit the copy. When you run the helm install command to deploy Graph Lakehouse, specify the values file to use for that deployment (for example, with the -f option), as shown in the example below.
  • For details about the Helm chart options, view the readme, Readme.md, in the HELM_HOME directory.

For instructions on deploying Graph Lakehouse with Helm, see Deploy Graph Lakehouse with Helm.
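
For example, the following sketch deploys Graph Lakehouse with an edited copy of the values file. The release name and file name are hypothetical, and the command uses the Helm 2-style syntax that matches the helm home command mentioned above:

helm install csi-helm/anzograph --name anzograph -f my-values.yaml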

How do I enable database persistence?

For most installations, Graph Lakehouse is configured by default to save in-memory data to disk after every transaction. Each time Graph Lakehouse is restarted, the persisted data is automatically loaded back into memory. To check whether the save to disk option is enabled, open the settings file, install_path/config/settings.conf, and find the enable_persistence option. If enable_persistence=true, data persistence is enabled; if enable_persistence=false, it is disabled. For instructions on changing settings, see Change System Settings.

Important Considerations

  • In general, each Graph Lakehouse server needs access to about twice as much disk space as RAM on the server. By default, Graph Lakehouse saves data to the install_path/persistence directory on the local file system. You can also configure Graph Lakehouse to save data to a different location by changing the value of the persistence_directory setting in settings.conf.
  • When persistence is enabled, transactional workloads that perform many concurrent write operations may experience a performance degradation due to the overhead of writing the data from each transaction to disk.
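
For reference, the relevant settings.conf entries might look like the following; the directory path is hypothetical:

enable_persistence=true
persistence_directory=/data/anzograph/persistence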

What RDF load file types does Graph Lakehouse support?

  • Turtle (.ttl file type): Terse RDF Triple Language that writes an RDF graph in compact form.
  • N-Triple (.n3 and .nt file types): A subset of Turtle known as simple triples.
  • N-Quad (.nq and .quads file types): N-Triples with a blank node or graph designation.
  • TriG (.trig file type): An extension of Turtle that supports representing a complete RDF data set.
  • JSON-LD (.jsonld file type): A method of encoding linked data using JSON. JSON-LD files are supported for loading via the IO services. JSON-LD is not supported by SPARQL LOAD queries.

For more information, see Load RDF Data from Files.
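
For illustration, a minimal Turtle (.ttl) load file containing a single triple might look like this; the prefix and resource names are hypothetical:

@prefix ex: <http://example.org/> .
ex:alice ex:knows ex:bob .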

How do I set up my load files to get the best load performance?

When you have multiple files, Graph Lakehouse loads the files in parallel, using all available cores on all servers in the cluster. While you can load files stored on the leader node's local file system, for optimal performance, it is important to use a shared file system to ensure that all servers in the cluster have access to the files. In a Docker or Kubernetes container environment, the storage system should also be shared with the container file system.

For more information and details about load file requirements, see RDF Load File Requirements.
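
As a sketch, once the files are on shared storage, a directory of Turtle files might be loaded with a single SPARQL LOAD query. The dir: URI scheme, path, and graph name here are examples only; see Load RDF Data from Files for the exact syntax:

LOAD <dir:/opt/shared/load-files.ttl> INTO GRAPH <http://example.org/sales>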

How do I get a list of all Graph Lakehouse functions?

You can run the following query to return a list of supported SPARQL functions. The query returns all of the function names as well as the supported argument and return types for each function:

SELECT ?extension_name ?extension_arguments ?extension_results
WHERE { TABLE 'stc_functions' }
ORDER BY ?extension_name

SPARQL FAQ

This section includes answers to questions related to the SPARQL query language.

What extensions to the SPARQL standard does Graph Lakehouse provide?

Graph Lakehouse implements the standard SPARQL forms and functions described in the W3C SPARQL 1.1 Query Language specification. It also provides several SQL-like and Microsoft Excel-like functions, as well as support for more advanced operations such as window aggregates, advanced grouping sets, and graph algorithms. Beyond the built-in standard and advanced functions, Graph Lakehouse includes extension libraries that provide data science, geospatial, Apache Arrow, and various utility functions. For more information, see SPARQL Query Language Reference.

Where can I find more information about SPARQL?

For basic information about SPARQL, the semantic web, or RDF, see the Altair Semantic University. In addition, the book Semantic Web for the Working Ontologist focuses on SPARQL and RDF usage, with examples drawn from the Internet.

To view the W3C formal specification and definitive reference, see the SPARQL 1.1 Query Language specification.

For additional information about best practices and tips, see SPARQL Best Practices and SPARQL Tips and Tricks.