Graph Lakehouse Settings Reference
This topic provides reference information for each of the Graph Lakehouse system configuration settings. The configuration file, <install_path>/config/settings.conf, categorizes the settings as either Basic or Advanced. Only system administrators or users with advanced knowledge of Graph Lakehouse or databases in general should configure the Advanced settings. For instructions on changing settings, see Changing Graph Lakehouse Settings.
Basic Settings
This section describes the settings in the Basic section of settings.conf.
- enable_persistence
- enable_sparql_protocol
- enable_ssl_protocol
- internal_directory
- max_memory
- output_format
- persistence_directory
- sparql_protocol_port
- sparql_spec_default_graph
- spill_directory
- ssl_protocol_port
- startup_info
- stop_timeout
- truncate_clob
- use_custom_ssl_files
- user_queues
- xray_sth_portion
- xray_sth_spool_duration
- xray_sth_spool_maxgb
Setting | Default Value (type) | Description |
---|---|---|
enable_persistence | false (boolean) | Controls whether Graph Lakehouse saves a copy of the data in memory to disk. For more information, see Enabling Persistence (Preview). |
enable_sparql_protocol | false (boolean) | Controls whether to enable the HTTP SPARQL endpoint. The sparql_protocol_port setting controls the port to use to access the endpoint. Enabling the SPARQL HTTP protocol opens the standard SPARQL-compliant HTTP endpoint. Unlike the Graph Studio protocol endpoint, the SPARQL HTTP endpoint is not secured. |
enable_ssl_protocol | false (boolean) | Controls whether to enable the secure HTTPS SPARQL endpoint. The ssl_protocol_port setting controls the port to use. Enabling the SPARQL HTTPS protocol opens the standard SPARQL-compliant HTTPS endpoint. Unlike the Graph Studio protocol endpoint, the SPARQL HTTPS endpoint is encrypted but not authenticated. |
internal_directory | Not set (char) | The directory where Graph Lakehouse should save internal database-related files such as generated code, logs, and query plans. When not set, the default is <install_path>/internal. For more information, see Relocating AnzoGraph Directories. |
max_memory | System-based (int) | Specifies the amount of memory (in MB) to make available for Graph Lakehouse. The default is system-based; at startup, Graph Lakehouse determines the amount of RAM that is available and sets max_memory. In test environments where Graph Lakehouse may be co-located with other programs, you can set the max_memory value to put a limit on the amount of memory Graph Lakehouse can use. However, Cambridge Semantics recommends that you do not set max_memory unless instructed to do so by Support. |
output_format | xml (char) | Specifies the default output format for Graph Lakehouse responses. Valid values are xml, json, or csv. |
persistence_directory | Not set (char) | The directory where Graph Lakehouse should save data when enable_persistence is true and data is persisted to disk. When not set, the default is <install_path>/persistence. For more information, see Relocating AnzoGraph Directories. |
sparql_protocol_port | 7070 (int) | Specifies the port to use to access the SPARQL HTTP endpoint when enable_sparql_protocol is true. |
sparql_spec_default_graph | false (boolean) | Controls the default scope of SPARQL queries when FROM clauses are excluded from a query. When false, queries without FROM clauses target the default graph (DEFAULTSET) only. Triples in named graphs will not be included in the scope of the query. When true, Graph Lakehouse conforms to the SPARQL specification and includes the default graph and all named graphs in the scope of a query that omits the FROM clause. For more information, see Changing the Default FROM Clause Behavior. |
spill_directory | Not set (char) | The directory where Graph Lakehouse should save temporary query files that spill to disk. When not set, the default is <install_path>/spill. For more information, see Relocating AnzoGraph Directories. Graph Lakehouse uses O_DIRECT to read the spill files into the database. If you relocate the spill directory, make sure to place it on an ext4 file system that supports O_DIRECT. |
ssl_protocol_port | 8256 (int) | Specifies the port to use to access the SPARQL HTTPS endpoint when enable_ssl_protocol is true. |
startup_info | 1 (int) | Specifies how verbose the database startup message is: 0 (quiet), 1 (ready), 2 (ports), or 3 (more). |
stop_timeout | 30 (int) | When the database stop command is issued, this setting specifies the number of seconds to wait for queries to finish before stopping the database. |
truncate_clob | false (boolean) | Controls whether to automatically truncate large strings to the maximum string size (2 MB). |
use_custom_ssl_files | false (boolean) | Indicates whether you are replacing Graph Lakehouse's self-signed certificates with your own custom certificates. To configure Graph Lakehouse to use your certificates, follow the instructions in Replace the Default Self-Signed Certificates with Trusted Certificates. Anzo also needs to trust the new certificates. Make sure you have Trust All TLS Certificates enabled on the Graph Lakehouse connection or make sure Anzo's trust store has either the certificate for the CA that signed the certificate or the certificate itself. |
user_queues | 40 (int) | Sets the limit on the number of queries that can run concurrently. |
xray_sth_portion | 0.001 (float) | In 3.1 releases, this setting configures the fraction of total memory to use for storing historical system table information in memory before spilling to disk. The default value of 0.001 equals 0.1% of memory. |
xray_sth_spool_duration | 7days (char) | In 3.1 releases, this setting controls the length of time to accumulate historical system table information on disk for xrays. |
xray_sth_spool_maxgb | 20 (int) | In 3.1 releases, this setting controls the maximum size (in GB) per node of historical system table information to keep on disk for xrays. When the limit is reached, Graph Lakehouse deletes the oldest N records, where N depends on the server workload but is typically about 5 to 6 minutes worth of system table data. |
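
The following Python sketch shows how a client might address the plain HTTP SPARQL endpoint once enable_sparql_protocol is true. It only builds the request and does not send it. The port 7070 is the sparql_protocol_port default from the table above; the host name, the /sparql path, and the idea of selecting a result format through the Accept header are assumptions made for illustration and are not confirmed by this reference. The media types are the standard SPARQL 1.1 ones for XML, JSON, and CSV results.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Standard SPARQL 1.1 result media types, keyed by the output_format values
# listed in the table (xml, json, csv). Whether the server honors the Accept
# header is an assumption, not something this reference states.
ACCEPT_TYPES = {
    "xml": "application/sparql-results+xml",
    "json": "application/sparql-results+json",
    "csv": "text/csv",
}


def build_sparql_request(query, host="localhost", port=7070,
                         path="/sparql", output_format="json"):
    """Build (but do not send) a SPARQL Protocol GET request.

    host and path are hypothetical placeholders; port defaults to the
    documented sparql_protocol_port value.
    """
    url = f"http://{host}:{port}{path}?{urlencode({'query': query})}"
    return Request(url, headers={"Accept": ACCEPT_TYPES[output_format]})


req = build_sparql_request("SELECT * WHERE { ?s ?p ?o } LIMIT 5")
print(req.full_url)
print(req.get_header("Accept"))
```

To actually run the query, the request object could be passed to urllib.request.urlopen. Because the HTTP endpoint is not secured, a sketch like this is best limited to trusted networks; the HTTPS endpoint on ssl_protocol_port would use an https:// URL instead.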
Advanced Settings
This section describes the settings in the Advanced section of settings.conf.
- anzo_protocol_port
- auto_restart_directory
- auto_restart_max_attempts
- auto_restart_time
- aws_log_level
- aws_search_regions
- azgmgrd_client_auth
- bits_per_pred_index
- bits_per_uri_index
- blank_node_name
- call_home_for_updates
- comm_port_base
- compile_concurrent
- compile_max_memory
- compile_max_seconds
- compile_optimized
- copy_file_size
- enable_owlstats
- enable_refresh_stats_on_update
- enable_root_user
- enable_unbound_variables
- float_decimals
- float_format
- grpc_token_expiry
- ignore_deniedlist_queries
- jvm_max_memory
- jvm_options
- log_directory
- policy_file_enabled
Setting | Default Value (type) | Description |
---|---|---|
anzo_protocol_port | 5700 (int) | The Graph Studio protocol (gRPC) port for secure communication between Graph Lakehouse and Graph Studio. |
auto_restart_directory | Not set (char) | Specifies the base location of the auto_restart directory, which contains the denied_list, warned_list, and unanalyzed_list directories. When not set, the default is <install_path>/internal. For more information about the auto-restart feature, see Managing Automatic Restarts. |
auto_restart_max_attempts | 5 (int) | Specifies the number of times the system manager should attempt to start the database after a crash. The default value is 5, which means the system manager will attempt to restart the database a maximum of 5 times. Changing auto_restart_max_attempts to 0 disables the auto-restart feature. For more information about the auto-restart feature, see Managing Automatic Restarts. |
auto_restart_time | 600 (int) | Specifies the number of seconds to spend attempting to restart the database. If all attempts fail and this time limit is reached, the system manager stops trying to restart the database. The default value is 600, which means that the system manager will attempt to restart the database for a maximum of 600 seconds (10 minutes). For more information about the auto-restart feature, see Managing Automatic Restarts. |
aws_log_level | 2 (int) | Graph Lakehouse uses an AWS C++ SDK for loading data from S3. This setting controls the logging level for the AWS SDK. The default value is 2, which is error level logging. Valid values are 0 (off), 1 (fatal), 2 (error), 3 (warn), 4 (info), 5 (debug), and 6 (trace). |
aws_search_regions | Not set (char) | Lists the regions to search for AWS S3 buckets that are listed as file locations for LOAD queries. |
azgmgrd_client_auth | false (boolean) | Controls whether the system management daemon (azgmgrd) and system manager (azgctl) use authentication in addition to encryption when connecting to other system managers over the system management gRPC port (5600). The default value is false, which means the system management connections are encrypted but not authenticated. |
bits_per_pred_index | 16 (int) | Specifies the maximum number of unique graph and predicate URIs that can be stored in Graph Lakehouse. The maximum number is two to the power of this value. The default value (16) for bits_per_pred_index is set to the maximum value and should not be changed. 2^16 = 64k unique predicate and graph URIs. |
bits_per_uri_index | 32 (int) | Specifies the maximum number of unique subject URIs that can be stored in Graph Lakehouse. The maximum number is two to the power of this value. The default value (32) for bits_per_uri_index is set to the maximum value and should not be changed. 2^32 = 4,294,967,296 (4+ billion) unique subject URIs. |
blank_node_name | genid (char) | Specifies the default name basis for blank nodes. By default, Graph Lakehouse generates a number ID for the node. For example, inserting _:a results in a URI such as bnode:a__63. |
call_home_for_updates | false (boolean) | Controls whether Graph Lakehouse checks for updates over the internet. |
comm_port_base | 9100 (int) | Specifies the port to use for internal cluster communication. |
compile_concurrent | 8 (int) | Specifies the maximum number of generated code compilations to perform concurrently. |
compile_max_memory | 500 (int) | Sets the limit on the amount of memory (in MB) that Graph Lakehouse can allocate for compiling generated code before switching from optimized compile to non-optimized compile. |
compile_max_seconds | 30 (int) | Sets the limit on the number of seconds to spend compiling generated code before switching from optimized compile to non-optimized compile. |
compile_optimized | background (char) | Specifies the type of optimized compile to perform. |
copy_file_size | 5 (int) | Controls the size (in MB) of the Turtle files that are generated when graphmart contents are exported to files. |
enable_owlstats | true (boolean) | In order to generate query execution plans, Graph Lakehouse needs to gather statistics about the data, such as the number of triples per graph and number of distinct subjects and predicates. This setting controls whether advanced statistics gathering, called OWL stats, is enabled. OWL stats use the metadata from data models to generate statistics. Cambridge Semantics recommends that you leave enable_owlstats enabled unless otherwise instructed. |
enable_refresh_stats_on_update | true (boolean) | Controls whether the statistics in Graph Lakehouse are flagged as outdated when a graph is updated. |
enable_root_user | false (boolean) | Controls whether to allow a user running with root privileges to start Graph Lakehouse. |
enable_unbound_variables | false (boolean) | Controls whether Graph Lakehouse returns an empty result or an error if a query references a missing graph or includes unbound variables. This value is set to false by default, which means Graph Lakehouse returns an error. For more information, see Ignoring Missing Graphs. |
float_decimals | 6 (int) | This setting does not apply to results that are returned from Graph Lakehouse to Graph Studio over gRPC protocol. Graph Studio converts floating point values to Java native float objects with 6 – 7 total digits of precision. This setting would only affect results that are returned directly from Graph Lakehouse to another application over HTTP/S protocol. Graph Lakehouse formats floating point types using the printf format string %.precision format, where precision is the value of the float_decimals, and format is the value of float_format. The interpretation of |
float_format | g (char) | This setting does not apply to results that are returned from Graph Lakehouse to Graph Studio over gRPC protocol. Graph Studio converts floating point values to Java native float objects with 6 – 7 total digits of precision. This setting would only affect results that are returned directly from Graph Lakehouse to another application over HTTP/S protocol. Graph Lakehouse formats floating point types using the printf format string %.precision format, where format is the value of the float_format, and precision is the value of float_decimals. Valid values for float_format are e, E, f, F, g, or G. In the default configuration, a value of 10000000000.123 is returned as 1e+10. |
grpc_token_expiry | 0 (int) | Controls how often (in seconds) the gRPC token expires. A value of 0 means the token never expires. |
ignore_deniedlist_queries | true (boolean) | Controls whether denied list queries are blocked from running or are allowed to be run when the database is returned to normal operation. The default value is true, which means denied list queries are ignored. Incoming queries are not compared with the denied list and are permitted to run. If ignore_deniedlist_queries is false, denied list queries are not ignored and are therefore blocked from running until they are removed from the denied list. For more information about the auto-restart feature, see Managing Automatic Restarts. |
jvm_max_memory | Not set (char) | Specifies the maximum size of the heap that can be used by the embedded Java virtual machine (JVM). This setting affects memory used for queries that employ Graph Lakehouse Java extensions, such as the Graph Data Interface. Use k, m, or g (case insensitive) for KiB, MiB, or GiB. You can also specify % to indicate a percentage of the total memory that is available to Graph Lakehouse. By default, this value is not set, which means |
jvm_options | Not set (char) | Lists any optional parameters to use for configuring the embedded JVM. Use a semicolon-delimited (;) list to specify multiple parameters. For information about JVM options, see Options in the Java Documentation. |
log_directory | Not set (char) | Specifies where to write system management daemon (azgmgrd) log files. These types of logs (azgmgrd.log, azgctl-<user>.log, azgpidmgr.log, and azgpids.log) are created before the system is initialized and may be written before the <install_path>/internal/log directory exists. Therefore, they are located outside of the Graph Lakehouse file system, in /tmp by default. If you change the log_directory value, Cambridge Semantics recommends that you choose another location that is outside the internal Graph Lakehouse directories. |
policy_file_enabled | false (boolean) | Enables or disables file system access control policies. When policy_file_enabled is false (the default value), Graph Lakehouse does not perform file path access checks when a query reads or writes files or directories on the file system. When policy_file_enabled is true and a query attempts to access a file or directory on the file system, Graph Lakehouse performs the file path access checks that are configured in the file_policy_* settings and returns an access denied error message if the path is not accessible. For instructions on configuring file access policies and the file_policy_read, write, delete, and deny settings, see Managing Graph Lakehouse File Access Policies. |
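
As an illustration of how float_decimals and float_format combine into the printf format string %.precision format described in the table above, the following Python sketch reproduces the formatting on the client side. Python's % operator follows the C printf conventions for the e, E, f, F, g, and G conversions, so it can approximate what the settings produce. The function name format_float is hypothetical, and the sketch is not the server's implementation.

```python
VALID_FORMATS = ("e", "E", "f", "F", "g", "G")


def format_float(value, float_decimals=6, float_format="g"):
    """Format value with the printf-style string %.<float_decimals><float_format>.

    The defaults mirror the documented defaults (float_decimals=6,
    float_format=g). This helper is illustrative only.
    """
    if float_format not in VALID_FORMATS:
        raise ValueError(f"float_format must be one of {VALID_FORMATS}")
    spec = f"%.{float_decimals}{float_format}"  # for example "%.6g"
    return spec % value


# Default configuration: matches the example given for float_format.
print(format_float(10000000000.123))   # 1e+10
# Fixed-point notation with two digits after the decimal point.
print(format_float(0.5, 2, "f"))       # 0.50
```

With the g conversion, the precision counts significant digits and trailing zeros are dropped, which is why the default configuration collapses 10000000000.123 to 1e+10. With f or e, the precision counts digits after the decimal point instead.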