Direct Load Advanced Settings Reference

This topic describes the Advanced options that are available when you create or configure the Direct Data Load workflow to load a data source via auto-generated data layers.

Graphmart Options

The Graphmart Options are available when you create a graphmart from a data source (as described in Creating a Graphmart from a Data Source). They are not available when you add a data source to an existing graphmart (as described in Adding a Data Source to an Existing Graphmart).

Export to Dataset

This setting controls whether the automated workflow generates a dataset by exporting the graphmart. This option does not create an Export Step in the graphmart, but it does generate a dataset in the selected data store and add the dataset to the Datasets catalog. If you leave Export to Dataset disabled, a dataset is not automatically generated. For information about creating an Export Step to export a graphmart to a dataset at a later time, see Export Data to an FLDS (Export Step).

If you enable Export to Dataset, the following settings are displayed:

Dataset Anzo Data Store: This required setting specifies the Graph Studio Data Store to export the dataset to. The data store must be a location on the shared file store that Graph Lakehouse has access to.
Post Dataset Export Action: This required setting specifies how to treat the graphmart after the dataset is exported. To leave the graphmart online, select Leave Graphmart Activated (the default value). To retain the graphmart but deactivate it and remove its data from Graph Lakehouse, select Deactivate Graphmart. To designate the graphmart as temporary and remove it after the dataset is exported, select Delete Graphmart.
Dataset Format: This setting specifies whether the generated RDF Turtle (TTL) files are compressed. The valid options are ttl for uncompressed files and ttl.gz for gzip-compressed files.
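
Because the Dataset Format setting determines whether downstream tools receive plain or gzip-compressed Turtle files, a consumer of the exported dataset may need to handle both. The following is a minimal Python sketch, not part of the product; the function name open_ttl and the file names in the usage note are hypothetical.

```python
import gzip
from pathlib import Path


def open_ttl(path):
    """Open an exported Turtle file as text, whether or not it is gzip-compressed.

    Assumption: files end in .ttl (uncompressed) or .ttl.gz (compressed),
    matching the two Dataset Format options.
    """
    path = Path(path)
    if path.name.endswith(".ttl.gz"):
        # gzip.open in "rt" mode decompresses and decodes to text on the fly.
        return gzip.open(path, "rt", encoding="utf-8")
    if path.name.endswith(".ttl"):
        return open(path, "r", encoding="utf-8")
    raise ValueError(f"unexpected dataset file extension: {path.name}")
```

For example, `with open_ttl("export.ttl.gz") as f:` yields the same lines of Turtle text as opening the uncompressed `export.ttl` would.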

Find Connections

This optional setting specifies whether to find relationships between tables in the schema (or between data sources if multiple sources are selected). Finding connections is useful if the schema does not define primary and foreign key relationships and you want the Graph Data Interface (GDI) to create the connections.

When Find Connections is enabled, two models are created: one that contains the classes and properties, and one that contains only the connections. To view the complete model, add both models to the Working Set in the Model viewer.

Profile Data

This optional setting specifies whether to generate a Data Profile after the graphmart is activated. For information about the metrics that are run when a profile is generated, see Data Profiling Metrics.

Layer Generation Strategies

These settings control the strategy to use for auto-generating the data layer queries in the graphmart. Each strategy produces the same graph data but structures the queries that produce the data differently. You can select both strategies and then review the resulting layers and steps. Each selected strategy results in a separate layer, but only one of the resulting layers can be enabled by default.

Single Step

This is the default strategy. When Single Step is selected, a layer with a single Direct Load Step is created. The single query loads all tables from the selected schema or schemas. The generated query is an RDF and Ontology Generator query. See Onboarding Data with a Direct Load Step to learn more about the GDI RDF and Ontology Generator.

Multiple Steps

When Multiple Steps is selected, the layer has a separate Direct Load Step for each table in the selected schema or schemas. The generated query in each step is also an RDF and Ontology Generator query. With this strategy, you can enable and disable certain steps to control which tables are included in the graphmart.

Enabled Layer

This required setting specifies the data layer that should be enabled by default when the graphmart is activated.

Ontology URI

This optional setting specifies the custom URI to use for the model that is automatically generated. The value must be a valid URI without a hash (#) or slash (/) character at the end. If you do not specify a custom URI, the Graph Data Interface generates a URI in the following format:

http://cambridgesemantics.com/Layer/<layer_ID>/Model
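
The Ontology URI rule can be checked before the value is entered. The following is a minimal Python sketch under stated assumptions: the function names are hypothetical, the validity test (a scheme plus a host or path, and no whitespace) is a simplification rather than the product's own validation, and the trailing slash is taken to mean a forward slash.

```python
from urllib.parse import urlparse


def is_acceptable_ontology_uri(uri):
    """Return True if uri looks like a valid URI that does not end in '#' or '/'."""
    if not uri or uri.endswith(("#", "/")):
        return False
    if any(ch.isspace() for ch in uri):
        return False
    try:
        parsed = urlparse(uri)
    except ValueError:
        return False
    # Simplified validity test (an assumption): a scheme and a host or path.
    return bool(parsed.scheme) and bool(parsed.netloc or parsed.path)


def default_model_uri(layer_id):
    """Build a URI in the documented default format for a given layer ID."""
    return f"http://cambridgesemantics.com/Layer/{layer_id}/Model"
```

Under these assumptions, `http://example.com/ontologies/sales` is accepted, while `http://example.com/ontologies/sales#` and `http://example.com/ontologies/sales/` are rejected.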

Enable Partitioning

This option specifies whether to enable file partitions for file-based data sources. When file partitions are enabled, files are partitioned and ingested in parallel for increased performance.

Multiline CSV files may fail to load when this option is enabled. If an error message like the following is returned when onboarding files, disable the Enable Partitioning setting:

File uses multiline records and cannot be segmented.
Please disable segmenting for this file.
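
To find out ahead of time whether a CSV file contains multiline records, you can scan it for quoted fields that include line breaks. The following is a minimal Python sketch, not part of the product; the function name has_multiline_records is hypothetical, and the check assumes standard CSV quoting.

```python
import csv
import io


def has_multiline_records(csv_text):
    """Return True if any field in the CSV text contains an embedded line break."""
    # newline="" keeps line endings untranslated so the csv module can see
    # line breaks that sit inside quoted fields.
    reader = csv.reader(io.StringIO(csv_text, newline=""))
    for row in reader:
        if any("\n" in field or "\r" in field for field in row):
            return True
    return False
```

To check a file on disk, read it with `open(path, newline="", encoding="utf-8")` and pass the text to the function. If it returns True, the file is a candidate for loading with partitioning disabled.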

Sampling Limit

This optional setting specifies the number of rows to scan when inferring the data type of each column.
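
The effect of a sampling limit on type inference can be illustrated with a small Python sketch. This shows the general idea only and is not the Graph Data Interface's actual algorithm; the function names, the three type labels, and the sample data are all assumptions made for the example.

```python
import csv
import io
from itertools import islice


def infer_type(values):
    """Return the narrowest label (integer, double, string) that fits all non-empty values."""
    non_empty = [v for v in values if v != ""]
    if not non_empty:
        return "string"
    for caster, label in ((int, "integer"), (float, "double")):
        try:
            for value in non_empty:
                caster(value)
            return label
        except ValueError:
            continue
    return "string"


def infer_column_types(csv_text, sampling_limit):
    """Infer a type for each column from at most sampling_limit data rows."""
    reader = csv.DictReader(io.StringIO(csv_text, newline=""))
    sample = list(islice(reader, sampling_limit))
    return {
        name: infer_type([row[name] or "" for row in sample])
        for name in (reader.fieldnames or [])
    }
```

In this sketch, a limit that stops before the first decimal value in a column labels the column as integer, while a larger limit labels it as double. This is why a limit that is too small for the data can produce types that later rows do not fit.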