Configuring an Export Step

This topic describes how to configure an Export Step, which exports the knowledge graphs in memory to a file-based linked data set (FLDS) on the file store. The sections below describe each of the tabs and configuration options that are available when you create or edit an Export Step.

If you add an Export Step to a Graphmart that has been activated, you must reload the entire Graphmart after adding the step. Simply refreshing the Data Layer after adding the step does not create the ontology graph that the Export Step requires.

Details

The Details tab includes all of the required settings that configure options such as the name of the step, the source data to act upon, and the target Dataset.

Title

The required name of the Step.

Description

An optional description of the Step.

Enabled

When creating a new Step, the Enabled option is selected by default, indicating that the Step is enabled and will run when the Data Layer is loaded or refreshed. If you want to disable the Step so that it is not processed, clear the Enabled checkbox.

Source

The Source is the source data that this Step should act upon. Steps can build upon the data generated by Steps in other Data Layers or can be self-contained, applying changes that relate only to the data defined in the Layer that contains this Step. You can select any number of the following options:

  • Self: This option is selected by default and means that Anzo exports only the data that is generated in the Layer to which this Step belongs.
  • All Previous Layers Within Graphmart: Choosing this option means that Anzo exports data that is generated by all of the Data Layers in the Graphmart that precede this Layer.
  • Previous Layer Within Graphmart: Choosing this option means that Anzo exports only the data that is generated by the one Layer that precedes this Layer.
  • Layer Name: The Source drop-down list also includes options for specific Layer names. You can choose a specific Layer to export only the data that is generated by that Layer.
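Because the Source options can be combined, it can help to think of the selection as a set of Layers that is resolved against the Layer order in the Graphmart. The following Python sketch is only an illustration of the options described above. The function name, the Layer names, and the way options are passed are hypothetical; this is not an Anzo API.

```python
from typing import List, Set


def resolve_source_layers(layers: List[str], current: str, selections: Set[str]) -> List[str]:
    """Return the Layers whose data would be exported, in Graphmart order.

    layers:     all Layer names in the Graphmart, in order (hypothetical names)
    current:    the Layer that contains the Export Step
    selections: the Source options that were selected
    """
    idx = layers.index(current)
    chosen = set()
    for sel in selections:
        if sel == "Self":
            chosen.add(current)
        elif sel == "All Previous Layers Within Graphmart":
            chosen.update(layers[:idx])
        elif sel == "Previous Layer Within Graphmart":
            if idx > 0:  # the first Layer has no preceding Layer
                chosen.add(layers[idx - 1])
        elif sel in layers:
            chosen.add(sel)  # a specific Layer name
        else:
            raise ValueError(f"Unknown source option: {sel}")
    # Preserve the Graphmart order in the result.
    return [name for name in layers if name in chosen]


layers = ["Load", "Clean", "Enrich", "Export"]
print(resolve_source_layers(layers, "Export", {"Self", "Previous Layer Within Graphmart"}))
```

In this example the Step sits in the hypothetical "Export" Layer, so selecting Self together with Previous Layer Within Graphmart resolves to the "Enrich" and "Export" Layers.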

Data Models

This optional field specifies the Data Model or Models to include in the export. If you do not select a Model, Anzo automatically exports any associated Models when the data is exported.

Target FLDS

This is the target Dataset for this export. If a Dataset does not exist, you can select -Create New- to create an empty Dataset. See Adding an Empty Dataset for an Export Step for instructions. If you select an existing target Dataset, you also have the option to specify whether or not to overwrite the existing Dataset.

Overwrite FLDS

This setting controls whether the existing FLDS is replaced with the exported files or whether the exported files are added to the existing FLDS:

  • If you want Anzo to replace the current edition of the Dataset, select the Overwrite FLDS checkbox. When Overwrite FLDS is enabled, Anzo archives the existing files in a new timestamped export subdirectory under the Target FLDS directory. Each time the Export Step runs, Anzo archives the current edition and creates a new export directory. If you add this Dataset to a Graphmart, only the latest version of the exported data will be loaded to AnzoGraph.
  • If you want Anzo to add the exported files to the existing FLDS, leave the Overwrite FLDS checkbox unchecked. When Overwrite FLDS is disabled, Anzo adds all of the exported components to a cumulative export directory under the Target FLDS directory. The Dataset will contain the original files as well as all cumulative working editions. If you subsequently add this Dataset to a Graphmart, all of the data from all of the subdirectories will be loaded into AnzoGraph.
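As a rough illustration of the difference between the two modes, the following Python sketch models which export editions would be loaded into AnzoGraph if the Dataset were later added to a Graphmart. The function and the edition labels are hypothetical and are not part of Anzo. The sketch only restates the behavior described above and does not model the directory layout or the original files in the Dataset.

```python
from typing import List


def editions_loaded(export_runs: List[str], overwrite: bool) -> List[str]:
    """Return the export editions that would be loaded from the Dataset.

    export_runs: labels of successive exports, oldest first (hypothetical labels)
    overwrite:   whether Overwrite FLDS was selected for the Export Step
    """
    if not export_runs:
        return []
    if overwrite:
        # Earlier editions are archived; only the latest export is loaded.
        return [export_runs[-1]]
    # Exports accumulate; data from every edition is loaded.
    return list(export_runs)


runs = ["export-1", "export-2", "export-3"]
print(editions_loaded(runs, overwrite=True))
print(editions_loaded(runs, overwrite=False))
```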

Export Binary Store Contents

This option applies to exports of Unstructured Graphmarts and controls whether the binary store is exported along with the data.

Always Move Binary Store

This option also applies to exports of Unstructured Graphmarts and controls whether the binary store is moved or copied during the export. Because the binary store can be large and have a nested structure, copying the data can take a very long time. Moving the binary store is almost instantaneous, so enabling Always Move Binary Store can reduce the time it takes to complete the export.

  • If you want Anzo to copy the binary store to the Target FLDS, leave Always Move Binary Store disabled (unchecked).
  • If you want Anzo to move the binary store to the Target FLDS, select the Always Move Binary Store checkbox to enable it.

Keep Elasticsearch Index Online

This option applies to exports of Unstructured Graphmarts and controls whether the Elasticsearch index that is associated with the Dataset remains stored in Elasticsearch or is removed from Elasticsearch once it is exported.

Generate Metrics

This option controls whether a Data Profile is generated before the data is exported. Since the data must be loaded in AnzoGraph to compute the metrics, you have the option to generate them during the export. If you load the exported files in the future, the Data Profile is also loaded.

Elastic Search

The Elastic Search tab contains optional settings that you can use to set any desired limits on Elasticsearch indexing processes.

Elasticsearch Index Settings

A custom list of any Elasticsearch index settings that you want to apply to the export.

Elasticsearch Bulk Size

The maximum batch size in MB.

Elasticsearch Bulk Actions

The maximum number of documents to include in each batch.
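The bulk size and bulk actions limits both cap how large a single batch can grow. The Python sketch below shows one plausible way the two limits could interact, assuming a batch is sent as soon as either limit is reached. That interaction is an assumption and is not confirmed by this topic. The function name and parameters are hypothetical, and the sketch does not call Elasticsearch or Anzo.

```python
from typing import Iterable, Iterator, List


def batch_documents(docs: Iterable[bytes], max_actions: int, max_bytes: int) -> Iterator[List[bytes]]:
    """Group documents into batches bounded by a document count and a byte size.

    max_actions: maximum number of documents per batch (Bulk Actions)
    max_bytes:   maximum batch size in bytes (Bulk Size, converted from MB)
    """
    batch: List[bytes] = []
    size = 0
    for doc in docs:
        # Close the current batch if adding this document would exceed the size limit.
        if batch and size + len(doc) > max_bytes:
            yield batch
            batch, size = [], 0
        batch.append(doc)
        size += len(doc)
        # Close the batch once it holds the maximum number of documents.
        if len(batch) >= max_actions:
            yield batch
            batch, size = [], 0
    if batch:
        yield batch  # send whatever remains


docs = [b"x" * 4 for _ in range(5)]
for batch in batch_documents(docs, max_actions=2, max_bytes=100):
    print(len(batch))
```

With five small documents and a limit of two documents per batch, the document count is the limit that closes each batch. With a very small size limit, the byte size would close batches first.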

Elasticsearch Bulk Concurrent Requests

The maximum number of bulk requests that can run concurrently.

Elasticsearch Bulk Max Threads Per FLDS

The maximum number of threads to use for indexing per file-based linked data set (FLDS).

Elasticsearch Bulk Max FLDS Threads

The maximum number of FLDSes to index concurrently.

Execution Condition

If you want this Step to be executed conditionally, based on the result of a specified Validation Condition, you can configure an Execution Condition on the Execution Condition tab that is available when creating or editing a Step. The image below shows the Execution Condition tab.

In order to set up an Execution Condition, the Graphmart needs to have at least one Validation Step that defines a Condition Variable. Condition Variables can be used across all Data Layers in the Graphmart. For guidance on configuring a Validation Step, see Configuring a Validation Step.

Enable Layer Based on Boolean Condition

This setting controls whether this Step is enabled only when the Validation Condition returns a specific Boolean value. You specify that value, true or false, in the Conditional Variable If Result field. If the Validation Condition fails, the Step is disabled.

Conditional Variable

This field specifies the variable that you want to base this Execution Condition on. The variable is the result of a Validation Step Query in the Graphmart.

Conditional Variable If Result

If you enabled the Enable Layer Based on Boolean Condition setting, select true or false from the drop-down list. The Step will be enabled only if the result of the Validation Step Query matches the value that you specified. If Enable Layer Based on Boolean Condition is disabled, leave this field blank.
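The three Execution Condition settings combine into a single yes-or-no decision about whether the Step runs. The Python sketch below is one reading of the descriptions above, not Anzo's implementation. The function and parameter names are hypothetical, and a failed Validation Condition is represented as None as a simplifying assumption.

```python
from typing import Optional


def step_is_enabled(
    condition_enabled: bool,
    expected: Optional[bool],
    validation_result: Optional[bool],
) -> bool:
    """Decide whether the Step runs, given its Execution Condition settings.

    condition_enabled: Enable Layer Based on Boolean Condition
    expected:          Conditional Variable If Result (blank is None)
    validation_result: value of the Conditional Variable, or None if the
                       Validation Condition failed (assumption)
    """
    if not condition_enabled:
        # No Execution Condition applies; the Enabled checkbox alone decides.
        return True
    if validation_result is None:
        # The Validation Condition failed, so the Step is disabled.
        return False
    # Run only when the result matches the value that was specified.
    return validation_result == expected


print(step_is_enabled(True, expected=True, validation_result=True))
print(step_is_enabled(True, expected=True, validation_result=False))
```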

If you added the Export Step to an active Graphmart, make sure that you reload the entire Graphmart.
