Validate Data via SHACL Shapes (Validate Data Step)
This topic provides guidance on configuring a Validate Data Step to check data consistency by running a SHACL shape validator on Graph Lakehouse. See https://www.w3.org/TR/shacl/ for details on the SHACL Shapes Constraint Language for validating RDF graphs against a set of conditions.
The Graph Studio system has a registry for shape graphs. Each of these graphs contains SHACL shapes. It is assumed that the registry contains SHACL graphs whose shapes have the RDF type "node shape" (sh:NodeShape).
This step runs SHACL validation against the shapes in the graphs that you specify in the step interface. Pick which shapes you want to validate your data against, then choose what to do if the validation fails: mark the data layer load as failed or mark the entire graphmart load as failed.
The SHACL validation report is written into the step's output, so you can see details of the failures, if any.
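To make the idea of validating data against a node shape more concrete, the following is a minimal sketch in plain Python. It is not the validator that Graph Lakehouse runs and it is not a SHACL engine: it is a simplified stand-in that checks a single sh:minCount-style constraint and builds a small report with a conforms flag. All ex: names, the SHAPE and DATA structures, and the validate function are hypothetical and exist only for illustration.

```python
# Simplified stand-in for a SHACL engine: checks one sh:minCount-style constraint.
# All ex: names are hypothetical. The equivalent shape in Turtle would be:
#   ex:PersonShape a sh:NodeShape ;
#       sh:targetClass ex:Person ;
#       sh:property [ sh:path ex:name ; sh:minCount 1 ] .

RDF_TYPE = "rdf:type"

# Hypothetical in-memory description of the node shape above
SHAPE = {
    "id": "ex:PersonShape",
    "target_class": "ex:Person",
    "path": "ex:name",
    "min_count": 1,
}

# Hypothetical instance data as (subject, predicate, object) triples
DATA = [
    ("ex:alice", RDF_TYPE, "ex:Person"),
    ("ex:alice", "ex:name", "Alice"),
    ("ex:bob", RDF_TYPE, "ex:Person"),  # no ex:name, so this node violates the shape
]


def validate(triples, shape):
    """Return a report dict with a 'conforms' flag and a list of violations."""
    # Focus nodes are the instances of the shape's target class
    focus_nodes = sorted(
        {s for s, p, o in triples if p == RDF_TYPE and o == shape["target_class"]}
    )
    violations = []
    for node in focus_nodes:
        # Count the values the node has for the constrained path
        count = sum(1 for s, p, _ in triples if s == node and p == shape["path"])
        if count < shape["min_count"]:
            violations.append(
                {
                    "focus_node": node,
                    "source_shape": shape["id"],
                    "path": shape["path"],
                    "message": f"expected at least {shape['min_count']} value(s), found {count}",
                }
            )
    return {"conforms": not violations, "violations": violations}


report = validate(DATA, SHAPE)
print(report["conforms"])
for v in report["violations"]:
    print(v["focus_node"], v["source_shape"], v["message"])
```

In this sketch, ex:alice satisfies the shape and ex:bob does not, so the report does not conform and lists one violation. A real SHACL validation report carries similar information (the focus node, the source shape, the path, and a message) as RDF.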
Follow the steps below to create a Validate Data Step.
- Go to the graphmart for which you want to add a step and then expand its Data Layers list.
- In the Data Layers list, find the layer that you want to add the step to. Click the menu icon for that layer and select Add Step/View.
The Add Step/View dialog box is displayed with the New tab selected.
- To create a new Validate Data Step, select Validate Data Step and then click OK. If you want to clone an existing step, click the Existing Steps tab, select the step that you want to clone, and then click OK. The Create Validate Data Step dialog box opens. Specify a name for the step in the Title field and include an optional description in the Description field. Click Confirm. Graph Studio creates or clones the step and displays the Details tab.
- On the Details tab, configure the following options as needed:
- Title: The required name of the step.
- Description: An optional short description of the step.
- Enabled: When creating a new step, the Enabled option is selected by default, indicating that the step is enabled and will run when the layer is loaded or refreshed. If you want to disable the step so that it is not processed, clear the Enabled checkbox.
- Choose what to do in the case of validation failure:
- If the shacl validation fails, the layer will be marked as failed: Select this option if you want a validation failure to result in a failed data layer load.
- If the shacl validation fails, the whole graphmart will be marked as failed: Select this option if you want a validation failure to result in a failed graphmart load.
- The SHACL Shapes fields list the graphs that contain the SHACL shapes used to validate instance data. Each NodeShape object is expected to live in a graph with the same URI as the shape itself. Choose a shape graph and select the shapes that you want to validate the data against.
If your query connects to a source that requires input of connection and authorization information, Altair recommends that you do not include the connection and authorization values directly in the query. Instead, replace those values with Context Variables from a Query Context. You can access Context Providers for each data source from the step's Query Context tab. For detailed information about query contexts and referencing variables in a query, see Using Query Contexts.
- Click Save to save the step configuration.
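The two failure options on the Details tab can be pictured as a small decision rule. The sketch below is only an illustration of that rule, assuming a boolean validation result. The OnFailure enum and the failed_scope function are hypothetical names and are not part of Graph Studio.

```python
from enum import Enum
from typing import Optional


class OnFailure(Enum):
    """The two failure options offered on the Details tab (hypothetical names)."""

    FAIL_LAYER = "data layer"
    FAIL_GRAPHMART = "graphmart"


def failed_scope(conforms: bool, on_failure: OnFailure) -> Optional[str]:
    """Return which load is marked as failed, or None when the data conforms."""
    if conforms:
        return None
    return on_failure.value


print(failed_scope(True, OnFailure.FAIL_LAYER))
print(failed_scope(False, OnFailure.FAIL_LAYER))
print(failed_scope(False, OnFailure.FAIL_GRAPHMART))
```

When the data conforms, neither option has any effect. The selected option matters only when validation fails.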
Once the Details tab is configured, the step can be run. For information about setting up an execution condition that uses this step, see Defining Execution Conditions.
After you run the step, click on the step name in the left pane to view the report. The Graph Lakehouse Load Status pop-up message opens. Click View Error to see the report in the Log Message window. If the validation succeeds, the report states that the data conforms to the specified SHACL shape. If it fails, there is a detailed report listing the reasons for the failure.