Graphmart Concepts

Graphmarts are containers for knowledge graphs. When you construct a graphmart, you are defining the set of instructions to use for transforming your source data into the desired knowledge graph.

Graphmart Modes

To streamline graphmart creation, there are two modes of operation in addition to the offline (deactivated) state. The first mode, and the default setting for all newly created graphmarts, is design mode, in which the graphmart is activated and editable. In design mode, a sample subset of data (up to 5,000 records) is loaded for each underlying source. Once the design is complete, users can switch the graphmart to deployed mode, in which the graphmart is activated and read-only and the entire dataset is loaded. This two-step approach lets users develop and verify transformations on a subset of the data before deploying them against the full dataset, reducing development time and improving efficiency.

Putting a graphmart in offline mode (the deactivated state) makes it read-only and prevents any changes to its configuration.

For more information, see Creating a Graphmart.

Data Layers

Graphmarts are made up of data layers. There are two types of layers: source layers that connect to your data sources, and transformation layers for managing and transforming data.

Data layers are containers for steps. The steps in a layer perform all of the data operations. They are used to load additional datasets, infer new data automatically, or run SPARQL queries to create views or clean, conform, transform, or validate the data in the layer.
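
For example, a query-based transformation step might run a SPARQL update along the following lines. This is only a minimal sketch: the target graph URI and the property names are hypothetical placeholders, not values that Graph Studio generates.

    PREFIX ex: <http://example.org/>

    # Conform the various spellings of a country value to a single code.
    # The named graph stands in for the layer's working graph.
    INSERT {
      GRAPH <http://example.org/graphmart/conformed-layer> {
        ?customer ex:country "US" .
      }
    }
    WHERE {
      ?customer ex:country ?value .
      FILTER(?value IN ("USA", "U.S.", "United States"))
    }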

The following list introduces the key features of data layers:

  • You can create any number of layers in a graphmart and control which users and groups have access to each layer and the data within it.
  • Users can toggle data layers on and off in Hi-Res Analytics dashboards.
  • You can configure execution conditions on layers so that the layer is processed only when certain conditions are met.
  • You can clone layers and include a copy in multiple graphmarts.
  • You control the source data for steps in a layer. Layers can build upon the data generated by steps in previous layers or can be self-contained, applying changes only to the data defined in the layer.
  • Each layer is stored as one named graph in Graph Lakehouse (see the query sketch after this list).
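
Because each layer corresponds to a named graph, a layer's data can be addressed directly in a query. The sketch below assumes a hypothetical graph URI; the actual URIs are assigned by Graph Lakehouse.

    # Count the statements that one layer contributed to the graphmart.
    SELECT (COUNT(*) AS ?statements)
    WHERE {
      GRAPH <http://example.org/graphmart/layer/cleanse> { ?s ?p ?o }
    }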

For instructions, see Adding Data Layers to Graphmarts.

Steps

You can add any number of steps to a layer and can create the following types of steps:

  • Context Attribute Step: Enables users to define and run a SELECT query against the graphmart, Graph Lakehouse, another data source, a system journal, or a semantic service. The named variables from the first row of the query results are inserted into the graphmart as context attributes.
  • Direct Load Step: Runs a SPARQL query that invokes the Graph Data Interface (GDI) RDF and Ontology Generators to directly load a data source to a graphmart.
  • Elasticsearch Indexing Step: Creates an Elasticsearch index to associate with a layer.
  • Elasticsearch Snapshot Step: Creates an Elasticsearch snapshot of the index that is associated with a layer.
  • Export Step: Exports the graphmart data in memory to a file-based linked data set (FLDS).
  • Load Dataset Step: Loads a dataset from the Datasets catalog into a data layer graph. When you create a graphmart from the catalog, Graph Studio automatically creates a layer with a Load Dataset Step.
  • Pre-compile Query Step: Runs the included query immediately after a graphmart is loaded so that the query is pre-compiled by Graph Lakehouse. Pre-compiling a query reduces execution time when a user runs that query for the first time.
  • Query Driven Templated Step: Enables users to create reusable query-driven templates for quickly creating additional query steps. Unlike the Templated Step, where users define each key-value pair, this step runs a query to identify all of the key-value pairs; the template query is then run once for each solution from that query.
  • Query Step: Provides a SPARQL query template that you can use for writing a query that creates, cleans, conforms, or transforms data in the data layer.
  • RDFS+ Inference Step: Uses RDFS and OWL rules to generate new data in a layer based on the vocabularies in the existing data.
  • Templated Step: Enables users to create reusable templates for quickly creating additional query steps. The query in a Templated Step uses parameters to represent key-value pairs. When reusing the step, users modify the values for the keys rather than rewriting the query.
  • Validation Step: Enables users to write a query that validates the data in a data layer. A Validation Step is a required component when setting up an execution condition. A sample validation query appears after this list.
  • Validate Data Step: Checks data consistency by running a SHACL shape validator on Graph Lakehouse.
  • View: Enables users to write a SPARQL CONSTRUCT query that defines a view of the data. A view does not alter the source data or create new data unless you choose to materialize it. A sample view query appears after this list.
  • Write Back Step: Enables users to write an update query against Graph Lakehouse while using a different data source as the target of the update, writing information back into another data source or an activation metrics graph, or calling a semantic service with data from the query.
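
As an illustration of a Validation Step, the following query flags records that fail a simple completeness rule. It is only a sketch under assumed class and property names (ex:Person, ex:birthDate); it is not a query that Graph Studio provides.

    PREFIX ex: <http://example.org/>

    # Return every person that is missing a birth date.
    # An empty result indicates that the data passes this check.
    SELECT ?person
    WHERE {
      ?person a ex:Person .
      FILTER NOT EXISTS { ?person ex:birthDate ?birthDate }
    }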
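
Similarly, a View step could use a CONSTRUCT query like the one below to present a simplified shape of the data without changing the underlying triples. The class and property names are again hypothetical.

    PREFIX ex:   <http://example.org/>
    PREFIX foaf: <http://xmlns.com/foaf/0.1/>

    # Expose a single display name assembled from first and last names.
    CONSTRUCT {
      ?person foaf:name ?fullName .
    }
    WHERE {
      ?person a ex:Employee ;
              ex:firstName ?first ;
              ex:lastName  ?last .
      BIND(CONCAT(?first, " ", ?last) AS ?fullName)
    }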

For more information about graphmarts, layers, and steps, see Working with Graphmarts.