Creating an Elasticsearch Index from a Graphmart

By associating an Elasticsearch index with a data layer, you can load data from a graphmart to an Elasticsearch index, enabling you to perform free-text and pattern searches on your knowledge graphs. This topic provides instructions for configuring a workflow that generates an Elasticsearch index and snapshot from a graphmart.

Prerequisites

Before configuring a graphmart to create an Elasticsearch index, make sure that the following requirements are met:

  1. A supported version of Elasticsearch is installed and configured. For more information, see Elasticsearch Requirements.
  2. The Elasticsearch instance is connected to Anzo. See Connecting to Elasticsearch for more information.
  3. The AnzoGraph instance that you will load the graphmart to is also connected to Elasticsearch. Configure the connection by selecting the Elasticsearch instance in the Elasticsearch Configuration field in the AnzoGraph configuration. For more information, see Connecting to AnzoGraph.

Add an Export Step to the Graphmart

First, if a file-based linked data set (FLDS) has not been generated for this graphmart, add an Export Step to the last layer in the graphmart so that all of the graph data is exported to an FLDS on the file store. For instructions on adding an Export Step, see Export Data to an FLDS (Export Step).

If the graphmart already has an Export Step, note the Target FLDS for the step and proceed to Add a Layer to Manage the Index below.

Add a Layer to Manage the Index

Follow the steps below to add a layer to a graphmart and configure it to manage an Elasticsearch index.

  1. Add a new data layer to the graphmart. This new layer will be associated with the Elasticsearch index and contain steps to generate the index and snapshot. For instructions on adding a layer, see Creating a New Layer.
  2. In the new layer, click the Elasticsearch Indexing tab. Then select the Associate an Elasticsearch index with this layer checkbox. When this setting is enabled, Anzo creates an index on the Elasticsearch server and links this layer to that index.

    Although the index will be populated with data from other layers in the graphmart, you do not need to modify those layers to associate them with the index. Leave the Associate an Elasticsearch index with this layer option disabled on all layers that do not contain Elasticsearch Indexing Steps.

  3. The Advanced settings on this screen manage the index's life cycle. You do not need to modify them to generate an index for the graphmart. If you have an advanced use case, such as linking this layer to an existing index or adding custom Elasticsearch-specific index or mapping configurations, expand Advanced to access the options. The list below describes the available settings.

    • Elasticsearch Index Name: By default, the new index is given a system-generated name. If you are linking this layer to an existing index, add the existing index name to this field. Or, if you plan to reference this index elsewhere and want to give it a human-readable name, you can specify a custom name.
    • Clear Elasticsearch index before this layer executes: This option is enabled by default and configures the layer so that the index is cleared and recreated each time this layer is run.
    • Delete Elasticsearch index on layer unload: This option is enabled by default and configures the layer so that the index is deleted any time the layer is deactivated or reloaded. The index is recreated during the reload or when the graphmart is activated again. If you do not want the index to be deleted when the layer is offline, clear the checkbox.
    • Elasticsearch Index Settings: This field can be used to add any Elasticsearch-specific index settings that you want to apply. Add the settings in the following JSON format:
      {
        "index": {
          "<settings_and_values>"
        }
      }

      For example:

      {
        "index": {
          "number_of_shards": "1",
          "number_of_replicas": "0",
          "routing": {
            "allocation": {
              "include": {
                "_tier_preference": null
              }
            }
          }
        }
      }
    • Elasticsearch Index Mapping: This field can be used to add any Elasticsearch-specific mapping properties that you want to apply. Add the properties in the following JSON format:
      {
        "properties": {
          "<properties_and_values>"
        }
      }

      For example:

      {
        "properties": {
          "movie_Abstract": { "type": "keyword" },
          "movie_ID": { "type": "long" },
          "movie_Title": { "type": "text" }
        }
      }
  4. When you have finished configuring the layer, click Save to add it to the graphmart and return to the Data Layers screen. Then proceed to Add a Step to Create the Index below.
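
The JSON for the Elasticsearch Index Settings and Elasticsearch Index Mapping fields can be generated and checked before you paste it into the layer. The following is a minimal Python sketch and not part of Anzo. The helper names (build_index_settings, build_index_mapping, check_wrapper) are hypothetical, and the field names are taken from the mapping example above. The check only confirms that the text is valid JSON with the expected top-level key; it does not confirm that Elasticsearch will accept the individual settings.

```python
import json


def build_index_settings(shards: int, replicas: int) -> str:
    """Return JSON text for the Elasticsearch Index Settings field."""
    settings = {
        "index": {
            # Values are written as strings to match the example in this topic.
            "number_of_shards": str(shards),
            "number_of_replicas": str(replicas),
        }
    }
    return json.dumps(settings, indent=2)


def build_index_mapping(field_types: dict) -> str:
    """Return JSON text for the Elasticsearch Index Mapping field."""
    properties = {name: {"type": es_type} for name, es_type in field_types.items()}
    return json.dumps({"properties": properties}, indent=2)


def check_wrapper(text: str, top_level_key: str) -> dict:
    """Parse text and confirm it is a JSON object holding the expected key."""
    parsed = json.loads(text)  # raises json.JSONDecodeError on malformed JSON
    if not isinstance(parsed, dict) or not isinstance(parsed.get(top_level_key), dict):
        raise ValueError(f'expected a JSON object with an object under "{top_level_key}"')
    return parsed


if __name__ == "__main__":
    settings_text = build_index_settings(shards=1, replicas=0)
    mapping_text = build_index_mapping(
        {"movie_Abstract": "keyword", "movie_ID": "long", "movie_Title": "text"}
    )
    check_wrapper(settings_text, "index")
    check_wrapper(mapping_text, "properties")
    print(settings_text)
    print(mapping_text)
```

Running the script prints two JSON documents that can be copied into the corresponding Advanced fields. Text that you wrote by hand can be passed to check_wrapper in the same way to catch a missing brace or a misplaced top-level key before the layer is saved.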

Add a Step to Create the Index

Follow the steps below to add an Elasticsearch Indexing Step to the new layer and configure it to generate an index.

  1. On the Data Layers screen, click the menu icon for the new layer and select Add Step/View.
  2. On the Add Step/View screen, select Elasticsearch Indexing Step and click OK. The Create dialog box is displayed.
  3. On the Details tab, add a name for the step in the Title field, and configure any optional settings. For details about the settings, see Create an Elasticsearch Index (Elasticsearch Indexing Step).

  4. When you have finished configuring the Details tab, click the Query tab. The tab includes a template for writing a SPARQL SELECT query that incorporates the Graph Data Interface (GDI) service to generate an index on the Elasticsearch server.

    Unlike other steps, this step does not run an INSERT query because the data is not inserted into AnzoGraph. Edit the template as needed. You can click the Preview in Query Builder button to open the query in the Query Builder, where you can perform practice runs to see results without having to refresh the layer. For more information about writing GDI queries against an Elasticsearch source, see Querying an Elasticsearch Source.

    Do not include Elasticsearch connection or index parameters in the query. Anzo automatically populates that information from the AnzoGraph configuration when the step is run.

  5. When you have completed the indexing query, click Save to save the configuration and add the step to the layer.
  6. Next, users typically add an Elasticsearch Snapshot Step to the same layer. The step takes a snapshot of the index and saves it to the dataset (FLDS) on disk. Storing the snapshot with the FLDS ensures that the index is included if the dataset is added to another graphmart. If you want to create a snapshot, continue to Add a Step to Take a Snapshot of the Index below.

Add a Step to Take a Snapshot of the Index

Follow the steps below to add an Elasticsearch Snapshot Step to the new layer and configure it to save a snapshot of the index to the FLDS.

  1. On the Data Layers screen, click the menu icon for the new layer and select Add Step/View.
  2. On the Add Step/View screen, select Elasticsearch Snapshot Step and click OK. The Create dialog box is displayed.

  3. On the Details tab, add a name for the step in the Title field.
  4. Click the Target FLDS field and select the FLDS that is created by the Export Step in the graphmart.
  5. Configure any other optional settings on the Details tab. For information about the settings, see Take a Snapshot of an Index (Elasticsearch Snapshot Step).
  6. Click Save to save the configuration and add the step to the layer.

Once you have finished configuring the new layer and steps, reload the graphmart to generate (or update) the FLDS, create the Elasticsearch index, and save a snapshot of the index to the FLDS.
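
After the reload, one way to confirm that the index was populated is to ask Elasticsearch for its document count. The sketch below is an illustration under stated assumptions rather than an Anzo feature. It calls the standard Elasticsearch _count REST endpoint; the base URL http://localhost:9200 and the index name my_graphmart_index mentioned in the comments are hypothetical placeholders for your own Elasticsearch address and the name shown in the Elasticsearch Index Name field. It also assumes an endpoint that does not require authentication.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen


def count_url(base_url: str, index_name: str) -> str:
    """Build the URL of the Elasticsearch _count endpoint for one index."""
    return f"{base_url.rstrip('/')}/{quote(index_name, safe='')}/_count"


def parse_count(body: str) -> int:
    """Extract the document count from a _count response body."""
    return int(json.loads(body)["count"])


def fetch_count(base_url: str, index_name: str, timeout: float = 10.0) -> int:
    """Query Elasticsearch and return the number of documents in the index.

    Raises urllib.error.URLError (an OSError) if the server cannot be reached
    or responds with an HTTP error such as 404 for a missing index.
    """
    with urlopen(count_url(base_url, index_name), timeout=timeout) as response:
        return parse_count(response.read().decode("utf-8"))


# Example call with hypothetical values (requires a reachable Elasticsearch):
#   fetch_count("http://localhost:9200", "my_graphmart_index")
```

A count of zero after a successful reload suggests that the indexing query in the Elasticsearch Indexing Step returned no rows, which is worth checking in the Query Builder.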