Defining a Database Schema

The schema defines the source data to onboard. There are multiple options available for defining a database schema. You can import a predefined schema from the database, you can write a static SQL query that defines the data to onboard, or, if you want to import data incrementally, you can write an incremental SQL query that ingests a subset of the data.

You can import or create up to 5 schemas per database data source. To include more than 5 schemas, create another data source for the additional schemas.

By default, Anzo is configured to exclude Views from the list of available Schemas to import. For information about including Views as tables that can be imported, see Including Views as Database Schemas.

Import a Predefined Schema

  1. In the Anzo application, expand the Onboard menu and click Structured Data. Anzo displays the Data Sources screen, which lists any existing data sources. For example:

  2. Click the database for which you want to import a schema. Anzo displays the Tables tab for the source. For example, the image below shows the Tables tab before any schemas have been added:

  3. Click the Import Schemas button. Anzo displays the Import Schemas dialog box, which lists any predefined schemas in the database. For example:

    If you do not see a schema that you expect to see, make sure that you have the appropriate permissions to access to the data source.

  4. To import entire schemas, select the checkbox next to each schema name that you want to import. If you want to import a subset of the tables in a schema, expand the schema and select the checkbox next to each table that you want to include. For example:

  5. When you have finished selecting schemas, click OK. Anzo imports the selected schemas and lists them on the Tables tab. You can expand a schema to view its tables. Selecting a row in the schema displays the sample data on the right side of the screen.

    The automated data load workflow ignores all changes that are made to the schema on the Tables screen—except for changes to primary and foreign keys. For example, if you edit a column heading to change its semantic type, that change is disregarded when the graphmart is created. Only the original type from the data source is considered. If you add or change primary and foreign keys on the Tables screen, however, the automated data load workflow will retain those changes.

For information about creating or changing primary keys and foreign keys, see Assigning Primary and Foreign Keys in a Schema.

Create a Schema from an SQL Query

  1. In the Anzo application, expand the Onboard menu and click Structured Data. Anzo displays the Data Sources screen, which lists any existing data sources. For example:

  2. Click the database for which you want to create a schema. Anzo displays the Tables tab for the source. For example, the image below shows the Tables tab before any schemas have been created:

  3. Click the Create Schemas From Query button. Anzo displays the Create Schemas dialog box:

  4. In the Create Schemas dialog box, specify a name for this schema in the Schema Name field.
  5. In the Table Name field, specify a name for the table in the schema that the query will create.
  6. Type the SQL statement in the text box. The statement can include any functionality that the source database supports. Anzo does not validate the SQL.

    If the SQL query requires quotes around values, such as ‘2010-01-01’ or ‘TestValue’, make sure that you use single quotes ('). For example:

    SELECT * FROM Movies WHERE production_day='2021-08-01'

    Including double quotes (") in a schema query results in an error when the query is run.

    The following example creates a schema named employees. A table named all_employees will be created in the schema, and the table is created from the following SQL query:

    SELECT EmployeeID, FirstName, LastName, Title, Salary, BirthDate, HireDate, Region, Country
    FROM northwind.Employees
    WHERE EmployeeID

  7. Click Save to save the query. Anzo creates the new schema and adds it to the list of schemas on the Tables screen. Selecting the schema displays sample data on the right side of the screen.

    The automated data load workflow ignores all changes that are made to the schema on the Tables screen—except for changes to primary and foreign keys. For example, if you edit a column heading to change its semantic type, that change is disregarded when the graphmart is created. Only the original type from the data source is considered. If you add or change primary and foreign keys on the Tables screen, however, the automated data load workflow will retain those changes.

  8. If you want to create additional tables in the schema, follow these steps:
    1. Click the menu icon () in the Actions column for the schema name and select Add Table. For example:

      The Create New Table dialog box is displayed.

    2. In the Create New Table dialog box, specify a name for the new table in the Table Name field.
    3. In the Schema Query field, write the SQL query that defines the data for the new table.
    4. Click Save to add the table to the schema and return to the Tables screen.

For information about creating or changing primary keys and foreign keys, see Assigning Primary and Foreign Keys in a Schema.

Create an Incremental Schema

Follow the instructions below to create a schema by writing an SQL query that defines a subset of the data to onboard in increments.

  1. In the Anzo application, expand the Onboard menu and click Structured Data. Anzo displays the Data Sources screen, which lists any existing data sources. For example:

  2. Click the database for which you want to create an incremental schema. Anzo displays the Tables tab for the source. For example, the image below shows the Tables tab before any schemas have been created:

  3. Click the Create Schemas From Query button. Anzo displays the Create Schemas dialog box:

  4. In the Create Schemas dialog box, specify a name for this schema in the Schema Name field.
  5. In the Table Name field, specify a name for the table in the schema that the query will create.
  6. At the bottom of the screen, enable the Include increment data option by sliding the slider to the right. Anzo displays additional settings:

  7. Populate the following fields so that you can use the values as a guide for writing the schema query:
    • Incremental Column Name: The source column whose value will be used to increment the data.
    • Value: The value in the column to use as the stopping point for the first import process and the starting point for the next import.

      Do not include quote characters in the Value field. If the SQL query requires quotes around values, such as ‘2010-01-01’ or ‘TestValue’, include the quotes around the {INCREMENTVALUE} parameter in the query and not in the Value field. For example, if the value to increment on is ‘2010-01-01’, specify 2010-01-01 in the Value field and add the quotes to the query like the following example:

      SELECT * FROM Orders WHERE OrderData > '{INCREMENTVALUE}'

      In addition, make sure that you use single quotes (') in schema queries. Including double quotes (") in a schema query results in an error when the query is run.

    • Comparator: The operator to use for comparing source values against the value above.
  8. In the query text field, type the SQL statement that will target the appropriate source data. The WHERE clause must include the incremental column name, the comparison operator, and an INCREMENTVALUE parameter. This parameter is substituted with the Value at runtime. For example, in the query below, the incremental column name is EmployeeID, the comparator is > (greater than), and the {INCREMENTVALUE} parameter is specified after the comparator. {INCREMENTVALUE} is replaced with the value in the Value field at runtime:
    SELECT EmployeeID, FirstName, LastName, Title, Salary, BirthDate, HireDate, Region, Country
    FROM northwind.Employees
    WHERE EmployeeID > {INCREMENTVALUE}

    Make sure that the query includes the INCREMENTVALUE parameter and uses the same Incremental Column Name and Comparator values as the fields below the query. For example:

  9. Click Save to save the query. Anzo creates the new schema and adds it to the list of schemas on the Tables screen. Selecting the schema displays sample data on the right side of the screen.

    The automated data load workflow ignores all changes that are made to the schema on the Tables screen—except for changes to primary and foreign keys. For example, if you edit a column heading to change its semantic type, that change is disregarded when the graphmart is created. Only the original type from the data source is considered. If you add or change primary and foreign keys on the Tables screen, however, the automated data load workflow will retain those changes.

  10. If you want to create additional tables in the schema, follow these steps:
    1. Click the menu icon () in the Actions column for the schema name and select Add Table. For example:

      The Create New Table dialog box is displayed.

    2. In the Create New Table dialog box, specify a name for the new table in the Table Name field.
    3. In the Schema Query field, write the SQL query that defines the data for the new table.
    4. Click Save to add the table to the schema and return to the Tables screen.

For information about creating or changing primary keys and foreign keys, see Assigning Primary and Foreign Keys in a Schema.

When you are ready to onboard the data to Anzo, see Onboarding Data with the Automated Workflow for next steps. Or, if you want to onboard or virtualize the source by manually writing SPARQL queries against the Graph Data Interface service, see Onboarding or Virtualizing Data with SPARQL Queries.