Defining a Database Schema

This topic provides information about creating the schema to use when importing data from a database. The schema defines the source data to onboard. Anzo supports multiple options for defining the schema. You can import a schema from the database, you can write a static SQL query that defines the data, or, if you want to import data incrementally, you can write an incremental SQL query that includes parameters that automatically increment when the ETL pipeline is run.

You can import or create up to 5 schemas per database data source. To include more than 5 schemas, create another data source for the additional schemas.

Select an option from the list below for instructions on creating that type of schema:

Importing a Predefined Schema

Follow the steps below to import a predefined schema from the source database to Anzo.

By default, Anzo is configured to exclude Views from the list of available Schemas to import. For information about including Views as tables that can be imported, see Including Views as Schemas for Database Data Sources.

  1. In the Anzo application, expand the Onboard menu and click Structured Data. Anzo displays the Data Sources screen, which lists any existing data sources. For example:

  2. Click the data source for which you want to import a schema. Anzo displays the Tables tab for the source. For example:

  3. Click the Import Schemas button. Anzo displays the Import Schemas dialog box, which lists any predefined schemas in the database. For example:

    If you do not see a schema that you expect to see, make sure that you have the necessary access to the data source.

  4. Select the checkbox next to each schema that you want to import, and then click OK. Anzo imports the selected schema or schemas and lists the imported schemas on the Tables screen. For example:

Once the schema or schemas are imported, they are listed on the left side of the screen. You can expand a schema to view its tables and selecting a row in the schema displays the sample data on the right side of the screen. Now that a schema has been defined, the source data can be onboarded to Anzo. For information about creating a metadata dictionary for this data source, see Creating a Metadata Dictionary. For instructions on onboarding the data by automatically generating the model, mappings, and ETL pipeline, see Ingesting a New Data Source.

Creating a Schema from an SQL Query

Follow the instructions below to create a schema by writing an SQL query that defines the data to onboard. For information about writing a schema query that onboards data from a database incrementally, see Creating an Incremental Schema.

For better ETL pipeline performance, it is beneficial to include joins and/or filters in schema queries rather than configuring those operations at the mapping level. For more information, see Performance Considerations for Database Pipelines.

  1. In the Anzo application, expand the Onboard menu and click Structured Data. Anzo displays the Data Sources screen, which lists any existing data sources. For example:

  2. Click the data source for which you want to create a schema. Anzo displays the Tables tab for the source. For example:

  3. Click the Create Schemas From Query button. Anzo displays the Create Schemas dialog box:

  4. In the Create Schemas dialog box, specify a name for this schema in the Schema Name field.
  5. In the Table Name field, specify a name for the table in the schema that the query will create.
  6. Type the SQL statement in the text box. The statement can include any functionality that the source database supports. Anzo does not validate the SQL. The following example creates a schema named employees. A table named all_employees will be created in the schema, and the table will be created from the SQL query:
    SELECT EmployeeID, FirstName, LastName, Title, Salary, BirthDate, HireDate, Region, Country
    FROM northwind.Employees
    WHERE EmployeeID

  7. Click Save to save the query. Anzo creates the new schema and adds it to the list of schemas on the Tables screen. For example:

    You can expand the schema to view its tables. Selecting a row in the schema displays the sample data on the right side of the screen.

  8. If you want to create additional tables in the schema, follow these steps:
    1. Click the menu icon () in the Actions column for the schema name and select Add Table. For example:

      The Create New Table dialog box is displayed.

    2. In the Create New Table dialog box, specify a name for the new table in the Table Name field. In the Schema Query field, write the SQL query that defines the data for the table.
    3. Click Save to add the table to the schema and return to the Tables screen.

Now that a schema has been defined, the source data can be onboarded to Anzo. For information about creating a metadata dictionary for this data source, see Creating a Metadata Dictionary. For instructions on onboarding the data by automatically generating the model, mappings, and ETL pipeline, see Ingesting a New Data Source.

Creating an Incremental Schema

Follow the instructions below to create a schema by writing an SQL query that defines a subset of the data to onboard in increments.

For better ETL pipeline performance, it is beneficial to include joins and/or filters in schema queries rather than configuring those operations at the mapping level. For more information, see Performance Considerations for Database Pipelines.

  1. In the Anzo application, expand the Onboard menu and click Structured Data. Anzo displays the Data Sources screen, which lists any existing data sources. For example:

  2. Click the data source for which you want to create an incremental schema. Anzo displays the Tables tab for the source. For example:

  3. Click the Create Schemas From Query button. Anzo displays the Create Schemas dialog box:

  4. In the Create Schemas dialog box, specify a name for this schema in the Schema Name field.
  5. In the Table Name field, specify a name for the table in the schema that the query will create.
  6. At the bottom of the screen, enable the Include increment data option by sliding the slider to the right. Anzo displays additional settings. For example:

  7. Populate the following fields so that you can use the values as a guide for writing the schema query:
    • Incremental Column Name: The source column whose value will be used to increment the data.
    • Value: The value in the column to use as the stopping point for the current import process and the starting point for the next import.

      Do not include quote characters in the Value field. If the SQL query requires quotes around values, such as ‘2010-01-01’ or ‘TestValue’, include the quotes around the {INCREMENTVALUE} parameter in the query and not in the Value field. For example, if the value to increment on is ‘2010-01-01’, specify 2010-01-01 in the Value field and add the quotes to the query like the following example:

      SELECT * FROM Orders WHERE OrderData > ‘{INCREMENTVALUE}’
    • Comparator: The operator to use for comparing source values against the value above.
  8. In the query text field, type the SQL statement that will target the appropriate source data. The WHERE clause must include the incremental column name, the comparison operator, and an INCREMENTVALUE parameter that is substituted with the Value at runtime. For example, in the query below the incremental column name is EmployeeID, the comparator is > (greater than), and the {INCREMENTVALUE} parameter, which will be replaced with 5 (the value in the Value field) at runtime:
    SELECT EmployeeID, FirstName, LastName, Title, Salary, BirthDate, HireDate, Region, Country
    FROM northwind.Employees
    WHERE EmployeeID > {INCREMENTVALUE}

    Make sure that the query includes the INCREMENTVALUE parameter and uses the same Incremental Column Name and Comparator values as the fields below the query. For example:

  9. Click Save to save the query. Anzo creates the new schema and adds it to the list of schemas on the screen. For example:

    You can expand the schema to view its tables. Selecting a row in the schema displays the sample data on the right side of the screen.

  10. If you want to create additional tables in the schema, follow these steps:
    1. Click the menu icon () in the Actions column for the schema name and select Add Table. For example:

      The Create New Table dialog box is displayed.

    2. In the Create New Table dialog box, specify a name for the new table in the Table Name field. In the Schema Query field, write the SQL query that defines the data for the table.
    3. Click Save to add the table to the schema and return to the Tables screen.

Now that a schema has been defined, the source data can be onboarded to Anzo. For information about creating a metadata dictionary for this data source, see Creating a Metadata Dictionary. For instructions on onboarding the data by automatically generating the model, mappings, and ETL pipeline, see Ingesting a New Data Source.

See Incremental Pipeline Reference for important information about running a pipeline that includes an incremental schema.

Related Topics