Tutorial: Building a Sample Solution from Scratch

This topic helps get you started with Anzo by teaching you the basic steps to follow when creating a complete solution. The instructions below provide a sample CSV file and guide you through the steps required from importing the file to visualizing the data in a Hi-Res Analytics dashboard.

  1. Prepare the Environment
  2. Download the Sample Data
  3. Create a CSV Data Source and Import the Sample Data
  4. Ingest the Imported Data Set
  5. Run the ETL Pipeline
  6. Create and Activate the Movie Data Graphmart
  7. Explore the Data in a Hi-Res Analytics Dashboard
  8. Review the New Solution's Artifacts

Prepare the Environment

To give the tutorial continuity and make it straightforward to follow, it helps to complete a few general setup steps before you start building the solution. This section provides background information and describes the environment preparation.

This tutorial guides you through importing data from a file, running an ETL pipeline to create RDF files for AnzoGraph, and loading the data into AnzoGraph. Completing those steps requires a File Store that Anzo and AnzoGraph can access, a designated location (Anzo Data Store) on that storage system to place the RDF files that the ETL process generates, and a connection to AnzoGraph for loading the data.

Confirm that the following setup steps are complete before proceeding:

  1. An Anzo Data Store has been created for the RDF files that are generated during the ETL process. For instructions on creating a Data Store, see Creating an Anzo Data Store in the Anzo Deployment & User Guide.
  2. Anzo has a connection to AnzoGraph. For instructions on configuring the connection, see Connecting to AnzoGraph in the Anzo Deployment & User Guide.

Download the Sample Data

This tutorial uses a small sample data set from Data World. The data is in CSV format and includes IMDB movie data from 2006 – 2016. Click the link below to download IMDB-Movie-Data.zip to your computer. Then extract the ZIP file to make IMDB-Movie-Data.csv available for importing.

IMDB-Movie-Data.zip
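
Before importing, it can be useful to confirm that the archive contains the expected CSV file and to look at its column names. The following is a minimal Python sketch, not part of Anzo. The function name read_csv_header is hypothetical, and the encoding (UTF-8, with or without a byte order mark) is an assumption about the sample file.

```python
import csv
import io
import zipfile


def read_csv_header(zip_path, member="IMDB-Movie-Data.csv"):
    """Return the header row of a CSV file stored inside a ZIP archive."""
    with zipfile.ZipFile(zip_path) as archive:
        with archive.open(member) as raw:
            # Assumption: the file is UTF-8; "utf-8-sig" also strips a BOM if present.
            text = io.TextIOWrapper(raw, encoding="utf-8-sig", newline="")
            return next(csv.reader(text))
```

Calling read_csv_header("IMDB-Movie-Data.zip") from the download directory returns the list of column names, without extracting the archive.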

Create a CSV Data Source and Import the Sample Data

  1. In the Anzo application, expand the Onboard menu and click Structured Data. Anzo displays the Data Sources screen, which lists any existing data sources:

  2. Click the Add Data Source button and select File Data Source > CSV Data Source. Anzo opens the Create CSV Data Source screen.

  3. Type a name for the source in the Title field, for example, "Sample Movie Data." Then, optionally, type a description in the Description field.

  4. Click Save. Anzo saves the source and displays the Tables tab for the new Data Source.

  5. Click the Add New File button. Anzo displays the Add New File dialog box, and the Source is set to From Your Computer.

  6. As a best practice, check the upload location that is listed in the Upload To field by hovering your pointer over the value to view the tooltip. Make sure the upload location is a directory on the shared File Store and not a location in the server installation path. If the file is not uploaded to the shared File Store, applications like AnzoGraph or Spark cannot access it. In addition, other users cannot publish pipelines for the Data Source because they typically do not have access to the file.

    For example, viewing the Upload To location for the screen above shows that the file will be uploaded to the server installation path, /opt/Anzo/Server/data...

    If your Upload To location is configured to upload the file to the server installation path, it is important to click Change and select another upload location that is on the shared File Store. For example, the image below shows the Upload Folder Location dialog box that is presented after clicking Change. A folder called fileUploads is selected in the /nfs/data directory on the File Store.

    An administrator can configure the base upload path so that it points to a location on the File Store by default. For more information, see Setting the Default Base File Store Path for File Uploads in the Anzo Deployment & User Guide.

  7. Next, drag and drop the IMDB-Movie-Data.csv file onto the screen. Alternatively, click browse, navigate to the file, and double-click it to select it. Anzo attaches the file and the Next button becomes active. Click Next. Anzo lists the file on the left side of the screen with a status of Pending. For example:

  8. Click the Process Pending Files button to import the IMDB data to Anzo. Anzo imports the data and the status changes to Processed. Additional buttons are also displayed on the right side of the screen. You can click the table row for IMDB-Movie-Data to display the schema details.

The data is now imported. Proceed to the next step to ingest the data and automatically derive a Data Model, generate a Mapping, and create an ETL Pipeline for transforming the data to the RDF graph model.
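
The upload location check described in step 6 can be expressed as a simple path test. The following Python sketch is illustrative only and is not an Anzo feature. The function names are hypothetical, and the two root paths are assumptions taken from the examples in this topic (/opt/Anzo/Server for the server installation path and /nfs/data for the shared File Store); substitute the values for your environment.

```python
from pathlib import PurePosixPath


def is_under(path, root):
    """Return True if path equals root or is located below it."""
    path = PurePosixPath(path)
    root = PurePosixPath(root)
    return path == root or root in path.parents


def classify_upload_location(upload_to,
                             install_root="/opt/Anzo/Server",  # assumed install path
                             shared_root="/nfs/data"):         # assumed File Store root
    """Classify an Upload To value as install path, shared File Store, or unknown."""
    if is_under(upload_to, install_root):
        return "server-install-path"
    if is_under(upload_to, shared_root):
        return "shared-file-store"
    return "unknown"
```

A result of "server-install-path" corresponds to the case where you would click Change and select a folder on the shared File Store instead.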

Ingest the Imported Data Set

  1. On the Tables screen (from the last step of the procedure above), click the Ingest button. Anzo opens the Ingest dialog box and automatically populates the Data Source Connection value. If only one Data Store is configured, the Anzo Data Store value is also populated automatically. In addition, if a default ETL Engine is configured for the system, the Auto Map Engine Config field is populated. For example, in the image below the Anzo Data Store field is not populated because there are multiple available choices. The Auto Map Engine Config field is populated because the Local Sparkler Engine is configured as the default ETL Engine:

  2. If necessary, click the Anzo Data Store field and select the Data Store for this pipeline.
  3. If necessary, click the Auto Map Engine Config field and select the ETL Engine to use. For this tutorial, select Local Sparkler Engine. That means the pipeline will use the SPARQL-driven Sparkler ETL compiler to process the source data for the Spark engine.
  4. Leave the Select all tables radio button selected. For example, the image below shows the completed screen:

  5. Click Save. Anzo creates a Pipeline for the Data Source and displays a "Pipeline was successfully created" confirmation message.

As part of creating the Pipeline, Anzo also generates a Data Model and Mapping for the sample data set. The last step in this tutorial describes each of the components. Proceed to the next step to run the Pipeline and generate the RDF File-Based Linked Data Set (FLDS).
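
To make the idea of transforming tabular data to the RDF graph model more concrete, the following Python sketch turns CSV rows into subject, predicate, object triples: one class for the table, one property per column, and one instance per row. This is a conceptual illustration only and does not show how Anzo derives its Data Model or Mapping. The namespace, the function name, and the choice of a key column for building instance identifiers are all hypothetical.

```python
import csv
import io
from urllib.parse import quote

BASE = "http://example.org/movies/"  # hypothetical namespace, not used by Anzo
RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"


def rows_to_triples(csv_text, class_name, key_column):
    """Convert CSV text to a list of (subject, predicate, object) string triples."""
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # One instance per row, identified by the value in the key column.
        subject = f"<{BASE}{class_name}/{quote(row[key_column], safe='')}>"
        triples.append((subject, RDF_TYPE, f"<{BASE}{class_name}>"))
        for column, value in row.items():
            if not value:
                continue  # skip empty cells instead of emitting empty literals
            predicate = f"<{BASE}{quote(column, safe='')}>"
            escaped = value.replace("\\", "\\\\").replace('"', '\\"')
            triples.append((subject, predicate, f'"{escaped}"'))
    return triples
```

For a file with the columns Title and Year and a single row, the sketch produces three triples: one that states the type of the instance and one for each column value.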

Run the ETL Pipeline

  1. In the Onboard menu, click Structured Data. Then click the Pipelines tab. Anzo displays the Pipelines screen, which lists the Pipeline for the sample movie data set. For example:

  2. Click the Pipeline title to view the pipeline Overview. For example:

  3. Click the Publish All button to run all the jobs in the Pipeline and generate the FLDS in the specified Data Store.

Anzo publishes the ETL files and displays confirmation messages as each step in the process completes. When the job execution is finished, the Sample Movie Dataset becomes available in the Dataset catalog. Proceed to the next step to create a Graphmart for the new Dataset and load it into AnzoGraph.

Create and Activate the Movie Data Graphmart

  1. Expand the Blend menu and click Datasets. Anzo displays the Datasets catalog, which shows an inventory of all of the onboarded data in Anzo. It lists the Sample Movie Dataset. For example:

  2. Hover the pointer over the Sample Movie Data item to display a checkbox in the left column, and then select the checkbox. Anzo adds the Dataset to the shopping cart and additional buttons become available at the bottom of the screen. For example:

  3. Click the Create Graphmart button. Anzo displays the Create Graphmart screen. Anzo populates the Title field by appending "Graphmart" to the Dataset name. For example:

    By default, the current working edition of the Dataset, called the Default Edition, is selected for the Graphmart. The Default Edition is managed by Anzo (the Managed Edition) and always contains the Data Components from the last successful run of the Pipeline. Since this is a new Data Source and the first run of the Pipeline, no other editions exist. To learn more about Dataset Editions, see Managing Dataset Editions in the Anzo Deployment & User Guide.

  4. On the Create Graphmart screen, you can edit the Title and type an optional Description. When you are ready to create the Graphmart and load the data to AnzoGraph, click Create & Activate at the bottom of the screen. Anzo creates and activates the Graphmart and displays the Overview screen for the new Graphmart. For example:

    If you want to learn about the Graphmart settings that are shown in the image above, see Graphmart Settings Reference in the Anzo Deployment & User Guide.

Now that the data is loaded into AnzoGraph, it is available to explore and analyze in dashboards. Proceed to the next step to explore the sample data in the Hi-Res Analytics application.

Explore the Data in a Hi-Res Analytics Dashboard

  1. On the Graphmart screen (as shown in the last step above), click the Create Dashboard button. Anzo opens the Hi-Res Analytics application and displays the Create Dashboard dialog box, which asks you to select the type of Dashboard to create:

  2. Select Graphmart Dashboard and click Next. The Create Dashboard dialog box is displayed:

  3. On the Create Dashboard screen, specify a Title for the dashboard and add an optional Description. Leave the default Graphmart dashboard value in the Type field. For example:

  4. Click Finish to create the Dashboard. The new Dashboard appears as a new tab on the screen and contains a sub-tab titled What can I do next. This tab acts as a wizard to guide you through the initial Dashboard creation. Anzo populates the Graphmart and Data Layers panels with the sample movie data Graphmart name and the default Data Layer for the Graphmart.

  5. To define the type of data to display on the dashboard, click the Specify the type of data you would like to see link on the What can I do next tab. The Select Data Types dialog box appears and displays the available data types. Since the sample data set has one type or class of data, only IMDB-Movie-Data is listed. The value in parentheses shows the total number of instances of that type.

  6. Click IMDB-Movie-Data to select it, and then click OK to add the data type to the Dashboard. The selected type is added to the Data Types panel.
  7. In the main Hi-Res Analytics toolbar, click the Dashboard button and select Save to save the Dashboard.

    To start exploring the data and see what values exist for the properties in the IMDB-Movie-Data class, it can help to add a filter to the dashboard. Filters reveal the values associated with properties. Learning more about the values enables you to start making decisions about, for example, which properties to group on, which properties have relationships, and which results you want to visualize.

  8. To create a filter, click the Add facets to filter the data link on the What can I do next tab. Anzo displays the Create Filter dialog box:

  9. In the Create Filter dialog box, click the Fields field and browse the available properties. For this tutorial, select Year, and then click Close to close the Fields drop-down list. Additional options are displayed:

  10. Click the Filter drop-down list and browse the available filter types. For this tutorial, select Single Select List.
  11. At the bottom of the screen under Format, click the Type drop-down list and select No format. Removing formatting from the filter ensures that commas are not displayed for the Year values. The image below shows the completed dialog box.

  12. Click OK to close the dialog box and add the new filter to the dashboard. Anzo adds the Year filter to the left panel. The new filter will enable you to narrow the scope of the data when it is visualized.

    Once you have a good understanding of the values and relationships that exist in the data set, you can experiment with lenses and decide on the most appropriate way to display the data. Creating a Table lens is a quick way to get started with learning about the values and relationships in the data and to see how the new filter works.

  13. To create a Table lens, click the Select or create visualizations of your data link on the What can I do next tab. Anzo displays the Lens Selection dialog box. Since lenses have not been created previously, the list of lenses is empty:

  14. Click the Create a new lens link at the bottom of the screen. Anzo displays the Create Lens dialog box:

  15. Select the Table lens and then click Next. Anzo displays the next screen of the Create Lens dialog box:

  16. Type a Title for the lens, for example "Movie Data Table," and then click Finish. Anzo adds the lens as a new sub-tab on the dashboard and opens the Table Designer:

  17. In the Designer, click the Auto-generate columns icon to add all available columns to the table.

    Since the sample data set includes only 12 columns, it is not overwhelming to view all columns at once. For larger data sets, you might want to be more selective when adding columns to table lenses.

  18. Click Save. The table lens is added to the dashboard and is populated with data. For example:

  19. If you want to adjust the table, for example, to reformat the Year column and remove the commas from the values, click the Designer button for the lens. In the Designer, select the Year column, and then click Format under Column Value Expression in the Column Details:

    In the Format Column dialog box, click the Type drop-down list and select Year or No format, and then click Close. Click Save to save the lens change and close the Designer.

  20. Now, you can filter the data and reduce the number of results that are displayed by clicking a Year value in the Year filter. In the filter, click any one of the years in the list to display only the data from that year. You can click another year to filter the table again, and so on.
  21. In the Hi-Res Analytics main toolbar, click the Dashboard button and select Save to save the changes. Remember to save dashboards periodically.
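
The effect of the Year filter and of the No format setting can be illustrated outside of the application. The following Python sketch is not Hi-Res Analytics code. It mimics a single-select filter on a list of rows and shows why a numeric year is displayed with a comma when thousands grouping is applied. The function names and the sample rows are hypothetical placeholders, not values from the sample data set.

```python
def distinct_values(rows, field):
    """List the values a filter on this field would offer, in sorted order."""
    return sorted({row[field] for row in rows})


def filter_single_select(rows, field, selected):
    """Keep only the rows whose field equals the single selected value."""
    return [row for row in rows if row[field] == selected]


def format_number(value, use_grouping=True):
    """Format a number with or without thousands separators."""
    return f"{value:,}" if use_grouping else str(value)


# Hypothetical placeholder rows, not taken from the sample CSV file.
rows = [
    {"Title": "Movie A", "Year": 2006},
    {"Title": "Movie B", "Year": 2007},
    {"Title": "Movie C", "Year": 2006},
]

print(distinct_values(rows, "Year"))             # [2006, 2007]
print(filter_single_select(rows, "Year", 2006))  # rows for Movie A and Movie C
print(format_number(2006))                       # 2,006
print(format_number(2006, use_grouping=False))   # 2006
```

Selecting a different value in the sketch replaces the previous selection, which matches the behavior described for the Single Select List filter type.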

Now that you can view a summary of the data in a table, you can decide how to narrow or expand the results further by adding, changing, or removing filters. You can also experiment with other lenses to find the best way to display the data and answer your questions. For example, the table includes a Revenue column, and it might be interesting to see which movies made the most revenue. The next steps guide you through creating a chart lens to display movie revenue.

  1. In the Hi-Res Analytics main toolbar, click Lenses and select New. The Create Lens dialog box opens.

  2. Select Chart and click Next. Anzo displays the next screen of the Create Lens dialog box.
  3. Specify a Title and add an optional Description for the new lens. For example:

  4. Click Finish. Anzo displays the Chart Designer, which enables you to configure the type of chart to create, such as column, pie, or line, specify the data that will populate the chart, and customize details such as the chart's theme and fonts, legend, title, and tooltip formats.

  5. For this tutorial, accept the default Chart Type of Column, and click Chart Data to view the Chart Data screen and configure the data to display.

  6. On the left side of the screen, click Rename under Series 1. Specify a new name in the Rename Series field, for example, "Revenue by Title," and then click OK.
  7. On the right side of the screen, click the Group field and select the Title property from the drop-down list. The Group field sets the values for the X Axis on the column chart. Then click Close to close the drop-down list. Anzo also populates the Group label with the Title property.
  8. Click the Value field and select the Revenue (Millions) property from the drop-down list. The Value field sets the Y Axis values. Click Close to close the drop-down list.
  9. Click Save. Anzo adds the new Chart lens to the dashboard. To narrow the results that are displayed and filter on one year at a time, click a year in the Year filter. For example, when you click 2006 in the filter, the chart lists the revenue for titles that were released in 2006 and shows that "Pirates of the Caribbean: Dead Man's Chest" had the most revenue for that year:

    You can hover the pointer over a bar in the chart to view details.
  10. If you want to add more data to the Chart, for example, to compare Revenue versus Rating for each Title, follow these steps:
    1. Open the Chart Designer by clicking the Designer button above the lens tabs.

    2. In the Designer, click Chart Data.
    3. On the Chart Data screen, click the Add a series link at the bottom of the screen. A new series, named Series 2, is added to the list.

    4. Under Series 2, click Rename and specify a name for the series, such as "Rating by Title." Then click the Group field on the right side of the screen and select the Title property from the drop-down list. Click Close to close the list.
    5. Click the Value field and select the Rating property from the drop-down list. Close the list.
    6. Click Save to save the changes. The Chart is updated to display the Rating values in addition to the Revenue. For example:

      If you filtered the data in a previous step, the lens remains filtered on the chosen Year. You can de-select the Year in the filter or choose an alternate Year to explore the Ratings and Revenue for additional Titles.
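
The Group and Value settings in the Chart Designer correspond to a group-and-aggregate operation on the filtered data. The following Python sketch is a rough equivalent and is not how Hi-Res Analytics is implemented. It assumes that values are summed per group, which is an assumption about the chart's aggregation; when each Title appears once, the sum equals the single value. The function names and sample rows are hypothetical placeholders.

```python
from collections import defaultdict


def build_series(rows, group_field, value_field):
    """Aggregate value_field per distinct group_field value (assumed: sum)."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[group_field]] += row[value_field]
    return dict(totals)


def top_group(series):
    """Return the group label with the largest aggregated value."""
    return max(series, key=series.get)


# Hypothetical placeholder rows, not taken from the sample CSV file.
rows = [
    {"Title": "Movie A", "Year": 2006, "Revenue (Millions)": 100.0, "Rating": 7.0},
    {"Title": "Movie B", "Year": 2006, "Revenue (Millions)": 250.5, "Rating": 6.5},
    {"Title": "Movie C", "Year": 2007, "Revenue (Millions)": 80.0, "Rating": 8.1},
]

# Apply the Year filter first, then build one series per Value field.
filtered = [row for row in rows if row["Year"] == 2006]
revenue_by_title = build_series(filtered, "Title", "Revenue (Millions)")
rating_by_title = build_series(filtered, "Title", "Rating")

print(revenue_by_title)             # {'Movie A': 100.0, 'Movie B': 250.5}
print(rating_by_title)              # {'Movie A': 7.0, 'Movie B': 6.5}
print(top_group(revenue_by_title))  # Movie B
```

Each dictionary plays the role of one chart series: the keys are the X Axis groups and the values are the Y Axis values.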

For more information about working with Dashboards, see Analyzing Data with Hi-Res Analytics in the Anzo Deployment & User Guide.

Review the New Solution's Artifacts

This section briefly reviews the artifacts that were created as part of the new sample solution and provides references to more details about each of the artifacts.

During the process of automatically ingesting the sample movie data from the supplied CSV file, Anzo generated the following components:

  • A Model, which describes the structure and type of data that the movie data set contains. For more information and instructions for viewing the derived model, see Modeling Data in the Anzo Deployment & User Guide.
  • A Mapping, which is input to the ETL job and maps the source data to the target elements defined in the data model. For more information and instructions for viewing the generated mapping, see Working with Mappings in the Anzo Deployment & User Guide.
  • A Dataset Pipeline, which contains the ETL job that generates the target data set. For more information and instructions for viewing the generated pipeline, see Configuring Pipelines in the Anzo Deployment & User Guide.