Tutorial: Building a Sample Solution from Scratch
This topic gets you started with Anzo by walking you through the basic steps for creating a complete solution. The instructions below provide a sample CSV file and guide you through importing the file and visualizing the data in Hi-Res Analytics.
- Prepare the Environment
- Download the Sample Data
- Create a CSV Data Source and Import the Sample Data
- Auto-Ingest the Imported Data Set
- Run the ETL Pipeline
- Create a Graphmart and Load the Sample Data to AnzoGraph
- Explore the Data in Hi-Res Analytics
- Review the New Solution's Components
Prepare the Environment
To give the tutorial continuity and make it straightforward to follow, it helps to complete a few general setup steps before you start building the solution. This section provides background information and describes the general environment preparation.
This tutorial guides you through importing data from a file, running an ETL pipeline to create RDF files for AnzoGraph, and loading the data into AnzoGraph. Completing those steps requires a file storage system that both Anzo and AnzoGraph can access, a designated location (graph data store) on that storage system to place the RDF files that Anzo generates, and a connection to AnzoGraph for loading the data.
Confirm that the following setup steps are complete before proceeding:
- Anzo and AnzoGraph have access to the same file store, and its location is configured in Anzo. Note: The Anzo server file system is configured and accessible by default. If you store files on a file system that is mounted directly to the Anzo and AnzoGraph servers, you do not need to configure a new file system location. If you store files in the cloud, HDFS, or a location that is accessible via FTP, configure that storage location in Anzo.
For instructions on configuring a file storage location, see Connecting to a File Store.
- A graph data store has been created for the RDF files that Anzo generates during the ETL process. For instructions on creating a graph data store, see Creating an Anzo Data Store.
- Anzo has a connection to AnzoGraph. For instructions on configuring the connection, see Making a Basic Connection to AnzoGraph.
Download the Sample Data
This tutorial uses a small sample data set from Data World. The data is in CSV format and includes IMDB movie data from 2006 – 2016. Click the link below to download IMDB-Movie-Data.zip to your computer. Then extract the ZIP file to view IMDB-Movie-Data.csv.
IMDB-Movie-Data.zip
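Before importing the file, it can help to confirm that the extracted CSV has a header row and to see how many columns and rows it contains. The following is a minimal sketch that uses only the Python standard library. The function name summarize_csv and the in-memory sample rows are hypothetical stand-ins; to check the real file, pass an open file handle instead (the UTF-8 encoding noted in the comment is an assumption).

```python
import csv
import io


def summarize_csv(handle):
    """Return (column_names, data_row_count) for a CSV file-like object."""
    reader = csv.reader(handle)
    header = next(reader)               # first line holds the column names
    row_count = sum(1 for _ in reader)  # remaining lines are data rows
    return header, row_count


# Hypothetical stand-in data. For the real file, use something like:
#   open("IMDB-Movie-Data.csv", newline="", encoding="utf-8")  # encoding assumed
sample = io.StringIO("Title,Year,Rating\nMovie A,2006,7.1\nMovie B,2016,6.4\n")
columns, rows = summarize_csv(sample)
print(len(columns), "columns,", rows, "rows")
```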
Create a CSV Data Source and Import the Sample Data
- Log in to the Anzo console, expand the Onboard menu item, and click Structured Data. Anzo displays the Data Sources screen, which lists any existing data sources:
- Click the Create button and select CSV Data Source. Anzo opens the Create CSV Datasource screen.
- Type a name for the source in the Datasource name field. For example, type "Sample Movie Data." Then type an optional description in the Description field.
- Click Save. Anzo saves the source and displays the Files tab.
- On the Files tab, click Select From Computer. Then drag and drop the file onto the screen, or click browse, navigate to IMDB-Movie-Data.csv, and double-click the file to select it. Anzo attaches the file and the Add Files button becomes active.
- Click Add Files. Anzo lists the file on the Files tab with a status of Pending. For example:
- Click the Import Pending Files button to import the IMDB data to Anzo. Anzo imports the data and the status changes to Imported.
The data is now imported to Anzo as a CSV data set.
Proceed to the next step to ingest the data and automatically derive a data model, generate a mapping, and create an ETL pipeline for transforming the new data set to the RDF graph model.
Auto-Ingest the Imported Data Set
- In the Anzo console under Onboard, click Structured Data. Then click the Schemas tab. Anzo displays the Schemas screen, which lists the schema for the sample movie data set that you imported.
- Click the schema title to view the schema table details. For example:
- Click the Auto Ingest button to set up the automatic ingestion. Anzo opens the Auto Ingest dialog box and automatically populates the screen with the data source, graph source, and default Spark engine. For example:
- If necessary, click the Source Data Connection field and select the sample movie data source.
- If necessary, click the Graph Data Source field and select the graph data source to use for storing the RDF files (see Creating an Anzo Data Store if needed).
- Leave the Select all tables radio button selected.
- Click Save. Anzo creates a pipeline for the data set and displays a confirmation:
Click OK to close the dialog box.
As part of ingesting the data set and creating a pipeline, Anzo also generates a data model and mapping for the sample data set. The last step in this tutorial describes each of the components. Proceed to the next step to run the pipeline and generate the AnzoGraph load files.
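To make the idea of deriving a data model from a CSV file more concrete, the sketch below guesses a data type for each column by testing whether every value parses as an integer, then as a decimal, and otherwise treating the column as a string. This is an illustration only and is not Anzo's actual inference logic; the function names, the type labels, and the sample rows are all hypothetical.

```python
import csv
import io


def infer_type(values):
    """Pick the narrowest of integer, decimal, or string that fits all non-empty values."""
    non_empty = [v for v in values if v != ""]
    for caster, label in ((int, "integer"), (float, "decimal")):
        try:
            for value in non_empty:
                caster(value)
            return label
        except ValueError:
            continue  # at least one value did not parse; try the next type
    return "string"


def derive_model(handle):
    """Map each CSV column name to an inferred type label."""
    reader = csv.DictReader(handle)
    rows = list(reader)
    return {
        column: infer_type([row[column] for row in rows])
        for column in (reader.fieldnames or [])
    }


# Hypothetical sample rows, not taken from the real data set.
sample = io.StringIO("Title,Year,Rating\nMovie A,2006,7.1\nMovie B,2016,6\n")
model = derive_model(sample)
print(model)
```

Each key in the result plays the role of a property in the derived model, and the whole dictionary corresponds to the single class that a one-table source produces.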
Run the ETL Pipeline
- In the Anzo console in the Structured Data workflow, click the Pipelines tab. Anzo displays the Pipelines screen, which lists the pipeline for the sample movie data set. For example:
- Click the pipeline title to view the pipeline overview. For example:
- Click the Publish button to run the pipeline, transform the data to graph format, and generate the load files for AnzoGraph in the graph data store location.
Anzo publishes the ETL files and displays a confirmation that job execution is in progress. Click OK to close the dialog box. When the job execution completes, the sample movie data set becomes available in the Datasets catalog. Proceed to the next step to create a graphmart for the new data and load it into AnzoGraph.
Create a Graphmart and Load the Sample Data to AnzoGraph
- In the Anzo console, expand the Blend menu and click Datasets. Anzo displays the Datasets catalog, which shows an inventory of all of the data in Anzo. It lists the sample movie data set. For example:
- Hover the pointer over the Sample Movie Data item to display a checkbox in the left column, and then select the checkbox. Anzo adds the data set to the shopping cart and additional icons become available at the top of the screen. For example:
- Click the shopping cart icon at the top of the screen. Anzo displays the Create Graphmart screen.
Anzo populates the Title field by appending "Graphmart" to the data set name.
- On the Create Graphmart screen, you have the option to edit the title and type an optional description. Click Save when you are ready to create the graphmart. Anzo creates the graphmart and displays the Graphmart Overview screen for the new graphmart.
Since the source data is a single CSV file with 12 columns, the data model becomes a single class with 12 properties. The graphmart view is a single blue circle because there is one class. Graphmarts with multiple classes with relationships between them show a detailed graph view.
- To load the graphmart to AnzoGraph, slide the slider at the top of the screen from Inactive to Active. Anzo loads the graphmart.
Now that the data is loaded into AnzoGraph, it is available to view in dashboards. Proceed to the next step to explore the sample data with Hi-Res Analytics.
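The statement that a single CSV file with 12 columns becomes a single class with 12 properties can be illustrated with a small sketch of how one row might turn into RDF-style triples: one triple that assigns the row to the class, plus one triple per non-empty column. This is not the output of Anzo's ETL pipeline. The example.org namespace, the function name row_to_triples, and the sample row are hypothetical; only the rdf:type IRI is standard.

```python
from urllib.parse import quote

BASE = "http://example.org/movies/"  # hypothetical namespace for illustration
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def row_to_triples(row_number, row, class_name):
    """Turn one CSV row (a dict of column -> value) into (subject, predicate, object) triples."""
    class_iri = BASE + quote(class_name)
    subject = f"{class_iri}/{row_number}"
    triples = [(subject, RDF_TYPE, class_iri)]  # the row is an instance of the class
    for column, value in row.items():
        if value != "":  # skip empty cells rather than emit empty literals
            triples.append((subject, BASE + quote(column), value))
    return triples


# Hypothetical row using column names that appear in this tutorial.
example_row = {"Title": "Movie A", "Year": "2006", "Revenue (Millions)": ""}
triples = row_to_triples(1, example_row, "IMDB-Movie-Data")
for triple in triples:
    print(triple)
```

With all 12 columns populated, a row would yield 13 triples under this scheme: one for the class and one for each property.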
Explore the Data in Hi-Res Analytics
- On the Graphmart screen for the sample movie data graphmart, click the Create Dashboard button. Anzo opens the Hi-Res Analytics application and displays the New Dashboard dialog box:
- On the New Dashboard dialog box, type a Title for the dashboard and an optional Description. Leave the default Graphmart dashboard value in the Type field. For example:
- Click OK to create the dashboard. The new dashboard appears as a new tab on the screen and contains a sub-tab titled "What can I do next?" This tab acts as a wizard to guide you through the initial dashboard creation.
Anzo populates the Graphmart and Data Layers panels with the sample movie data graphmart and default data layer for the graphmart.
- To define the type of data from the graphmart to display on the dashboard, click the Specify the type of data you would like to see link on the What can I do next tab. The Select Data Types dialog box appears and displays the available data types. Since the sample data set has one type or class of data, only IMDB-Movie-Data is listed. The value in parentheses shows the total number of instances of that type.
- Click IMDB-Movie-Data to select it, and then click OK to add the data type to the dashboard.
- In the main Hi-Res Analytics toolbar, click the Dashboard button and select Save to save the dashboard.
To start exploring the data and see what values exist for the properties in the IMDB-Movie-Data class, it can help to add a filter to the dashboard. Filters reveal the values associated with properties. Learning more about the values helps you decide, for example, which properties to group on, which properties have relationships, and what results you want to visualize.
- To create a filter, click the Add facets to filter the data link on the What can I do next tab. Anzo displays the Create Filter dialog box:
- In the Create Filter dialog box, click the Fields field and browse the available properties. For this tutorial, select Year, and then click Close to close the Fields drop-down list.
- Click the Filter drop-down list and browse the available filter types. For this tutorial, select Single Select List. Anzo adds the filter type to the Create Filter dialog box.
- At the bottom of the screen under Format, click the Type drop-down list and select No format. Removing formatting from the filter ensures that commas are not displayed for the Year values.
- Click OK to close the dialog box and add the new filter to the dashboard. Anzo adds the Year filter to the left panel. The new filter will enable you to narrow the scope of the data when it is visualized.
Once you have a good understanding of the values and relationships that exist in the data set, you can experiment with lenses and decide on the most appropriate way to display the data. Creating a Table lens is a quick way to view the data that you filtered.
- To create a lens, click the Select or create visualizations of your data link on the What can I do next tab. Anzo displays the Lens Selection dialog box. Since lenses have not been created previously, the list of lenses is empty:
- In the Lens Selection dialog box, click the Create a new lens link at the bottom of the screen. Anzo displays the Create Lens dialog box.
- In the list of lens types, select the Table lens and then click Next. Anzo displays the General Information dialog box:
- Type a Title for the lens, and then click Finish. Anzo adds the lens as a new sub-tab on the dashboard and opens the Table Designer:
- In the Designer, click the Auto-generate columns icon to add all available columns to the table.
Note: Since the sample data set includes only 12 columns, it is not overwhelming to view all columns at once. For larger data sets, you might want to be more selective when adding columns to table lenses.
- Click Save. The table lens displays on the dashboard and populates with data. For example:
You can reduce the number of results that are displayed by clicking a year value in the Year filter.
- In the main Hi-Res Analytics toolbar, click the Dashboard button and select Save to save the changes. Remember to save dashboards periodically.
Now that you can view a summary of the data in a table, you can decide how to further narrow or expand the results by adding, changing, or removing filters. You can also experiment by adding other lenses to the dashboard to find the best way to display the data and answer the questions that you have. For example, the table includes a Revenue column. It might be interesting to see which movies made the most revenue. The next steps guide you through creating a chart lens to display movie revenue.
- In the Hi-Res Analytics main toolbar, click Lenses and select New. The Create Lens dialog box opens.
- Select Chart and click Next. Anzo displays the General Information dialog box.
- Type a Title and optional Description for the new lens. For example:
- Click Finish. Anzo displays the Chart Designer, which enables you to configure the type of chart to create, such as column, pie, or line, specify the data that will populate the chart, and customize details such as the chart's theme and fonts, legend, title, and tooltip formats.
- For this tutorial, accept the default Chart Type of "Column," and click Chart Data to view the Chart Data screen and configure the data to display.
- Click the Group field and select the Title property from the drop-down list. The Group field sets the values for the x-axis on the column chart. Anzo also populates the Group label with the Title property. Click Close to close the drop-down list.
- Click the Value field and select the Revenue (Millions) property from the drop-down list. The Value field sets the y-axis values. Click Close to close the drop-down list.
- Click Save. Anzo adds the new chart lens to the dashboard. To narrow the results that are displayed and filter on one year at a time, click a year in the Year filter. For example, when you click 2006 in the filter, the chart lists the revenue for titles that were released in 2006 and shows that "Pirates of the Caribbean: Dead Man's Chest" had the most revenue for that year.
You can hover the pointer over a bar in the chart to view details.
- If you want to add more data to the chart, such as to compare revenue versus rating for each title, follow these steps:
- Open the chart designer by clicking the Designer button above the lens tabs.
- In the Designer, click Chart Data.
- On the Chart Data screen next to Series Configurations, click the plus icon to add a new series.
- For Series 2, click the Group field and select the Title property from the drop-down list. Then click Close to close the list.
- Click the Value field and select the Rating property from the drop-down list. Close the list.
- Click Save to save the chart changes. The chart updates to display the Rating values in addition to the Revenue. For example:
For more information about working with Hi-Res Analytics, see Analyzing Data with Hi-Res Analytics.
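Conceptually, the Year filter and the chart's Group and Value fields used in this section amount to selecting rows for one year, grouping them by Title, and plotting a revenue value per group. The sketch below mimics that outside of Anzo. It is not how Hi-Res Analytics computes results: summing revenue per title is an assumption about the aggregation, the function name revenue_by_title is hypothetical, and the sample rows are made-up placeholder values rather than real data from the movie data set.

```python
def revenue_by_title(rows, year):
    """Filter rows to one year, group by Title, and rank titles by total revenue."""
    totals = {}
    for row in rows:
        revenue = row["Revenue (Millions)"]
        if row["Year"] == year and revenue is not None:
            # Summing per title is an assumed aggregation for this sketch.
            totals[row["Title"]] = totals.get(row["Title"], 0.0) + revenue
    # Highest revenue first, like reading the tallest bar on a column chart.
    return dict(sorted(totals.items(), key=lambda item: item[1], reverse=True))


# Placeholder rows for illustration only; these are not real movie figures.
sample_rows = [
    {"Title": "Movie A", "Year": 2006, "Revenue (Millions)": 100.0},
    {"Title": "Movie B", "Year": 2006, "Revenue (Millions)": 250.5},
    {"Title": "Movie C", "Year": 2016, "Revenue (Millions)": 75.0},
    {"Title": "Movie D", "Year": 2006, "Revenue (Millions)": None},
]
ranked = revenue_by_title(sample_rows, 2006)
for title, revenue in ranked.items():
    print(title, revenue)
```

Changing the year argument plays the same role as clicking a different value in the Year filter.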
Review the New Solution's Components
This section briefly reviews the components that were created as part of the new sample solution and provides references to more details about each of the components.
During the process of automatically ingesting the sample movie data from the supplied CSV file, Anzo generated the following components:
- A data model, which describes the structure and type of data that the movie data set contains. For more information and instructions for viewing the derived model, see Modeling Data.
- A mapping, which is input to the ETL job and maps the source data to the target elements defined in the data model. For more information and instructions for viewing the generated mapping, see Working with Mappings.
- An ETL pipeline, which contains the ETL job that generates the target data set. For more information and instructions for viewing the generated pipeline, see Working with Pipelines.