Generating a Graphmart Data Profile

Just as you can generate a profile for a Dataset (see Generating a Dataset Data Profile), you can profile a Graphmart, which may include multiple Datasets. When you profile a Graphmart, Anzo analyzes the data that results from all of the enabled Data Layers and reports metrics for the classes and properties in the model, as well as statistics about the property values. Generating a Graphmart profile helps users perform data discovery, assess the quality of the data, and decide on the types of analytics to perform.

AnzoGraph must be online before you can generate a Graphmart profile. If you have dynamic AnzoGraph deployments enabled, Anzo provisions AnzoGraph automatically when the profile is generated.

  1. In the Anzo application, expand the Blend menu and click Graphmarts. Anzo displays a list of the existing Graphmarts.

  2. On the Graphmarts screen, click the name of the Graphmart for which you want to generate metrics. Anzo displays the Overview screen for the Graphmart.

  3. If necessary, activate the Graphmart so that the Profile Data button becomes active, and then click Profile Data. AnzoGraph computes the data profiling metrics. The process may take several minutes. You can check the status of the process in the Activity Log, where you can also stop the profiling process by clicking Cancel under the progress bar for the task.

    Once the metrics are generated, the profile becomes available to explore in Anzo.

  4. To display the metrics, click the Explore tab. Then select a node or edge in the graph view on the left side of the screen, or expand the model in the middle of the screen and select a class or property.

    You can click the Expand button on the right side of the screen to collapse the graph view and expand the metrics view.

Select any class or property to view its metrics. When a class is selected, Anzo displays the number of properties and total number of instances as well as one or more of the following metrics for each property in the class. The metrics that are presented depend on the data type of the properties that are in the class.

  • Percentile Metric: This metric presents the data distribution for a property in percentiles.
  • Top Value Counts Metric: This metric displays the count (as a histogram) of the 10 most frequently occurring values for a property.
  • Presence Metric: This metric displays the number of values present and not present for a property.
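
To make the three class-level metrics concrete, the following Python sketch computes comparable values for a small in-memory list. It only illustrates what each metric measures; it is not how Anzo or AnzoGraph computes the profile. The function names, the sample `ages` data, and the choice of percentile points are assumptions made for this example.

```python
from collections import Counter
from statistics import quantiles


def presence(values):
    """Count how many values are present (not None) and how many are missing."""
    present = sum(v is not None for v in values)
    return {"present": present, "not_present": len(values) - present}


def top_value_counts(values, n=10):
    """Return the n most frequently occurring non-missing values with their counts."""
    return Counter(v for v in values if v is not None).most_common(n)


def percentiles(values, points=(25, 50, 75)):
    """Return the requested percentiles of the non-missing numeric values."""
    nums = sorted(v for v in values if v is not None)
    # quantiles(..., n=100) returns 99 cut points; cut point p sits at index p - 1.
    cuts = quantiles(nums, n=100, method="inclusive")
    return {p: cuts[p - 1] for p in points}


# Hypothetical property values; None stands for an instance with no value.
ages = [34, 51, None, 34, 29, 42, None, 34, 51, 60]

print(presence(ages))             # {'present': 8, 'not_present': 2}
print(top_value_counts(ages, 2))  # [(34, 3), (51, 2)]
print(percentiles(ages))
```

The percentile interpolation method used here (`inclusive`) is one of several common conventions, so values reported by a profiling tool may differ slightly for small data sets.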

When a property is selected, Anzo displays the metrics described above and one or more of the following metrics, depending on the data type of the property:

  • Extrema Metric: Shows the smallest and largest values.
  • Geometric Mean Metric: Shows the geometric mean of numeric values.
  • Mean Metric: Shows the arithmetic mean of numeric values.
  • Median Metric: Shows the median (middle value) of numeric values.
  • Mode Metric: Shows the value that appears most often.
  • Std Deviation Metric: Shows the standard deviation in the set of values.
  • Unique Values Metric: Shows the number of unique values.
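
As a rough illustration of the property-level metrics listed above, the sketch below computes each one for a small list using Python's standard `statistics` module. It is not Anzo's implementation. The function name `property_metrics` and the sample `prices` data are hypothetical, and the use of the sample (rather than population) standard deviation is an assumption, since this topic does not state which one Anzo reports.

```python
import statistics


def property_metrics(values):
    """Compute illustrative property-level metrics for non-missing numeric values."""
    nums = [v for v in values if v is not None]
    return {
        "extrema": (min(nums), max(nums)),                   # smallest and largest values
        "geometric_mean": statistics.geometric_mean(nums),   # requires positive values
        "mean": statistics.mean(nums),                       # arithmetic mean
        "median": statistics.median(nums),                   # middle value
        "mode": statistics.mode(nums),                       # most frequent value
        "std_deviation": statistics.stdev(nums),             # assumption: sample std deviation
        "unique_values": len(set(nums)),                     # number of distinct values
    }


# Hypothetical property values; None stands for an instance with no value.
prices = [2, 8, 4, 8, None, 16]

for name, value in property_metrics(prices).items():
    print(f"{name}: {value}")
```

For this sample, the extrema are 2 and 16, the median and the mode are both 8, and there are 4 unique values. The geometric mean is defined only for positive numbers, which is one reason a metric may be omitted for some properties.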