Generating Graph Data Metrics

Similar to generating metrics for a data source schema (see Generating Source Data Metrics), Anzo provides the ability to calculate metrics for a graph data set in its final format. When metrics are generated for graph data, Anzo profiles the entire data set and reports metrics for the classes and properties in the model as well as statistics about the values for the properties. Generating graph data metrics helps users perform data discovery, assess the quality of the onboarded data, and decide whether to use the data set in a particular graphmart. The report can also help users determine the types of data layers to create and write the queries to include in the steps.

Important: To generate graph data metrics, AnzoGraph must be online. If you have dynamic AnzoGraph deployments enabled, Anzo will provision AnzoGraph automatically when metrics are generated.
  1. In the Anzo console, expand the Blend menu and click Datasets. Anzo displays the Datasets screen, which lists the catalog of data sets.

  2. On the Datasets screen, click the name of the data set for which you want to generate metrics. Anzo displays the Explore screen for the source.

    Note: The instance counts for the classes in the graph view on the left side of the screen are the initial, non-unique counts from the ETL engine. Most likely the data has not yet been deduplicated. After generating metrics, the instance counts may change.
  3. Click the Generate Metrics button at the top of the screen. Anzo provisions a temporary graphmart and loads the data into AnzoGraph. AnzoGraph computes the data profiling metrics. The process may take several minutes. Once the metrics are generated, Anzo removes the graphmart from AnzoGraph and the new information becomes available to explore in Anzo.
  4. To display the metrics, select a node or edge in the graph view on the left side of the screen, or expand the model in the middle of the screen and select a class or property.

    You can click the Expand button on the right side of the screen to collapse the graph view and expand the metrics view.

Select any class or property to view its metrics. When a class is selected, Anzo displays the number of properties and total number of instances as well as one or more of the following metrics for each property in the class. The metrics that are calculated depend on the data type of the property:

  • Percentile Metric: This metric presents the data distribution for a property in percentiles.
  • Top Value Counts Metric: This metric displays the count (as a histogram) of the 10 most frequently occurring values for a property.
  • Presence Metric: This metric displays the number of values present and not present for a property.
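
To make the three class-level metrics above more concrete, the following Python sketch computes rough equivalents over a small in-memory list. It is an illustration only and not Anzo's implementation: the sample values, the function names, the choice of the 25th, 50th, and 75th percentiles, and the "inclusive" percentile method are all assumptions made for the example.

```python
from collections import Counter
from statistics import quantiles


def presence(values):
    """Count present (non-None) and not-present (None) values."""
    present = sum(v is not None for v in values)
    return {"present": present, "not_present": len(values) - present}


def top_value_counts(values, n=10):
    """Return the n most frequent non-None values with their counts."""
    return Counter(v for v in values if v is not None).most_common(n)


def percentiles(values, points=(25, 50, 75)):
    """Return selected percentiles of the non-None numeric values."""
    data = sorted(v for v in values if v is not None)
    # quantiles(..., n=100) returns the 99 cut points between percentiles.
    cuts = quantiles(data, n=100, method="inclusive")
    return {p: cuts[p - 1] for p in points}


# Hypothetical values of one property across the instances of a class;
# None stands for an instance that has no value for the property.
ages = [34, 41, None, 34, 29, 41, 34, None, 52, 29]

print(presence(ages))
print(top_value_counts(ages))
print(percentiles(ages))
```

In this sketch, missing values are excluded before the top value counts and percentiles are computed, so only the presence count reflects them. Whether Anzo treats missing values the same way is not stated here.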

When a property is selected, Anzo displays the metrics described above and one or more of the following metrics, depending on the data type of the property:

  • Extrema Metric: Shows the smallest and largest values.
  • Geometric Mean Metric: Shows the geometric mean of the values.
  • Median Metric: Shows the middle value.
  • Mode Metric: Shows the value that appears most often.
  • Std Deviation Metric: Shows the standard deviation in the set of values.
  • Unique Values Metric: Shows the number of unique values.
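
As a rough guide to what the property-level metrics above represent, the following Python sketch computes each of them for a short list of numbers using the standard `statistics` module. It is a hedged illustration, not a description of how AnzoGraph calculates the values: the sample data and the `property_metrics` function are hypothetical, and the use of the population form of the standard deviation is an assumption.

```python
from statistics import geometric_mean, median, multimode, pstdev


def property_metrics(values):
    """Summarize a non-empty list of positive numeric values."""
    return {
        "extrema": (min(values), max(values)),
        "geometric_mean": geometric_mean(values),
        "median": median(values),
        "mode": multimode(values),  # a list, because ties are possible
        "std_deviation": pstdev(values),  # population form (assumption)
        "unique_values": len(set(values)),
    }


# Hypothetical values for a single numeric property.
prices = [2, 4, 4, 8, 16]

for name, value in property_metrics(prices).items():
    print(f"{name}: {value}")
```

The geometric mean is only defined for positive numbers, which is one reason a metric may not be reported for every data type.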