Anzo Features and Benefits

This topic provides information about key features, concepts, and vocabulary to know when working with Anzo. For more conceptual information about Anzo processes and usage, see Anzo Concepts and the User Interface in the Anzo Getting Started Guide.

This topic introduces the following features and provides references to more detailed information:

Linked Data Set

Linked data sets are a fundamental concept in Anzo. Anzo organizes all data, including system data, into linked data sets. A linked data set is associated with a data model and can be searched, discovered, shared, and protected with access control. For example, graphmarts are organized in a linked data set or registry of graphmarts, pipelines are organized in a linked data set, the Activity Log is a linked data set, data source configurations exist in a linked data set, and so on. Graph data sources, which Anzo generates as a set of Turtle files during the ETL process, are examples of file-based linked data sets.

Data Sources

Anzo supports a variety of structured and unstructured data sources. Users can connect to databases such as Microsoft, Oracle, Hadoop, Teradata, PostgreSQL, or Google databases; import data from worksheets and CSV, JSON, or XML files; or set up unstructured pipelines that process data from common file types such as Office documents, PDFs, web pages, and email messages. For more information, see Connecting to Data Sources.

Onboarding

Anzo onboards both structured and unstructured data through automated or manually created Extract, Transform, and Load (ETL) processes. Anzo employs open W3C standards such as Web Ontology Language (OWL), Resource Description Framework (RDF), and SPARQL Protocol and RDF Query Language (SPARQL) to model and connect system and ingested data as interconnected graphs. For more information, see Onboarding Data.
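
As a rough illustration of what "interconnected graphs" means in RDF terms, the sketch below stores data as subject-predicate-object triples and answers a question by following an edge from one entity to another. It is a conceptual sketch in plain Python, not Anzo's implementation; the example.org namespace, the entity names, and the match helper are all assumptions made for illustration.

```python
from typing import Optional

# A triple is (subject, predicate, object). All example.org names are hypothetical.
Triple = tuple[str, str, str]

EX = "http://example.org/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"  # standard rdf:type IRI

graph: set[Triple] = {
    (EX + "person/1", RDF_TYPE, EX + "Person"),
    (EX + "person/1", EX + "name", "Ada"),
    (EX + "person/1", EX + "worksFor", EX + "org/7"),
    (EX + "org/7", RDF_TYPE, EX + "Organization"),
    (EX + "org/7", EX + "name", "Acme"),
}


def match(s: Optional[str], p: Optional[str], o: Optional[str]) -> list[Triple]:
    """Return the triples that fit the pattern; None plays the role of a variable."""
    return sorted(
        t for t in graph
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


# Follow an edge between two entities: find the name of person/1's employer.
employer = match(EX + "person/1", EX + "worksFor", None)[0][2]
employer_name = match(employer, EX + "name", None)[0][2]
```

A SPARQL query expresses the same kind of pattern declaratively; the helper above only mimics a single triple pattern with optional wildcards.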

Onboarded Dataset Catalog

The Dataset catalog is an inventory of all of the data in Anzo. Anzo catalogs data using metadata management and graph models that describe data at a business level. Users can browse for data sets and their context in a secure and governed environment. Organizations can augment catalogs with custom metadata structures such as indications of data quality and usefulness, privacy, applicable regulations, and ownership. Anzo captures the maximum amount of available context from incoming data as well as upstream data schemas. Collaborative catalog features promote the exchange of information about data sets, mappings, models, and analytics.

Data Model

Data models or ontologies are descriptions of the structure and type of data that a data set contains; they describe the concepts, attributes, and relationships in the data. Anzo solutions link data to models so that users can search, report, and visualize the data in Hi-Res Analytics dashboards and third-party applications. Using models provides flexibility to capture data coming from various sources and structures.

For more information, see Model Terminology.

Mapping

Mappings are the building blocks of ETL jobs and describe the relationship between schemas and models. Mappings capture details about the data and transformations between data fields. For more information about mappings, see Working with Mappings.
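
The idea of a mapping can be sketched as a table that ties each source column to a model property and an optional transformation. The following is a minimal sketch under stated assumptions, not the format Anzo uses for mappings: the column names, the property IRIs, and the apply_mapping function are hypothetical.

```python
from typing import Callable

# Hypothetical model namespace and source columns, for illustration only.
MODEL = "http://example.org/model#"

# Each entry: source column -> (model property, transformation applied to the value)
mapping: dict[str, tuple[str, Callable[[str], object]]] = {
    "EMP_NAME": (MODEL + "name", str.strip),
    "HIRE_YEAR": (MODEL + "hireYear", int),
}


def apply_mapping(row: dict[str, str], subject: str) -> list[tuple[str, str, object]]:
    """Turn one source row into triples, skipping unmapped or empty columns."""
    triples = []
    for column, (prop, transform) in mapping.items():
        if row.get(column, "") != "":
            triples.append((subject, prop, transform(row[column])))
    return triples


row = {"EMP_NAME": "  Ada ", "HIRE_YEAR": "2019", "UNMAPPED": "x"}
triples = apply_mapping(row, "http://example.org/employee/1")
```

Here the name is trimmed and the year is converted to an integer, while the UNMAPPED column is ignored because the mapping does not mention it.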

AnzoGraph

AnzoGraph is a massively parallel processing (MPP) native graph database designed for advanced business analytics. Its compressed in-memory data storage and MPP design provide extremely fast data loading, real-time updates, and interactive analytics on huge amounts of data. AnzoGraph scales with your needs by distributing graph data across cluster nodes and processing queries in parallel on all nodes. Because of AnzoGraph's MPP and fast intra-cluster network implementation, load and query performance increases as the data and cluster size grow.

Graphmarts

Graphmarts are collections of data sets that are grouped and deployed to AnzoGraph for exploration and analysis. Graph models offer inherent flexibility that pushes the modeling, integration, and analytics decisions closer to the end-user, reducing time-to-value in dynamic business environments. For more information, see Blending Data.

Data Layers

Anzo data layers enable users to enhance graphmarts dynamically. Users can create layers to load additional data sets; clean, conform, or transform data; or infer new information from data. Anzo provides access control for data layers, and Hi-Res Analytics users can easily turn the data layers on or off in dashboards. For more information, see Working with Data Layers.
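
Conceptually, layers behave like an ordered series of steps that can each be switched on or off. The sketch below illustrates that idea only; it does not reflect how Anzo or AnzoGraph executes layers, and the Layer class, the build_view function, and the sample data are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable

Triple = tuple[str, str, str]


@dataclass
class Layer:
    """A named step that takes the current data and returns the enhanced data."""
    name: str
    step: Callable[[set[Triple]], set[Triple]]
    enabled: bool = True


def build_view(layers: list[Layer]) -> set[Triple]:
    """Apply the enabled layers in order, starting from an empty data set."""
    data: set[Triple] = set()
    for layer in layers:
        if layer.enabled:
            data = layer.step(data)
    return data


# Hypothetical layers: load a data set, clean a value, infer a new fact.
load = Layer("load", lambda d: d | {("emp/1", "name", " ada ")})
clean = Layer("clean", lambda d: {(s, p, o.strip().title()) for s, p, o in d})
infer = Layer("infer", lambda d: d | {(s, "type", "Person") for s, p, _ in d if p == "name"})

layers = [load, clean, infer]
full_view = build_view(layers)

infer.enabled = False  # turning a layer off removes its contribution
view_without_inference = build_view(layers)
```

With all layers enabled, the view holds the cleaned name plus the inferred type; with the inference layer turned off, only the cleaned name remains.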

Data Toolkit

Anzo's Data Toolkit is a SPARQL service that enables users to query data from external endpoints that are accessible over JDBC or HTTP. The toolkit service can query data in databases, CSV, XML, and JSON files, or APIs. Information from the external sources can augment previously ingested data in Anzo. For more information, see Using the Data Toolkit to Include External Data in a Data Layer.

Provenance

Anzo retains and displays the provenance of all ingested structured data. The provenance explorer provides an overview of the relationships across various sources and models. Users can search for data entities and view associated pipelines, data sources, models, and schemas. You can also view the provenance of ingested unstructured data using Hi-Res Analytics. Validating data provenance is indispensable in cases where text analytics inform business-critical decisions. For more information, see Exploring Data Provenance.

Activity Log

Anzo offers a high-level view of its operations through the Activity Log. Viewing the Activity Log enables you to profile individual activities, such as Spark or AnzoGraph usage. The ability to filter activities based on their status and type enables users to drill down to activities of interest as well as profile user behavior. Additionally, the ability to include system data expands the scope of monitoring and aids in troubleshooting.

Hi-Res Analytics

Anzo Hi-Res Analytics enables users to explore and ask questions across all of their data. Using model-guided dashboards, users can perform complex computations across multi-dimensional data using over 90 Microsoft Excel-inspired formulas. Anzo's Hi-Res Analytics dashboards generate complex graph queries dynamically based on user input. For more information, see Working with Hi-Res Analytics.

Data on Demand

In keeping with Anzo's open-standards architecture, graphmarts can be accessed through modern Application Programming Interfaces (APIs). In addition to SPARQL-compliant query endpoints, Anzo offers standards-compliant Open Data Protocol (OData) REST data feed endpoints as part of its data on demand service. For more information, see Accessing Data from BI Applications.
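
To give a sense of what requests to such endpoints look like, the sketch below builds two request URLs using only the Python standard library. The query parameter follows the SPARQL 1.1 Protocol, and $select and $top are standard OData query options. The host name, the endpoint paths, and the entity and property names are hypothetical; the actual endpoint addresses for a deployment are described in Accessing Data from BI Applications.

```python
from urllib.parse import urlencode

# Hypothetical host and paths; substitute the values for a real deployment.
BASE = "https://anzo.example.com"


def sparql_get_url(endpoint: str, query: str) -> str:
    """Build a SPARQL Protocol GET URL: the query text goes in the 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query})


def odata_url(feed: str, entity_set: str, select: list[str], top: int) -> str:
    """Build an OData URL using the standard $select and $top query options."""
    options = {"$select": ",".join(select), "$top": str(top)}
    # Keep '$' and ',' unescaped so the URL stays readable.
    return f"{feed}/{entity_set}?" + urlencode(options, safe="$,")


sparql_url = sparql_get_url(BASE + "/sparql", "SELECT * WHERE { ?s ?p ?o } LIMIT 10")
feed_url = odata_url(BASE + "/odata/feed", "Employee", ["name", "hireYear"], 5)
```

The code only constructs the URLs. Sending the requests would additionally require authentication and an HTTP client, which depend on the deployment.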