Anzo Features and Benefits
This topic provides information about key features, concepts, and vocabulary to know when working with Anzo. For more conceptual information about Anzo processes and usage, see Anzo Concepts and the User Interface in the Anzo Getting Started Guide.
This topic introduces the following features and provides references to more detailed information:
- Linked Data Set
- Data Sources
- Onboarding
- Onboarded Dataset Catalog
- Data Model
- Mapping
- AnzoGraph
- Graphmarts
- Data Layers
- Data Toolkit
- Provenance
- Activity Log
- Hi-Res Analytics
- Data on Demand
Linked Data Set
Linked data sets are a fundamental concept in Anzo. Anzo organizes all data, including system data, into linked data sets. A linked data set is associated with a data model and can be searched, discovered, shared, and protected with access control. For example, graphmarts are organized in a linked data set or registry of graphmarts, pipelines are organized in a linked data set, the Activity Log is a linked data set, data source configurations exist in a linked data set, and so on. Graph data sources, which Anzo generates as a set of Turtle files during the ETL process, are examples of file-based linked data sets.
Data Sources
Anzo supports various structured and unstructured data sources. Users can establish connections to databases, such as Microsoft, Oracle, Hadoop, Teradata, PostgreSQL, or Google databases; import data from worksheets or from CSV, JSON, or XML files; or set up unstructured pipelines that process data from common file types such as Office documents, PDFs, web pages, and email messages. For more information, see Connecting to Data Sources.
Onboarding
Anzo onboards both structured and unstructured data through automated or manually created Extract, Transform, and Load (ETL) processes. Anzo employs open W3C standards such as Web Ontology Language (OWL), Resource Description Framework (RDF), and SPARQL Protocol and RDF Query Language (SPARQL) to model and connect system and ingested data as interconnected graphs. For more information, see Onboarding Data.
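To make the RDF model more concrete, the following minimal sketch turns one tabular record into RDF statements in N-Triples syntax, using only the Python standard library. The `http://example.org/` namespace, the `row_to_ntriples` function, and the sample field names are hypothetical and chosen for illustration; the sketch does not show how Anzo implements onboarding.

```python
from typing import Dict, List
from urllib.parse import quote

BASE = "http://example.org/"  # hypothetical namespace, for illustration only
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def escape_literal(value: str) -> str:
    """Escape backslashes, quotes, and line breaks for an N-Triples string literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def row_to_ntriples(class_name: str, key: str, row: Dict[str, str]) -> List[str]:
    """Turn one record into RDF statements: one type triple plus one triple per field."""
    # Percent-encode the key value so that the subject is a valid IRI.
    subject = f"<{BASE}{class_name}/{quote(row[key], safe='')}>"
    triples = [f"{subject} <{RDF_TYPE}> <{BASE}{class_name}> ."]
    for column, value in row.items():
        if column == key:
            continue  # the key is already encoded in the subject IRI
        triples.append(f'{subject} <{BASE}{column}> "{escape_literal(value)}" .')
    return triples


lines = row_to_ntriples("Customer", "id", {"id": "42", "name": "Ada", "city": "London"})
print("\n".join(lines))
```

Each output line is a subject, predicate, and object. Because every record shares the same property IRIs, statements from different sources that use the same identifiers connect into a single graph.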
Onboarded Dataset Catalog
The Dataset catalog is an inventory of all of the data in Anzo. Anzo catalogs data using metadata management and graph models that describe data at a business level. Users can browse for data sets and their context in a secure and governed environment. Organizations can augment catalogs with custom metadata structures such as indications of data quality and usefulness, privacy, applicable regulations, and ownership. Anzo captures the maximum amount of available context from incoming data as well as upstream data schemas. Collaborative catalog features promote the exchange of information about data sets, mappings, models, and analytics.
Data Model
Data models or ontologies are descriptions of the structure and type of data that a data set contains; they describe the concepts, attributes, and relationships in the data. Anzo solutions link data to models so that users can search, report, and visualize the data in Hi-Res Analytics dashboards and third-party applications. Using models provides flexibility to capture data coming from various sources and structures.
For more information, see Model Terminology.
Mapping
Mappings are the building blocks of ETL jobs and describe the relationship between schemas and models. Mappings capture details about the data and transformations between data fields. For more information about mappings, see Working with Mappings.
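As a rough illustration of the concept, a mapping can be thought of as a list that pairs each source field with a model property and an optional transformation. The sketch below uses hypothetical names throughout (`FieldMapping`, `apply_mapping`, and the column and property names); it does not reflect the format in which Anzo stores mappings.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class FieldMapping:
    source_field: str                          # column name in the source schema
    model_property: str                        # property name in the target model
    transform: Callable[[str], object] = str   # conversion applied to the raw value


def apply_mapping(row: Dict[str, str], mappings: List[FieldMapping]) -> Dict[str, object]:
    """Return the row re-keyed by model property, with each value transformed."""
    return {m.model_property: m.transform(row[m.source_field]) for m in mappings}


# Hypothetical mapping: trim the customer name and parse the order total as a number.
mappings = [
    FieldMapping("CUST_NM", "name", str.strip),
    FieldMapping("ORDER_TOTAL", "orderTotal", float),
]
result = apply_mapping({"CUST_NM": "  Ada ", "ORDER_TOTAL": "19.50"}, mappings)
print(result)
```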
AnzoGraph
AnzoGraph is a massively parallel processing (MPP) native graph database designed for advanced business analytics. Its compressed in-memory data storage and MPP design provide extremely fast data loading, real-time updates, and interactive analytics on huge amounts of data. AnzoGraph scales with your needs by distributing graph data across cluster nodes and processing queries in parallel on all nodes. Because of AnzoGraph's MPP and fast intra-cluster network implementation, load and query performance increases as the data and cluster size grow.
Graphmarts
Graphmarts are collections of data sets that are grouped and deployed to AnzoGraph for exploration and analysis. Graph models offer inherent flexibility that pushes the modeling, integration, and analytics decisions closer to the end-user, reducing time-to-value in dynamic business environments. For more information, see Blending Data.
Data Layers
Anzo data layers enable users to enhance graphmarts dynamically. Users can create layers to load additional data sets; clean, conform, or transform data; or infer new information from data. Anzo provides access control for data layers, and Hi-Res Analytics users can easily turn the data layers on or off in dashboards. For more information, see Working with Data Layers.
Data Toolkit
Anzo's Data Toolkit is a SPARQL service that enables users to query data from external endpoints that are accessible over JDBC or HTTP. The toolkit service can query data in databases; in CSV, XML, and JSON files; or through APIs. Information from the external sources can augment previously ingested data in Anzo. For more information, see Using the Data Toolkit to Include External Data in a Data Layer.
Provenance
Anzo retains and displays the provenance of all ingested structured data. The provenance explorer provides an overview of the relationships across various sources and models. Users can search for data entities and view associated pipelines, data sources, models, and schemas. You can also view the provenance of ingested unstructured data using Hi-Res Analytics. Validating data provenance is indispensable in cases where text analytics inform business-critical decisions. For more information, see Exploring Data Provenance.
Activity Log
Anzo offers a high-level view of its operations through the Activity Log. Viewing the Activity Log enables you to profile individual activities, such as Spark or AnzoGraph usage. The ability to filter activities based on their status and type enables users to drill down to activities of interest as well as profile user behavior. Additionally, the ability to include system data expands the scope of monitoring and aids in troubleshooting.
Hi-Res Analytics
Anzo Hi-Res Analytics enables users to explore and ask questions across all of their data. Using model-guided dashboards, users can perform complex computations across multi-dimensional data using over 90 Microsoft Excel-inspired formulas. Anzo's Hi-Res Analytics dashboards generate complex graph queries dynamically based on user input. For more information, see Working with Hi-Res Analytics.
Data on Demand
In line with Anzo's open-standards architecture, graphmarts can be accessed using modern application programming interfaces (APIs). In addition to SPARQL-compliant query endpoints, Anzo offers standards-compliant Open Data Protocol (OData) REST data feed endpoints as part of its data on demand service. For more information, see Accessing Data from BI Applications.
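Because both access methods follow open standards, a client can form requests with generic tooling. The following minimal sketch builds a SPARQL 1.1 Protocol GET URL and an OData URL that uses the standard `$select`, `$filter`, and `$top` query options. The host name, endpoint paths, and the `Customers` feed are assumptions made for illustration and are not actual Anzo endpoints; the sketch only builds URLs and sends no requests.

```python
from urllib.parse import quote, urlencode

# Hypothetical endpoint URLs; replace them with the endpoints that your deployment exposes.
SPARQL_ENDPOINT = "https://anzo.example.com/sparql"
ODATA_FEED = "https://anzo.example.com/odata/Customers"


def sparql_get_url(endpoint: str, query: str) -> str:
    """Build a SPARQL 1.1 Protocol 'query via GET' URL using the `query` parameter."""
    return f"{endpoint}?{urlencode({'query': query})}"


def odata_url(feed: str, select=None, filter_expr=None, top=None) -> str:
    """Build an OData URL from the $select, $filter, and $top system query options."""
    options = {}
    if select:
        options["$select"] = ",".join(select)
    if filter_expr:
        options["$filter"] = filter_expr
    if top is not None:
        options["$top"] = str(top)
    if not options:
        return feed
    # Keep '$' and ',' readable; percent-encode spaces and other reserved characters.
    return f"{feed}?{urlencode(options, safe='$,', quote_via=quote)}"


sparql_url = sparql_get_url(SPARQL_ENDPOINT, "SELECT * WHERE { ?s ?p ?o } LIMIT 10")
feed_url = odata_url(ODATA_FEED, select=["name", "city"], filter_expr="city eq 'London'", top=5)
print(sparql_url)
print(feed_url)
```

A BI tool or script would then issue an HTTP GET request to either URL with the credentials that the deployment requires.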