Glossary

This topic defines commonly used Anzo terms and phrases.

Term Definition
Anzo Data Store
An Anzo data store defines an endpoint for writing data. It specifies the directory on the file store where Anzo can generate file-based linked data sets for unstructured pipelines (see File-Based Linked Data Set). It can also serve as the designated directory for saving data sets created from graphmart exports. The data store also defines write properties such as the maximum file size and whether files should be compressed.
Artifacts
Artifacts are all of the objects that are created in Anzo during initial configuration and the data onboarding process. For example, when you connect to a database or file source, those connections are stored as artifacts, and when the data from a data source is ingested, the resulting schema, model, graphmart, and any generated datasets are also artifacts.
Blend
Using semantic models, separate data sets or new data sources can be blended into a knowledge graph. Anzo can combine and align any data set as well as apply data cleansing and transformation steps. Anzo delivers blending and access through graphmarts, which give users the flexibility to combine and analyze any subset of data in Anzo.
Data Layers
Data layers enable users to enhance graphmarts dynamically. Users can create layers to load additional data sets, clean, conform, or transform data, infer new information, or export data to a file-based linked data set (FLDS).
Datasets Catalog
Anzo’s Datasets catalog combines traditional technical, operational, and business metadata with a semantic layer to describe all aspects of enterprise data elements. The catalog enables Anzo’s unique use of semantics and graph models and is the system of record for data in Anzo.

Anzo collects and generates metadata at every stage in the data discovery and integration process. Metadata in the catalog documents how data is converted during the onboarding process from its original format into a graph model. Subsequent data blending, transformation, and preparation steps are captured as additional metadata. Anzo also captures new metadata to describe all actions taken against data within Anzo. The metadata enables users to visualize their data, understand business contexts, identify connections, and blend and prepare data.

ELT
Structured and semi-structured data sources are onboarded using an extract, load, and transform (ELT) workflow as opposed to a traditional extract, transform, and load (ETL) flow: data is loaded first and transformed afterward. Data layers are Anzo's mechanism for flexibly transforming data in memory.
File-Based Linked Data Set
A File-Based Linked Data Set (FLDS) is a Named Graph that contains a collection of ontologies and the location of RDF data files that share common structure, purpose, meaning, or permissions.

When the unstructured pipeline workflow is used to onboard data, Anzo creates a dataset in the Datasets catalog. The dataset in the catalog is registered in the Anzo system data source (see Journal or Volume) and includes metadata about the data, including a pointer to the data store location for the RDF files generated by the pipeline. Together, the catalog dataset and the files on disk make up an FLDS. An FLDS is also generated when an Export Step is included in a graphmart.

File Store
The Anzo platform components (Anzo, AnzoGraph, Anzo Unstructured, and Elasticsearch) share a file system for maintaining onboarded data and supporting files. A file store is that shared storage system, such as NFS, HDFS, or cloud storage, which all of the servers can access.
Frame Graph
Each ontology in Anzo has a corresponding frame ontology or frame graph. A frame graph is generated when a new ontology is added, and it is regenerated each time the ontology is modified. By generating a frame graph, Anzo can pre-process the ontology rather than waiting until runtime to do the calculations. During frame graph generation Anzo performs activities like finding all of the properties available to a class, identifying the superclasses of each class, and determining whether a property is required or multi-valued.
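The pre-processing described above can be illustrated with a minimal sketch. It is not Anzo's implementation: the class names, property names, and dictionary layout are hypothetical, chosen only to show how superclasses and available properties can be computed once, ahead of runtime.

```python
# Toy ontology (hypothetical, not an Anzo model): class -> direct superclasses.
SUBCLASS_OF = {
    "Employee": ["Person"],
    "Person": ["Agent"],
    "Agent": [],
}

# Toy properties: name -> (domain class, required?, multi-valued?).
PROPERTIES = {
    "name": ("Agent", True, False),
    "email": ("Person", False, True),
    "employeeId": ("Employee", True, False),
}


def superclasses(cls):
    """Return every ancestor of cls, nearest first."""
    seen = []
    queue = list(SUBCLASS_OF.get(cls, []))
    while queue:
        parent = queue.pop(0)
        if parent not in seen:
            seen.append(parent)
            queue.extend(SUBCLASS_OF.get(parent, []))
    return seen


def build_frames():
    """Precompute, per class, its superclasses and all properties it can use."""
    frames = {}
    for cls in SUBCLASS_OF:
        ancestors = superclasses(cls)
        lineage = {cls, *ancestors}
        frames[cls] = {
            "superclasses": ancestors,
            "properties": {
                name: {"required": required, "multi_valued": multi}
                for name, (domain, required, multi) in PROPERTIES.items()
                if domain in lineage
            },
        }
    return frames


# Built once when the ontology changes; lookups afterward are plain dict reads.
FRAMES = build_frames()
```

In this sketch, `FRAMES["Employee"]` lists `Person` and `Agent` as superclasses and exposes the inherited `name` and `email` properties alongside `employeeId`, so nothing has to be recomputed at query time.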
Graph Data Interface
The Graph Data Interface (GDI) (sometimes called the Data Toolkit) is a flexible AnzoGraph extension that enables users to access a variety of data sources via SPARQL queries. The GDI has built-in, native support for various file format types, HTTP/REST endpoints, and JDBC connections to common database sources. For more information about the GDI, see Introduction to the GDI.
Graphmarts
Graphmarts are collections of knowledge graphs that users can blend and enhance. Graphmarts can combine any subset of data in Anzo for analysis. For more information about graphmarts, see Graphmart Concepts.
Hi-Res Analytics
Hi-Res Analytics enable users to explore and ask questions across all of their data. Using model-guided dashboards, users can perform computations across multi-dimensional data. Hi-Res Analytics dashboards generate complex graph queries dynamically based on user input.
IRI
An Internationalized Resource Identifier (IRI) is similar to a URI but allows a greater range of characters, including non-ASCII Unicode characters. The terms URI and IRI are often used interchangeably.
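As a rough illustration of the difference, the sketch below maps an IRI to a URI by percent-encoding non-ASCII characters as UTF-8 bytes. It uses only the Python standard library; the function name `iri_to_uri` and the example address are made up for this example, and the sketch deliberately leaves the host name untouched (internationalized host names would need separate handling).

```python
from urllib.parse import quote, urlsplit, urlunsplit


def iri_to_uri(iri):
    """Percent-encode non-ASCII characters in the path, query, and fragment.

    Simplified sketch: the scheme and host are passed through unchanged,
    and existing percent-escapes are preserved by keeping '%' safe.
    """
    parts = urlsplit(iri)
    return urlunsplit((
        parts.scheme,
        parts.netloc,
        quote(parts.path, safe="/%"),
        quote(parts.query, safe="=&%"),
        quote(parts.fragment, safe="%"),
    ))


print(iri_to_uri("http://example.org/résumé"))
# prints http://example.org/r%C3%A9sum%C3%A9
```

An identifier that is already plain ASCII passes through unchanged, which is one reason the two terms are so often treated as synonyms.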
Journal or Volume
A journal, also known as a volume, refers to data that is stored in Anzo's embedded graph store. The graph store is transactional and is used to persist metadata, which is written to disk in a .jnl file. The system volume (or system data source) is the default, required volume where Anzo stores ontologies as well as system configuration, dataset, catalog, registry, and access control metadata. Users can also create secondary local volumes for more compartmentalized data; these volumes can be created and deleted without affecting the core system.
Linked Data Set
A linked data set (LDS) is a fundamental concept. Anzo organizes all data, including system data, into linked data sets. An LDS is associated with a data model and can be searched, discovered, shared, and protected with access control. For example, graphmarts are organized in a linked data set or registry of graphmarts, the Activity Log is a linked data set, data source configurations exist in a linked data set, and so on.
Model
Anzo establishes the semantic layer by enabling users to convert diverse enterprise data models into graph data models and then enhance the data by adding new business definitions, names, and tags. Further insight is added when data from separate sources is linked, connecting shared business definitions across previously siloed sources. Anzo employs open World Wide Web Consortium (W3C) standards, including Web Ontology Language (OWL), RDF, and SPARQL, to model, connect, and query interconnected graphs. For more information, see Model Concepts and Vocabulary.
Managed Model
A managed model is a model that is generated by the Graph Data Interface (GDI) and owned and managed by a data layer with a Direct Load Step. Any manual changes made to a managed model outside of a Direct Load step, such as from the Model editor, will be overwritten any time the graphmart or layer is refreshed or reloaded. For more information, see Managed Model Concepts.
Named Graph
Anzo implements the RDF Named Graph abstraction. These are the atomic units of storage in Anzo. Each named graph can be access controlled, and each graph has a corresponding “metadata graph” that includes the access control information, the last modified date, and which user created and modified the associated named graph. For more information about named graph storage, see Graph Storage Concepts.
NLP
Anzo performs named-entity recognition (NER) using knowledge bases and can interface with natural language processing (NLP) tools. It serves as a platform that enables text analytics through interplay between best-of-breed NLP tools.
OData
Open Data Protocol (OData) facilitates the creation of interoperable RESTful APIs. The Anzo Data on Demand service provides OData-based feeds that can be used to query graphmart data from third-party business intelligence tools.
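To give a sense of what an OData request looks like, the sketch below builds a query URL using the standard OData system query options `$filter`, `$select`, and `$top`. The service root, entity set name, and field names are placeholders invented for this example; they are not actual Anzo Data on Demand endpoints.

```python
from urllib.parse import quote, urlencode

# Hypothetical service root and entity set; substitute the feed URL of a real endpoint.
SERVICE_ROOT = "https://example.org/odata/MyGraphmart"
ENTITY_SET = "Person"


def odata_url(filter_expr=None, select=None, top=None):
    """Build an OData query URL from standard system query options."""
    params = {}
    if filter_expr:
        params["$filter"] = filter_expr
    if select:
        params["$select"] = ",".join(select)
    if top is not None:
        params["$top"] = top
    # Keep '$' and ',' readable; spaces become %20 rather than '+'.
    query = urlencode(params, safe="$,", quote_via=quote)
    url = f"{SERVICE_ROOT}/{ENTITY_SET}"
    return f"{url}?{query}" if query else url


print(odata_url(filter_expr="age gt 30", select=["name", "age"], top=5))
```

A business intelligence tool that supports OData typically generates URLs of this shape on the user's behalf, so constructing them by hand is mostly useful for testing a feed.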
OSGi
The Open Services Gateway initiative (OSGi) is the open-standard architecture upon which Anzo is built. It is a Java framework for developing and deploying modular software programs and libraries. OSGi enables Cambridge Semantics to compartmentalize Anzo into "bundles" that can be deployed, activated, and removed independently without affecting other bundles in the system.
Onboard
When data is ingested from its source platform into Anzo, it is converted from its original format to a new format that describes the data as a graph data model. This format, Resource Description Framework (RDF), captures each data value and relationship. RDF data is loaded into Anzo’s in-memory graph engine, AnzoGraph, for transformation and analysis.
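The idea of capturing each value and relationship as RDF can be sketched as follows. This is a simplified illustration, not Anzo's onboarding logic: the `http://example.org/` namespace, the IRI patterns, and the decision to emit every value as a plain string literal are all assumptions made for the example.

```python
BASE = "http://example.org/"  # hypothetical namespace
RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"


def literal(value):
    """Render a value as a plain string literal, escaping backslashes and quotes."""
    text = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{text}"'


def row_to_triples(table, key, row):
    """Turn one source row into (subject, predicate, object) triples.

    The row becomes a resource typed by its table, and every non-key
    column becomes a property of that resource.
    """
    subject = f"<{BASE}{table}/{row[key]}>"
    triples = [(subject, RDF_TYPE, f"<{BASE}{table}>")]
    for column, value in row.items():
        if column == key:
            continue
        triples.append((subject, f"<{BASE}{table}#{column}>", literal(value)))
    return triples


def to_ntriples(triples):
    """Serialize triples as N-Triples-style lines."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


row = {"id": 7, "name": "Ada", "city": "London"}
print(to_ntriples(row_to_triples("customer", "id", row)))
```

Each printed line is one statement about the row, which is the sense in which RDF records every data value and relationship individually.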
Registry
Anzo manages configurations in system-level registries. Each registry is a collection of application and system component configurations of the same type. Like data, registries are stored and managed with RDF named graphs according to ontologies. Technically, a registry is a Linked Data Set.
URI
A Uniform Resource Identifier (URI) is a globally unique identifier for a piece of information. A URL (Uniform Resource Locator) is a URI that specifies a location, such as a web address.