Model Requirements and Recommendations

Anzo uses models to describe and manage RDF data sets. To ensure that data structures are properly defined, Anzo requires that data models include certain information and avoid unsupported information. This topic provides details about the requirements and guidelines to follow when uploading or creating models.

Requirements

This section lists the requirements or rules to follow when uploading or creating a data model. Models that are generated by Anzo during the auto-ingest process conform to these rules.

Define each model as an owl:Ontology

Define each data model as an owl:Ontology. To do so, include the following triple in the model:

<myOntology> a owl:Ontology .

Here, myOntology is the URI that names the model. The URI must be unique. To avoid unexpected results when saving a model, do not end the model URI with a hash (#) character.

Define the model name with rdfs:label

Use an rdfs:label property to define the name of the model as a string. Include the following triple:

<myOntology> rdfs:label "My Ontology"^^xsd:string .

For example, you can use the following statement as a template for inserting owl:Ontology and rdfs:label into the model:

<myOntology> a owl:Ontology ;
  rdfs:label "My Ontology"^^xsd:string .

The named graph URI must match the ontology URI

Make sure that the named graph URI for the model matches the ontology URI. For example:

<myOntology> { <myOntology> a owl:Ontology . }
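For reference, the following TriG sketch puts the ontology declaration, its label, and the matching graph name together in one place, with the standard prefixes declared so the file parses on its own. The URI http://example.com/ontologies/MyOntology is a hypothetical placeholder; substitute your own unique model URI.

```trig
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .

# Hypothetical model URI: used both as the graph name and as the
# subject of the owl:Ontology triple (no trailing # character).
<http://example.com/ontologies/MyOntology> {
  <http://example.com/ontologies/MyOntology> a owl:Ontology ;
    rdfs:label "My Ontology"^^xsd:string .
}
```

By contrast, a file whose graph is named, for example, <http://example.com/graphs/models> while the ontology is <http://example.com/ontologies/MyOntology> would not meet this requirement.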

Define classes and concepts with owl:Class

Use owl:Class for class or concept definitions. Do NOT include skos:Concept or rdfs:Class. For example, the following statement requires modification to make it valid in an Anzo model:

<myConcept> a skos:Concept .

Changing the statement as follows correctly uses owl:Class instead of skos:Concept:

<myConcept> a owl:Class ;
  rdfs:label "Business-Facing Class Label"^^xsd:string .

Define taxonomy with rdfs:subClassOf

Use rdfs:subClassOf for taxonomy. Do NOT use skos:broader. For example, the following statement requires modification to make it valid in an Anzo model:

<childSkosConcept> skos:broader <parentSkosConcept> .

Changing the statement as follows correctly uses rdfs:subClassOf instead of skos:broader:

<childOwlClass> rdfs:subClassOf <parentOwlClass> .
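As a slightly fuller sketch, the fragment below defines a two-level taxonomy using only owl:Class and rdfs:subClassOf. The ex: namespace and the Vehicle and Car classes are hypothetical; in an uploaded file these triples would sit inside the model's named graph.

```trig
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
@prefix ex:   <http://example.com/ontologies/MyOntology/> .

# Parent class, declared with owl:Class (not skos:Concept or rdfs:Class).
ex:Vehicle a owl:Class ;
  rdfs:label "Vehicle"^^xsd:string .

# Child class: rdfs:subClassOf takes the place of skos:broader.
ex:Car a owl:Class ;
  rdfs:label "Car"^^xsd:string ;
  rdfs:subClassOf ex:Vehicle .
```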

Define properties as owl:DatatypeProperty or owl:ObjectProperty

Define properties using owl:DatatypeProperty or owl:ObjectProperty. For example:

<myObjectProperty> a owl:ObjectProperty .

Or

<myDataTypeProperty> a owl:DatatypeProperty .

Include rdfs:domain and rdfs:range for all properties

Define rdfs:domain and rdfs:range for all properties. For example, the following property definition is incomplete:

<myObjectProperty> a owl:ObjectProperty .

The statement below completes the definition by adding rdfs:label, rdfs:domain, and rdfs:range:

<myObjectProperty> a owl:ObjectProperty ;
  rdfs:label "Business-Facing Property Label"^^xsd:string ;
  rdfs:domain <myClass> ;
  rdfs:range <myOtherClass> .

The example below shows a valid datatype property definition:

<myDataTypeProperty> a owl:DatatypeProperty ;
  rdfs:label "Business-Facing Property Label"^^xsd:string ;
  rdfs:domain <myClass> ;
  rdfs:range xsd:string .

Important: When defining the property range for integer values, use xsd:int instead of xsd:integer.
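For example, a datatype property for whole-number values might be declared as follows. The ex: namespace, ex:age, and ex:Person are hypothetical names used only for illustration.

```trig
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
@prefix ex:   <http://example.com/ontologies/MyOntology/> .

ex:age a owl:DatatypeProperty ;
  rdfs:label "Age"^^xsd:string ;
  rdfs:domain ex:Person ;
  # Integer range: xsd:int, not xsd:integer.
  rdfs:range xsd:int .
```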

Reference only Anzo-stored models

Models must be self-contained or include references only to models that are stored in Anzo.
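As an illustration, assume that a separate model with the hypothetical namespace http://example.com/ontologies/CoreOntology/ has already been uploaded to Anzo and defines a class core:Party. A model could then refer to that class by URI instead of redefining it. This sketch shows only one kind of reference (a subclass link) and does not describe how Anzo resolves references between models.

```trig
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
@prefix ex:   <http://example.com/ontologies/MyOntology/> .
@prefix core: <http://example.com/ontologies/CoreOntology/> .

# core:Party is assumed to be defined in another model that is
# already stored in Anzo; this model only points to it by URI.
ex:Customer a owl:Class ;
  rdfs:label "Customer"^^xsd:string ;
  rdfs:subClassOf core:Party .
```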

Guidelines

This section lists additional guidelines and important information to know when working with data models in Anzo.

Property Range Guidelines

When you create or edit properties in the model editor, Anzo offers several RDF property ranges (data types) to choose from. Some types are preferred over others because they are handled consistently and predictably across systems. Cambridge Semantics recommends that you specify one of the following preferred property range values:

  • Boolean: For true or false values.
  • Byte: For 1-byte integers from -128 to 127.
  • Date: For date values that follow a format such as YYYY-MM-DD.
  • Date time: For date and time values that follow a format such as YYYY-MM-DDThh:mm:ss.
  • Double: For 8-byte double-precision floating-point values.
  • Duration: For a duration of time expressed as a number of years, months, days, hours, minutes, and seconds in a format such as PnYnMnDTnHnMnS.
  • Float: For 4-byte single-precision floating-point values, which can include decimal places.
  • Int: For 4-byte integers from -2,147,483,648 to 2,147,483,647.
  • Long: For 8-byte integers from -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807.
  • Short: For 2-byte integers from -32,768 to 32,767.
  • String: For character values of varying length.
  • Time: For time values that follow a format such as hh:mm:ss.
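The sketch below declares one datatype property for each preferred range. It assumes that each option in the model editor corresponds to the XSD datatype of the same name (for example, Date time to xsd:dateTime). The ex: namespace, the ex:Record class, and the property names are hypothetical, and rdfs:label is omitted from the properties to keep the lines short.

```trig
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
@prefix ex:   <http://example.com/ontologies/MyOntology/> .

# Hypothetical class used as the domain of every property below.
ex:Record a owl:Class ; rdfs:label "Record"^^xsd:string .

# One property per preferred range; the trailing comment names the editor option.
ex:isActive    a owl:DatatypeProperty ; rdfs:domain ex:Record ; rdfs:range xsd:boolean .   # Boolean
ex:statusCode  a owl:DatatypeProperty ; rdfs:domain ex:Record ; rdfs:range xsd:byte .      # Byte
ex:startDate   a owl:DatatypeProperty ; rdfs:domain ex:Record ; rdfs:range xsd:date .      # Date
ex:createdAt   a owl:DatatypeProperty ; rdfs:domain ex:Record ; rdfs:range xsd:dateTime .  # Date time
ex:amount      a owl:DatatypeProperty ; rdfs:domain ex:Record ; rdfs:range xsd:double .    # Double
ex:timeOpen    a owl:DatatypeProperty ; rdfs:domain ex:Record ; rdfs:range xsd:duration .  # Duration
ex:score       a owl:DatatypeProperty ; rdfs:domain ex:Record ; rdfs:range xsd:float .     # Float
ex:quantity    a owl:DatatypeProperty ; rdfs:domain ex:Record ; rdfs:range xsd:int .       # Int
ex:recordId    a owl:DatatypeProperty ; rdfs:domain ex:Record ; rdfs:range xsd:long .      # Long
ex:priority    a owl:DatatypeProperty ; rdfs:domain ex:Record ; rdfs:range xsd:short .     # Short
ex:name        a owl:DatatypeProperty ; rdfs:domain ex:Record ; rdfs:range xsd:string .    # String
ex:openingTime a owl:DatatypeProperty ; rdfs:domain ex:Record ; rdfs:range xsd:time .      # Time
```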

TriG is the preferred format for models to upload

TriG (.trig) is the preferred format for models that you upload to Anzo.
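A complete minimal model file in TriG might look like the following sketch, which combines the requirements from the previous section: a named graph whose URI matches the ontology URI, an owl:Ontology declaration with an rdfs:label, owl:Class definitions, and properties with rdfs:domain and rdfs:range. All URIs and names are hypothetical placeholders.

```trig
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
@prefix ex:   <http://example.com/ontologies/MyOntology/> .

# Graph name and ontology URI are identical and do not end in #.
<http://example.com/ontologies/MyOntology> {

  <http://example.com/ontologies/MyOntology> a owl:Ontology ;
    rdfs:label "My Ontology"^^xsd:string .

  ex:Person a owl:Class ;
    rdfs:label "Person"^^xsd:string .

  ex:Organization a owl:Class ;
    rdfs:label "Organization"^^xsd:string .

  # Object property: links a Person to an Organization.
  ex:worksFor a owl:ObjectProperty ;
    rdfs:label "Works For"^^xsd:string ;
    rdfs:domain ex:Person ;
    rdfs:range ex:Organization .

  # Datatype property: integer range uses xsd:int.
  ex:age a owl:DatatypeProperty ;
    rdfs:label "Age"^^xsd:string ;
    rdfs:domain ex:Person ;
    rdfs:range xsd:int .
}
```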

Load RDFS and OWL vocabularies as graphs

Anzo loads but does not process additional vocabulary data, such as rdfs:subPropertyOf, owl:sameAs, and owl:intersectionOf, that is encoded in models. Models that contain vocabularies rather than structural information should instead be loaded as RDF graphs. Anzo can load any valid RDF data. Because RDFS, SKOS, and OWL vocabularies are themselves expressed in RDF, the vocabulary information can be loaded as a graph and then interpreted with SPARQL in data layers and Hi-Res Analytics.
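For instance, a SKOS vocabulary could be kept as ordinary RDF data in its own named graph, as in the hypothetical sketch below. The graph has no owl:Ontology declaration because it is intended as plain data rather than as a model. The URIs are placeholders, and the steps for loading the file into Anzo are not shown.

```trig
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix voc:  <http://example.com/vocabularies/Vehicles/> .

# Vocabulary stored as data: SKOS terms are fine here,
# because this graph is not uploaded as a model.
<http://example.com/graphs/VehicleVocabulary> {
  voc:Vehicle a skos:Concept ;
    skos:prefLabel "Vehicle"@en .

  voc:Car a skos:Concept ;
    skos:prefLabel "Car"@en ;
    skos:broader voc:Vehicle .
}
```

A SPARQL query in a data layer could then walk this hierarchy with a property path such as skos:broader+.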

Axiomatically defined classes and property hierarchies are not processed

When models include axiomatically defined classes or property hierarchies, Anzo loads the information but does not process it. For example, Anzo does not infer new information from axiomatically defined classes.
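The fragment below is a hypothetical example of the kind of axioms this applies to: ex:ElectricCar is defined as every ex:Car whose ex:fuelType is "electric", and ex:hasDriver is declared a subproperty of ex:hasOccupant. Based on the statement above, Anzo would store these triples but would not, for example, automatically classify a car with that fuel type as an ex:ElectricCar.

```trig
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
@prefix ex:   <http://example.com/ontologies/MyOntology/> .

# Axiomatically defined class: Car AND (fuelType has value "electric").
# Loaded, but not used to classify instances.
ex:ElectricCar a owl:Class ;
  rdfs:label "Electric Car"^^xsd:string ;
  owl:equivalentClass [
    a owl:Class ;
    owl:intersectionOf ( ex:Car
                         [ a owl:Restriction ;
                           owl:onProperty ex:fuelType ;
                           owl:hasValue "electric" ] )
  ] .

# Property hierarchy: loaded, but not used for inference.
ex:hasDriver rdfs:subPropertyOf ex:hasOccupant .
```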
