Model Normalization Options

To give users control over the labels and URIs that are generated in the data model, the GDI offers several options for normalizing the class and property fields that are created from the specified data source(s). Normalization rules can be specified at the source level to normalize the data from each source independently, or they can be used at the RDF Generator level to apply global rules across all specified data sources.

Normalization rules are applied only at the model level. The rules do not affect the instance data values that are ingested.

Including the s:normalize parameter is optional. If you include it, you can specify any combination of rules. See Default Normalization Behavior below for details about the Generator's default behavior when normalization rules are not specified in your query.

Default Normalization Behavior

The GDI Generator normalizes data according to the following rules by default. If you do not include the s:normalize parameter in your query, these are the rules that are applied:

s:normalize [
  s:all [
      s:removePrefix true ;
      s:removePartialPrefix false ;
      s:allowWhiteSpace false ;
      s:allowPunctuation false ;
      s:allowSymbols false ;
      s:separator " " ;
      s:singularize false ;
      s:casing s:UpperCamel ;
      s:localNameSeparator "." ;
  ]
]

Normalize Syntax

s:normalize boolean | [
  s:model | s:field | s:all
  [
    s:removeStart "string" ;
    s:removeEnd "string" ;
    s:removePrefix boolean ;
    s:removePartialPrefix boolean ;
    s:match [ s:pattern "java_regex" ; s:replace "java_regex" ] ;
    s:disambiguationLevel int ; 
    s:ignore "string" ;
    s:words "string" ;
    s:preserve "string" ;
    s:split "string" ;
    s:allowWhiteSpace boolean ;
    s:allowPunctuation boolean ;
    s:allowSymbols boolean ;
    s:singularize boolean ;
    s:casing property ;
    s:separator "string" ;
    s:localNamePrefix "string" ;
    s:localNameSeparator "string" ; 
  ] ;
] ;

Property Type Description
boolean N/A Normalization is enabled by default for all GDI Generator queries. To disable normalization, include s:normalize false. When normalization is disabled, the names in the data source are used verbatim as labels, and the Generator creates hard-to-read, URL-encoded local names for property and class URIs.
s:model | s:field | s:all N/A This property defines whether the specified normalization rules apply to class names, property names, or both. The list below describes each option:
  • s:model: Applies the rules to the file/table/class names only.
  • s:field: Applies the rules to the column/property/field names only.
  • s:all: (Default) Applies the rules to both the class and property names.
removeStart string If you want to remove text from the beginning of identifiers, include the removeStart rule to specify the string to remove. For example, s:removeStart "temp_".
removeEnd string If you want to remove text from the end of identifiers, include the removeEnd rule to specify the string to remove. For example, s:removeEnd "NEW".
removePrefix boolean If there are property identifiers that share a prefix with the class, the RDF Generator automatically removes the shared prefix from the property name; the removePrefix rule is set to true by default. For example, if there is an EMPLOYEE class with an EMPLOYEE_ID column, the shared prefix "EMPLOYEE" is removed from the generated property so that it becomes "ID." If you do not want the Generator to remove prefixes, you can include s:removePrefix false.
removePartialPrefix boolean If there are property identifiers that share a partial prefix with the class, you can enable removePartialPrefix to remove the partial prefix from the property name. The removePartialPrefix rule is set to false by default. If you want the Generator to remove partial prefixes, you can include s:removePartialPrefix true.
match RDF list This rule provides a way to use regular expressions (REGEX) to match a pattern against source identifiers and replace the matched text in the normalized name.

The s:pattern property defines the Java REGEX pattern to match against, and s:replace defines the Java REGEX replacement pattern. As shown in the example below, the match rule can also be configured with an rdf:List of objects so that the patterns are evaluated in a specific order:

s:match (
[ s:pattern "(.+)GUID$" ; s:replace "$1" ; ]
[ s:pattern "(.+)ID$" ; s:replace "$1" ; ] )

disambiguationLevel int This rule specifies the number of levels to use to resolve ambiguities between similarly named elements in a hierarchical source. For example, an element named "Data" appears in two contexts: "Currently" and "Hourly." By default, the Generator retains all levels, meaning two classes are generated: "Currently Data" and "Hourly Data." If s:disambiguationLevel is set to 0, a single class named "Data" is generated and both the Currently and Hourly classes have a "Data" property. The disambiguationLevel value is also used to determine the number of hierarchy levels to use when encoding the local name of the generated URI.
ignore string This rule can be used to list identifiers that should be ignored. Properties and classes will not be generated for identifiers that match the specified string(s). The ignore rule is a multi-valued property. For simplicity, you can enter a list by separating words with a space, rather than quoting each term and separating them with a comma. For multi-word identifiers, use single quotes. For example, s:ignore "sample example 'test column' old".
words string Since many sources do not encode word boundaries very well, the words rule can be used to list the set of words that should be separate identifiers. This rule tells the Generator which words may be encountered. The words rule is a multi-valued property. For simplicity, you can enter a list by separating words with a space, rather than quoting each term and separating them with a comma. For multi-word identifiers, use single quotes. For example:

s:words "activity 'patient complaint' medication observation patient signal specialty study" ;

preserve string This rule can be used to identify any words in the input identifiers whose casing should be preserved. For example, if casing is set to lower but you want to preserve the original upper casing of certain words, you can specify the words to preserve. The preserve rule is a multi-valued property. For simplicity, you can enter a list by separating words with a space, rather than quoting each term and separating them with a comma. For multi-word identifiers, use single quotes. For example: s:preserve "ABC 'Laundry List' TriG". The preserve rule is case-insensitive, so you do not have to match the casing of the words to preserve.
split string This rule specifies the string that should be used to split source identifiers into individual terms. If neither split nor words is specified, input identifiers are split on casing changes and character class changes.
allowWhiteSpace boolean This rule specifies whether or not white space should be preserved in identifiers after they have been split into individual terms. This rule is set to false by default, meaning white space is not preserved. You can specify s:allowWhiteSpace true to preserve spaces.
allowPunctuation boolean This rule specifies whether or not punctuation should be preserved in identifiers after they have been split into individual terms. This rule is set to false by default, meaning punctuation is not preserved. You can specify s:allowPunctuation true to preserve punctuation.
allowSymbols boolean This rule specifies whether or not symbols should be preserved in identifiers after they have been split into individual terms. This rule is set to false by default, meaning symbols are not preserved. You can specify s:allowSymbols true to preserve symbols.
singularize boolean This rule specifies whether or not to change any plural identifiers to singular. This rule is set to false by default, meaning plural identifiers are preserved. You can specify s:singularize true to change plural terms to the singular version of the term.
casing object This rule specifies how the generated labels should be cased. By default, the Generator outputs labels in upper camel case (s:casing s:UpperCamel). To use a different casing, specify one of the following values:
  • default: This object preserves the casing from the source. Labels will not be converted.
  • UPPER: This object converts all characters to uppercase. For example, "uppercase" becomes "UPPERCASE."
  • lower: This object converts all characters to lowercase. For example, "Lower Case" becomes "lower case".
  • UpperCamel: This is the default casing value and converts labels to upper camel case, where terms are concatenated and the first letter of each word is capitalized. For example, "upper camel case" becomes "UpperCamelCase."
  • lowerCamel: This object converts labels to lower camel case, where terms are concatenated and the first letter of the first word is lower case. The first letter of subsequent terms is capitalized. For example, "lower camel case" becomes "lowerCamelCase."
separator string This rule specifies the character or characters to use to separate terms in the generated label. The default separator is a space (s:separator " ").
localNamePrefix string This rule specifies a string to use as the prefix for local names when generating a URI.
localNameSeparator string This rule specifies the string to use for separating local names when encoding hierarchies according to the specified disambiguationLevel. By default, localNameSeparator is a period (s:localNameSeparator "."). If localNameSeparator is empty, hierarchical context is not encoded into the local name of any properties or child classes, so only the class or property name determines the local name. For example, a property URI would look like ont:employeeID rather than ont:Employee.employeeID. This can lead to "conflicts" in the generated ontology, but those "conflicts" may be desirable because properties with the same name are then reused across the generated ontology.
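
As a minimal sketch of the hierarchy-related rules in the table above, the fragment below sets disambiguationLevel and an empty localNameSeparator. Combining the two rules in a single s:all block is an assumption made for illustration, and the comments assume the fragment sits inside a SPARQL query, where # starts a comment.

s:normalize [
  s:all [
      # Per the disambiguationLevel description: generate a single "Data" class
      # instead of "Currently Data" and "Hourly Data".
      s:disambiguationLevel 0 ;
      # Per the localNameSeparator description: an empty separator keeps
      # hierarchical context out of local names (ont:employeeID rather than
      # ont:Employee.employeeID).
      s:localNameSeparator "" ;
  ] ;
] ;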

You can specify normalization rules at both the source and global level in the same query. If you include multi-valued rules (such as ignore, words, or preserve) at both levels, the Generator combines the values from both instances of the rule. If you specify a single-valued rule at both levels and the values conflict, the Generator applies the source-level value.
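
The sketch below illustrates how rules at the two levels might combine. The surrounding source and RDF Generator definitions are omitted, the placement described in the comments is an assumption, and the rule values are illustrative.

# Source level (assumed to sit inside one data source definition)
s:normalize [
  s:all [
      s:ignore "rowguid" ;
      s:singularize true ;
  ] ;
] ;

# Global level (assumed to sit inside the RDF Generator definition)
s:normalize [
  s:all [
      s:ignore "ModifiedDate" ;
      s:singularize false ;
  ] ;
] ;

Under the behavior described above, the multi-valued ignore rule would cover both rowguid and ModifiedDate for that source, and the conflicting singularize values would resolve to the source-level setting (true) for that source.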

Normalize Examples

The example below uses the normalize property to apply separate rules to class names (s:model) and property names (s:field).

s:normalize [ 
    s:model [
       s:localNamePrefix "C_" ;
       s:localNameSeparator "_" ;
       s:match [ s:pattern "(.+)Enlarged" ; s:replace "$1" ] ;
    ] ;
    s:field [
       s:localNamePrefix "P_" ;
       s:localNameSeparator "_" ;
       s:ignore "rowguid ModifiedDate" ;
       s:match (
         [ s:pattern "(.+)GUID$" ; s:replace "$1" ]
         [ s:pattern "(.+)ID$" ; s:replace "$1" ]
       ) ;
    ] ;
] ;
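
As a further sketch, the fragment below applies several text cleanup rules to both class and property names through s:all. The values are reused from the rule descriptions in the table; whether they suit a particular source is an assumption, and the comments assume SPARQL comment syntax (#).

s:normalize [
    s:all [
       # Strip a leading "temp_" and a trailing "NEW" from identifiers.
       s:removeStart "temp_" ;
       s:removeEnd "NEW" ;
       # Space-separated list; single quotes group a multi-word identifier.
       s:ignore "sample example 'test column' old" ;
       # Keep the casing of these words as written here.
       s:preserve "ABC 'Laundry List' TriG" ;
       # Convert plural identifiers to their singular form.
       s:singularize true ;
    ] ;
] ;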