Model Normalization Options

To give users control over the labels and URIs that are generated in the data model, the GDI offers several options for normalizing the names of the classes and properties that are created from the specified data source(s). Normalization rules can be specified at the source level to normalize the data from each source independently, or at the RDF Generator level to apply global rules across all specified data sources.

Normalization rules are applied only at the model level. The rules do not affect the instance data values that are ingested.

Including the normalize parameter is optional. If you include it, you can specify any combination of rules. See Default Normalization Behavior below for details about the Generator's default behavior when normalization rules are not specified in your query.

Default Normalization Behavior

By default, the GDI Generator normalizes class and property names according to the following rules. If you do not include the s:normalize parameter in your query, these are the rules that are applied:

s:normalize [
  s:all [
      s:removePrefix true ;
      s:removePartialPrefix false ;
      s:allowWhiteSpace false ;
      s:allowPunctuation false ;
      s:allowSymbols false ;
      s:separator " " ;
      s:singularize false ;
      s:casing s:UpperCamel ;
      s:localNameSeparator "." ;
  ]
]

Normalize Syntax

s:normalize boolean | [
  s:model | s:field | s:all
  [
    s:removeStart "string" ;
    s:removeEnd "string" ;
    s:removePrefix boolean ;
    s:removePartialPrefix boolean ;
    s:match [ s:pattern "regex" ; s:replace "regex" ] ;
    s:disambiguationLevel int ; 
    s:ignore "string" ;
    s:words "string" ;
    s:preserve "string" ;
    s:split "string" ;
    s:allowWhiteSpace boolean ;
    s:allowPunctuation boolean ;
    s:allowSymbols boolean ;
    s:singularize boolean ;
    s:casing property ;
    s:separator "string" ;
    s:localNamePrefix "string" ;
    s:localNameSeparator "string" ; 
  ] ;
] ;

Property	Type	Description
boolean	N/A	Normalize is enabled by default for all GDI Generator queries. To disable normalization, include `s:normalize false`. When normalization is disabled, the names in the data source are used verbatim as labels, and the Generator creates hard-to-read, URL-encoded local names for property and class URIs.
s:model | s:field | s:all	N/A	This property defines which names the enclosed normalization rules are applied to. s:model: Applies the rules to the file/table/class names only. s:field: Applies the rules to the column/property/field names only. s:all: Applies the rules to both the class and property names. This is the default value if not specified.
removeStart	string	If you want to remove text from the beginning of identifiers, include the removeStart rule to specify the string to remove. For example, `s:removeStart "temp_"`.
removeEnd	string	If you want to remove text from the end of identifiers, include the removeEnd rule to specify the string to remove. For example, `s:removeEnd "NEW"`.
removePrefix	boolean	If there are property identifiers that share a prefix with the class, the RDF Generator automatically removes the shared prefix from the property name; the removePrefix rule is set to `true` by default. For example, if there is an EMPLOYEE class with an EMPLOYEE_ID column, the shared prefix "EMPLOYEE" is removed from the generated property so that it becomes "ID." If you do not want the Generator to remove prefixes, you can include `s:removePrefix false`.
removePartialPrefix	boolean	If there are property identifiers that share a partial prefix with the class, you can enable removePartialPrefix to remove the partial prefix from the property name. The removePartialPrefix rule is set to `false` by default. If you want the Generator to remove partial prefixes, you can include `s:removePartialPrefix true`.
match	object or RDF list	This rule uses regular expressions (REGEX) to match a pattern against source identifiers and replace the matched text in the normalized name. The `s:pattern` property defines the Java REGEX pattern to match against, and `s:replace` defines the Java replacement string, which can reference capture groups such as `$1`. The match rule can also be configured with an `rdf:List` of objects so that the patterns are evaluated in the listed order, for example: `s:match ( [ s:pattern "(.+)GUID$" ; s:replace "$1" ; ] [ s:pattern "(.+)ID$" ; s:replace "$1" ; ] )`
disambiguationLevel	int	This rule specifies the number of levels to use to resolve ambiguities between similarly named elements in a hierarchical source. For example, an element named "Data" appears in two contexts: "Currently" and "Hourly." By default, the Generator retains all levels, meaning two classes are generated: "Currently Data" and "Hourly Data." If `s:disambiguationLevel` is set to `0`, a single class named "Data" is generated and both the Currently and Hourly classes have a "Data" property. The disambiguationLevel value is also used to determine the number of hierarchy levels to use when encoding the local name of the generated URI.
ignore	string	This rule can be used to list identifiers that should be ignored. Properties and classes will not be generated for identifiers that match the specified string(s). The ignore rule is a multi-valued property. For simplicity, you can enter a list by separating words with a space, rather than quoting each term and separating them with a comma. For multi-word identifiers, use single quotes. For example, `s:ignore "sample example 'test column' old"`.
words	string	Because many sources do not encode word boundaries clearly, the words rule can be used to list the words that the Generator should recognize as separate terms when it splits identifiers. The words rule is a multi-valued property. For simplicity, you can enter a list by separating words with a space, rather than quoting each term and separating them with a comma. For multi-word identifiers, use single quotes. For example: `s:words "activity 'patient complaint' medication observation patient signal specialty study" ;`
preserve	string	This rule can be used to identify any words whose casing should be preserved in the input identifiers. For example, if casing is set to `lower` but you want to preserve the original upper casing of certain words, you can specify the words to preserve. The preserve rule is a multi-valued property. For simplicity, you can enter a list by separating words with a space, rather than quoting each term and separating them with a comma. For multi-word identifiers, use single quotes. For example: `s:preserve "ABC 'Laundry List' TriG"`. The preserve rule is case-insensitive, so you do not have to match the casing of the words to preserve.
split	string	This rule specifies the string that should be used to split source identifiers into individual terms. If neither `split` nor `words` is specified, input identifiers are split on casing changes and character class changes.
allowWhiteSpace	boolean	This rule specifies whether or not white space should be preserved in identifiers after they have been split into individual terms. This rule is set to `false` by default, meaning white space is not preserved. You can specify `s:allowWhiteSpace true` to preserve spaces.
allowPunctuation	boolean	This rule specifies whether or not punctuation should be preserved in identifiers after they have been split into individual terms. This rule is set to `false` by default, meaning punctuation is not preserved. You can specify `s:allowPunctuation true` to preserve punctuation.
allowSymbols	boolean	This rule specifies whether or not symbols should be preserved in identifiers after they have been split into individual terms. This rule is set to `false` by default, meaning symbols are not preserved. You can specify `s:allowSymbols true` to preserve symbols.
singularize	boolean	This rule specifies whether or not to change any plural identifiers to singular. This rule is set to `false` by default, meaning plural identifiers are preserved. You can specify `s:singularize true` to change plural terms to the singular version of the term.
casing	object	This rule specifies how the generated labels should be cased. By default, the Generator outputs labels in upper camel case (`s:casing s:UpperCamel`). To use a different casing, specify one of the following values: default: Preserves the casing from the source; labels are not converted. UPPER: Converts all characters to uppercase. For example, "uppercase" becomes "UPPERCASE". lower: Converts all characters to lowercase. For example, "Lower Case" becomes "lower case". UpperCamel: (Default) Converts labels to upper camel case, where terms are concatenated and the first letter of each term is capitalized. For example, "upper camel case" becomes "UpperCamelCase". lowerCamel: Converts labels to lower camel case, where terms are concatenated, the first letter of the first term is lowercase, and the first letter of each subsequent term is capitalized. For example, "lower camel case" becomes "lowerCamelCase".
separator	string	This rule specifies the character or characters to use to separate terms in the generated label. The default separator is a space (`s:separator " "`).
localNamePrefix	string	This rule specifies a string to use as the prefix for local names when generating a URI.
localNameSeparator	string	This rule specifies the string to use for separating local names when encoding hierarchies according to the specified disambiguationLevel. By default, localNameSeparator is a period (`s:localNameSeparator "."`). If localNameSeparator is set to an empty string (`""`), hierarchical context is not encoded into the local name of any property or child class, so only the class or property name is used to determine the local name. For example, a property URI would look like `ont:employeeID` rather than `ont:Employee.employeeID`. As a result, properties with the same name resolve to the same URI, which can cause conflicts in the generated ontology; this may be desirable if you want same-named properties to be reused across the ontology.
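To make several of the rules in the table more concrete, the following Python sketch approximates them: splitting identifiers on non-alphanumeric characters and casing changes (the documented fallback when neither split nor words is given), removePrefix, casing with preserve, and how disambiguationLevel and localNameSeparator shape a hierarchical local name. It is a hypothetical illustration, not the GDI Generator's implementation. The function names, the case-insensitive prefix comparison, keeping at least one term after prefix removal, lowercasing the rest of each term under camel casing, and ignoring the separator for camel styles are all assumptions.

```python
import re

# Hypothetical sketch of a few documented normalization rules.
# This is NOT the GDI Generator's implementation; details are assumptions.


def split_terms(identifier):
    """Split on non-alphanumeric characters and on casing changes,
    approximating the fallback used when neither s:split nor s:words is set."""
    terms = []
    for chunk in re.split(r"[^A-Za-z0-9]+", identifier):
        # e.g. "HTTPServer" -> ["HTTP", "Server"], "employeeID" -> ["employee", "ID"]
        terms += re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|[0-9]+", chunk)
    return terms


def remove_prefix(class_terms, prop_terms):
    """removePrefix (default true): drop leading terms shared with the class.
    Assumes a case-insensitive comparison and keeps at least one term."""
    i = 0
    while (i < len(class_terms) and i < len(prop_terms) - 1
           and class_terms[i].lower() == prop_terms[i].lower()):
        i += 1
    return prop_terms[i:]


def apply_casing(terms, casing="UpperCamel", separator=" ", preserve=()):
    """casing/separator/preserve: camel styles concatenate terms (ignoring the
    separator, an assumption); other styles join terms with the separator."""
    keep = {w.lower() for w in preserve}  # preserve matching is case-insensitive

    def conv(term, fn):
        return term if term.lower() in keep else fn(term)

    if casing == "UPPER":
        return separator.join(conv(t, str.upper) for t in terms)
    if casing == "lower":
        return separator.join(conv(t, str.lower) for t in terms)
    if casing in ("UpperCamel", "lowerCamel"):
        out = "".join(conv(t, str.capitalize) for t in terms)
        if casing == "lowerCamel" and terms and terms[0].lower() not in keep:
            out = out[:1].lower() + out[1:]
        return out
    return separator.join(terms)  # "default": keep the source casing


def local_name(path, level=None, separator="."):
    """disambiguationLevel/localNameSeparator: keep `level` ancestor levels
    (None keeps all, the documented default); "" encodes no hierarchy."""
    *ancestors, name = path
    if level is not None:
        ancestors = ancestors[max(0, len(ancestors) - level):]
    if separator == "":
        return name
    return separator.join(ancestors + [name])


print(remove_prefix(split_terms("EMPLOYEE"), split_terms("EMPLOYEE_ID")))  # ['ID']
print(apply_casing(split_terms("lower camel case"), "lowerCamel"))  # lowerCamelCase
print(local_name(["Employee", "employeeID"]))  # Employee.employeeID
print(local_name(["Currently", "Data"], level=0))  # Data
```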

You can also specify normalization rules at both the source and global level in the same query. If you include multi-valued rules (such as ignore, words, or preserve) at both levels, the Generator combines the values from both levels. If you specify a single-valued rule at both levels with conflicting values, the Generator applies the source-level value.
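A minimal Python sketch of the merge behavior described above, assuming each rule set is represented as a dictionary keyed by rule name. The names parse_list, merge_rules, and MULTI_VALUED are hypothetical, and only ignore, words, and preserve are treated as multi-valued here even though the text says "such as". parse_list uses Python's shlex.split to approximate the space-separated, single-quoted list syntax described for ignore, words, and preserve; shlex also treats backslashes and double quotes specially, which the GDI syntax may not.

```python
import shlex

# Hypothetical sketch: not the GDI Generator's implementation.
MULTI_VALUED = {"ignore", "words", "preserve"}  # assumption: may not be exhaustive


def parse_list(value):
    """Split a space-separated list; single quotes group multi-word entries."""
    return shlex.split(value)


def merge_rules(global_rules, source_rules):
    """Combine global (RDF Generator) rules with source-level rules."""
    merged = dict(global_rules)  # copy so the global rules are not modified
    for rule, value in source_rules.items():
        if rule in MULTI_VALUED and rule in merged:
            # Multi-valued rules: keep the values from both levels.
            merged[rule] = merged[rule] + value
        else:
            # Single-valued rules: the source-level value wins on conflict.
            merged[rule] = value
    return merged


global_rules = {"ignore": parse_list("rowguid ModifiedDate"), "casing": "UpperCamel"}
source_rules = {"ignore": parse_list("'test column' old"), "casing": "lower"}
print(merge_rules(global_rules, source_rules))
# {'ignore': ['rowguid', 'ModifiedDate', 'test column', 'old'], 'casing': 'lower'}
```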

Normalize Examples

The example below uses the normalize property to specify separate rules for the model (class names) and the fields (property names).

s:normalize [ 
    s:model [
       s:localNamePrefix "C_" ;
       s:localNameSeparator "_" ;
       s:match [ s:pattern "(.+)Enlarged" ; s:replace "$1" ] ;
    ] ;
    s:field [
       s:localNamePrefix "P_" ;
       s:localNameSeparator "_" ;
       s:ignore "rowguid ModifiedDate" ;
       s:match (
         [ s:pattern "(.+)GUID$" ; s:replace "$1" ]
         [ s:pattern "(.+)ID$" ; s:replace "$1" ]
       ) ;
    ] ;
] ;
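The following Python sketch shows how the s:field ignore and match rules in this example might transform a few column names. It assumes that the rules are applied to the raw column name, that the first matching pattern in the list wins, and that ignore matches names exactly; the function names and sample column names are hypothetical. Java's `$1` replacement syntax is translated to Python's `\g<1>` form.

```python
import re

# Hypothetical sketch of the example's s:field rules; not the GDI implementation.
FIELD_IGNORE = set("rowguid ModifiedDate".split())  # s:ignore "rowguid ModifiedDate"
FIELD_MATCH = [                                      # s:match ( ... ), in list order
    (r"(.+)GUID$", "$1"),
    (r"(.+)ID$", "$1"),
]


def java_replacement(template):
    """Translate Java-style $n group references into Python's \\g<n> form."""
    return re.sub(r"\$(\d+)", r"\\g<\1>", template)


def normalize_field(name):
    """Return the renamed field, or None if no property should be generated."""
    if name in FIELD_IGNORE:
        return None
    for pattern, replace in FIELD_MATCH:
        if re.search(pattern, name):  # assumption: first matching pattern wins
            return re.sub(pattern, java_replacement(replace), name)
    return name


for column in ["CustomerID", "ProductGUID", "Name", "rowguid"]:
    print(column, "->", normalize_field(column))
# CustomerID -> Customer
# ProductGUID -> Product
# Name -> Name
# rowguid -> None
```

The order of the list matters here: applied on its own, the `(.+)ID$` pattern would turn "ProductGUID" into "ProductGU".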