Model Normalization Options
To give users control over the labels and URIs that are generated in the data model, the GDI offers several options for normalizing the class and property fields that are created from the specified data source(s). Normalization rules can be specified at the source level to normalize the data from each source independently, or they can be used at the RDF Generator level to apply global rules across all specified data sources.
Normalization rules are applied only at the model level. The rules do not affect the instance data values that are ingested.
Including the normalize parameter is optional. If you include it, you can specify any combination of rules. See Default Normalization Behavior below for details about the Generator's default behavior when normalization rules are not specified in your query.
Default Normalization Behavior
The GDI Generator normalizes data according to the following rules by default. If you do not include the s:normalize parameter in your query, these are the rules that are applied:
s:normalize [ s:all [ s:removePrefix true ; s:removePartialPrefix false ; s:allowWhiteSpace false ; s:allowPunctuation false ; s:allowSymbols false ; s:separator " " ; s:singularize false ; s:casing s:UpperCamel ; s:localNameSeparator "." ; ] ]
Normalize Syntax
s:normalize boolean | [ s:model | s:field | s:all [ s:removeStart "string" ; s:removeEnd "string" ; s:removePrefix boolean ; s:removePartialPrefix boolean ; s:match [ s:pattern "java_regex" ; s:replace "java_regex" ] ; s:disambiguationLevel int ; s:ignore "string" ; s:words "string" ; s:preserve "string" ; s:split "string" ; s:allowWhiteSpace boolean ; s:allowPunctuation boolean ; s:allowSymbols boolean ; s:singularize boolean ; s:casing property ; s:separator "string" ; s:localNamePrefix "string" ; s:localNameSeparator "string" ; ] ; ] ;
Property | Type | Description |
---|---|---|
boolean | N/A | Normalization is enabled by default for all GDI Generator queries. If you want to disable normalization, you can include s:normalize false . When normalization is disabled, the names in the source are used verbatim both for labeling and for generating the local names of property and class URIs. As a result, the Generator creates hard-to-read, URL-encoded local names for property and class URIs. |
s:model | s:field | s:all | N/A | This property defines whether the specified normalization rules should be applied across the model or only to the classes or properties. The list below describes each option:
|
removeStart | string | If you want to remove text from the beginning of identifiers, include the removeStart rule to specify the string to remove. For example, s:removeStart "temp_" . |
removeEnd | string | If you want to remove text from the end of identifiers, include the removeEnd rule to specify the string to remove. For example, s:removeEnd "NEW" . |
removePrefix | boolean | If there are property identifiers that share a prefix with the class, the RDF Generator automatically removes the shared prefix from the property name; the removePrefix rule is set to true by default. For example, if there is an EMPLOYEE class with an EMPLOYEE_ID column, the shared prefix "EMPLOYEE" is removed from the generated property so that it becomes "ID." If you do not want the Generator to remove prefixes, you can include s:removePrefix false . |
removePartialPrefix | boolean | If there are property identifiers that share a partial prefix with the class, you can enable removePartialPrefix to remove the partial prefix from the property name. The removePartialPrefix rule is set to false by default. If you want the Generator to remove partial prefixes, you can include s:removePartialPrefix true . |
match | RDF list | This rule provides a way to use regular expressions (REGEX) to match a pattern against source identifiers and replace the matched text in the normalized name. The
|
disambiguationLevel | int | This rule specifies the number of levels to use to resolve ambiguities between similarly named elements in a hierarchical source. For example, an element named "Data" appears in two contexts: "Currently" and "Hourly." By default, the Generator retains all levels, meaning two classes are generated: "Currently Data" and "Hourly Data." If s:disambiguationLevel is set to 0 , a single class named "Data" is generated and both the Currently and Hourly classes have a "Data" property. The disambiguationLevel value is also used to determine the number of hierarchy levels to use when encoding the local name of the generated URI. |
ignore | string | This rule can be used to list identifiers that should be ignored. Properties and classes will not be generated for identifiers that match the specified string(s). The ignore rule is a multi-valued property. For simplicity, you can enter a list by separating words with a space, rather than quoting each term and separating them with a comma. For multi-word identifiers, use single quotes. For example, s:ignore "sample example 'test column' old" . |
words | string | Since many sources do not encode word boundaries very well, the words rule can be used to list the set of words that should be separate identifiers. This rule tells the Generator which words may be encountered. The words rule is a multi-valued property. For simplicity, you can enter a list by separating words with a space, rather than quoting each term and separating them with a comma. For multi-word identifiers, use single quotes. For example:
|
preserve | string | This rule can be used to identify any words whose casing should be preserved in the input identifiers. For example, if casing is set to lower but you want to preserve the original upper casing of certain words, you can specify the words to preserve. The preserve rule is a multi-valued property. For simplicity, you can enter a list by separating words with a space, rather than quoting each term and separating them with a comma. For multi-word identifiers, use single quotes. For example: s:preserve "ABC 'Laundry List' TriG" . The preserve rule is case-insensitive, so you do not have to match the casing of the words to preserve. |
split | string | This rule specifies the string that should be used to split source identifiers into individual terms. If neither split nor words is specified, input identifiers are split on casing changes and character class changes. |
allowWhiteSpace | boolean | This rule specifies whether or not white space should be preserved in identifiers after they have been split into individual terms. This rule is set to false by default, meaning white space is not preserved. You can specify s:allowWhiteSpace true to preserve spaces. |
allowPunctuation | boolean | This rule specifies whether or not punctuation should be preserved in identifiers after they have been split into individual terms. This rule is set to false by default, meaning punctuation is not preserved. You can specify s:allowPunctuation true to preserve punctuation. |
allowSymbols | boolean | This rule specifies whether or not symbols should be preserved in identifiers after they have been split into individual terms. This rule is set to false by default, meaning symbols are not preserved. You can specify s:allowSymbols true to preserve symbols. |
singularize | boolean | This rule specifies whether or not to change any plural identifiers to singular. This rule is set to false by default, meaning plural identifiers are preserved. You can specify s:singularize true to change plural terms to the singular version of the term. |
casing | object | This rule specifies how the generated labels should be cased. By default, the Generator outputs labels in upper camel case (s:casing s:UpperCamel ). To use a different casing, specify any of the following properties:
|
separator | string | This rule specifies the character or characters to use to separate terms in the generated label. The default separator is a space (s:separator " " ). |
localNamePrefix | string | This rule specifies a string to use as the prefix for local names when generating a URI. |
localNameSeparator | string | This rule specifies the string to use for separating local names when encoding hierarchies according to the specified disambiguationLevel. By default, localNameSeparator is a period (s:localNameSeparator "." ). If localNameSeparator is empty, hierarchical context will not be encoded into the local name of any properties or child classes. The result is an ontology where only the class or property name is used to determine the local name. For example, a property URI would look like ont:employeeID rather than ont:Employee.employeeID . This could lead to "conflicts" in the generated ontology, but those "conflicts" may be desired because properties with the same name are then reused across the generated ontology. |
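To make the table more concrete, the fragment below combines several of the rules using the example values given in the rows above. It is only a sketch: the particular combination is chosen for illustration, and it assumes that rules left out of the block keep their default values.

```sparql
s:normalize [
  s:all [
    s:removeStart "temp_" ;    # remove "temp_" from the beginning of identifiers
    s:removeEnd "NEW" ;        # remove "NEW" from the end of identifiers
    s:removePrefix false ;     # do not remove a prefix that a property shares with its class
    s:ignore "sample example 'test column' old" ;   # four identifiers to skip; one is multi-word
    s:preserve "ABC 'Laundry List' TriG" ;          # keep the original casing of these words
    s:singularize true ;       # change plural identifiers to singular
  ]
] ;
```

Because s:all is used, the same rules would apply to both classes and properties; replacing it with s:model or s:field would narrow their scope.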
You can specify normalization rules at both the source and global level in the same query. If you include multi-valued rules (such as ignore, words, or preserve) at both levels, the Generator combines all values from both instances of the rule. If you specify single-value rules at both levels and the values conflict, the Generator applies the value at the source level.
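As a rough illustration of how the two levels interact, the fragment below sets rules in two places. The ?data and ?rdf variables, and the way each s:normalize block is attached to them, are placeholders assumed for this sketch rather than confirmed query syntax. Only the rule names and the merge behavior come from the description above.

```sparql
# Source-level rules (assumed here to be attached to a data source object)
?data s:normalize [
  s:all [
    s:ignore "rowguid" ;
    s:singularize true ;
  ]
] .

# Global rules (assumed here to be attached to the RDF Generator object)
?rdf s:normalize [
  s:all [
    s:ignore "ModifiedDate" ;
    s:singularize false ;
  ]
] .
```

Under the behavior described above, both rowguid and ModifiedDate would be ignored for that source because ignore is multi-valued and its values are combined. The singularize rule would be true for that source because the source-level value is applied when single-value rules conflict.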
Normalize Examples
The example below uses the normalize property to normalize data at both the model and field level.
s:normalize [ s:model [ s:localNamePrefix "C_" ; s:localNameSeparator "_" ; s:match [ s:pattern "(.+)Enlarged" ; s:replace "$1" ] ; ] ; s:field [ s:localNamePrefix "P_" ; s:localNameSeparator "_" ; s:ignore "rowguid ModifiedDate" ; s:match ( [ s:pattern "(.+)GUID$" ; s:replace "$1" ] [ s:pattern "(.+)ID$" ; s:replace "$1" ] ) ; ] ; ] ;