Utilities Library

The utilities library contains several miscellaneous functions. This topic describes each of the functions.

  • LCASE: Converts the letters in a string literal to lower case based on the given locale.
  • UCASE: Converts the letters in a string literal to upper case based on the given locale.
  • bitap_fuzzy: Performs fuzzy string matching using the Bitap algorithm.
  • cpp::fuzzy_match: Compares the given string to the specified pattern and returns a score.
  • cpp::levenshtein_dist: Calculates the Levenshtein distance between two strings.
  • damerauLevenshteinDistance: Calculates the Damerau-Levenshtein distance between two strings.
  • maskFirstNChars: Masks the first N characters with asterisks (*).
  • maskLastNChars: Masks the last N characters with asterisks (*).
  • regex: Creates a JSON string with all of the matches for the specified regular expression.

The URI for the utilities is <http://cambridgesemantics.com/anzograph/utilities#>. For readability, the syntax for each function below includes the prefix util:, defined as PREFIX util: <http://cambridgesemantics.com/anzograph/utilities#>.

LCASE

This function converts the letters in a string literal to lower case according to the rules of the specified locale.

Syntax

util:LCASE(text, locale)
Argument | Type   | Description
---------+--------+----------------------------------------------
text     | string | The string literal to convert to lower case.
locale   | string | The locale to use for the conversion.

Returns

Type   | Description
-------+-----------------------------------------
string | The string with lower case characters.

UCASE

This function converts all letters in a string to upper case according to the rules of the specified locale.

Syntax

util:UCASE(text, locale)
Argument | Type   | Description
---------+--------+--------------------------------------------
text     | string | The string value to convert to upper case.
locale   | string | The locale to use for the conversion.

Returns

Type   | Description
-------+-----------------------------------------
string | The string with upper case characters.
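The locale argument matters because some languages tailor the default Unicode case mappings. The best-known example is Turkish and Azerbaijani, where dotted and dotless I are separate letters. The Python sketch below is a toy illustration of that one tailoring, not AnzoGraph's implementation; the function names lcase/ucase and the accepted locale formats ("tr", "tr_TR", "tr-TR") are assumptions. A real implementation would rely on a full locale library such as ICU.

```python
# Toy illustration of locale-tailored case mapping (not AnzoGraph's code).
# Only the Turkish/Azerbaijani dotted/dotless I rule is handled here.

def _language(locale: str) -> str:
    # Assumed locale formats: "tr", "tr_TR", or "tr-TR".
    return locale.replace("-", "_").split("_")[0].lower()

def lcase(text: str, locale: str) -> str:
    if _language(locale) in ("tr", "az"):
        # I -> dotless i (U+0131); dotted capital I (U+0130) -> plain i
        text = text.replace("I", "\u0131").replace("\u0130", "i")
    return text.lower()

def ucase(text: str, locale: str) -> str:
    if _language(locale) in ("tr", "az"):
        # i -> dotted capital I (U+0130); dotless i already upper-cases to I
        text = text.replace("i", "\u0130")
    return text.upper()

# lcase("ISTANBUL", "en_US") returns "istanbul"
# lcase("ISTANBUL", "tr_TR") returns "ıstanbul"
# ucase("istanbul", "tr_TR") returns "İSTANBUL"
```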

bitap_fuzzy

This function performs fuzzy string matching using the Bitap algorithm. The function evaluates whether the specified text contains a substring that is approximately equal to the given pattern, where approximate equality is measured by Hamming distance (the number of positions at which the characters differ).

Syntax

util:bitap_fuzzy(pattern, text, k) 
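Argument | Type   | Description
---------+--------+-----------------------------------------------------------------------------
pattern  | string | The pattern to match the text against.
text     | string | The string to match the pattern against.
k        | int    | The maximum number of mismatched characters allowed (the maximum Hamming distance).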

Returns

Type | Description
-----+-------------------------------------------------------------------------------
int  | The zero-based index in text at which the first match starts, or -1 if there is no match.
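Because only substitutions count toward the Hamming distance, a match always spans exactly len(pattern) characters of the text. The Python sketch below illustrates the substitution-only Bitap (shift-and) technique under that reading; it is not AnzoGraph's code, and the function name is illustrative. It keeps one bit vector per allowed error count, where bit j of R[d] means the first j+1 pattern characters match the text ending at the current position with at most d mismatches.

```python
def bitap_fuzzy(pattern: str, text: str, k: int) -> int:
    """Return the zero-based start of the first window of `text` within
    Hamming distance `k` of `pattern`, or -1 if there is none."""
    m = len(pattern)
    if m == 0:
        return 0
    # masks[c] has bit j set when pattern[j] == c
    masks = {}
    for j, ch in enumerate(pattern):
        masks[ch] = masks.get(ch, 0) | (1 << j)
    full = (1 << m) - 1      # keeps the bit vectors m bits wide
    goal = 1 << (m - 1)      # bit for "whole pattern matched"
    R = [0] * (k + 1)
    for i, ch in enumerate(text):
        mask = masks.get(ch, 0)
        prev = R[0]          # R[d-1] from the previous text position
        # Exact matches: extend a prefix only when the characters agree.
        R[0] = ((R[0] << 1) | 1) & mask
        for d in range(1, k + 1):
            old = R[d]
            # Either the characters agree (stay at d errors), or spend
            # one substitution on top of a d-1 error prefix.
            R[d] = ((((old << 1) | 1) & mask) | ((prev << 1) | 1)) & full
            prev = old
        if R[k] & goal:
            return i - m + 1
    return -1
```

Python integers are unbounded, so this sketch has no pattern-length limit; word-sized C implementations typically cap the pattern at the machine word size.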

cpp::fuzzy_match

This function, modeled after the fuzzy matching in Sublime Text, compares the given string to the specified pattern and returns a score.

Syntax

util:cpp::fuzzy_match(pattern, string) 
Argument | Type   | Description
---------+--------+------------------------------------------
pattern  | string | The pattern to match the string against.
string   | string | The string to match the pattern against.

Returns

Type | Description
-----+---------------------------------------------------------------------------
int  | The match score for a matching string, or -9999 if the string does not match the pattern.
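The scoring rules of util:cpp::fuzzy_match are not described here, so the Python sketch below only illustrates the general idea behind Sublime Text-style matching: every pattern character must appear in the string in order, and the score rewards consecutive and start-of-word matches while penalizing skipped characters. The case-insensitive comparison and all bonus and penalty weights are assumptions chosen for illustration; only the -9999 no-match value comes from the table above.

```python
NO_MATCH = -9999  # no-match value documented for util:cpp::fuzzy_match

def fuzzy_match(pattern: str, string: str) -> int:
    """Toy Sublime-style scorer (illustrative weights, case-insensitive)."""
    p, s = pattern.lower(), string.lower()
    score = 0
    prev = -2   # index of the previously matched character in s
    pos = 0     # where the search for the next pattern character starts
    for ch in p:
        idx = s.find(ch, pos)
        if idx == -1:
            return NO_MATCH        # pattern is not a subsequence of string
        if idx == prev + 1:
            score += 15            # consecutive-match bonus (assumed weight)
        if idx == 0 or not s[idx - 1].isalnum():
            score += 10            # start-of-word bonus (assumed weight)
        score -= idx - pos         # one point per skipped character (assumed)
        prev, pos = idx, idx + 1
    return score
```

With these weights, "den" scores 40 against "Denver" while the scattered "dnr" scores 7, and "xyz" returns -9999.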

Example

The following example queries the Tickit data set to count the city names that are a fuzzy match for the values in the VALUES clause.

PREFIX util: <http://cambridgesemantics.com/anzograph/utilities#>
PREFIX tickit: <http://anzograph.com/tickit/>
SELECT (count(*) as ?totalMatches)
FROM <http://anzograph.com/tickit>
WHERE {
  ?venueid tickit:venuecity ?city .
  VALUES (?to_match) {
    ("Denver") ("Seattle") ("East") ("Toronto")
  }
  BIND(util:cpp::fuzzy_match(?city, ?to_match) as ?matched)
  FILTER(?matched > -9999)
}
totalMatches
--------------
10
1 rows

cpp::levenshtein_dist

This function calculates the Levenshtein distance, a measure of how different two strings are. The distance is the smallest number of single-character insertions, deletions, and substitutions required to transform the first string into the second string.

Syntax

util:cpp::levenshtein_dist(string1, string2) 
Argument | Type   | Description
---------+--------+----------------------------------------------------
string1  | string | The string that would be transformed into string2.
string2  | string | The string to measure string1 against.

Returns

Type | Description
-----+-----------------------------------------------
int  | The Levenshtein distance between the strings.
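The distance can be computed with the classic Wagner-Fischer dynamic program, in which each cell holds the distance between a prefix of string1 and a prefix of string2. The Python sketch below keeps only two rows of that table; it is a reference illustration, not AnzoGraph's implementation. It gives 5 for "Dayton" and "Denver" and 4 for "Dayton" and "East", which is consistent with the two Dayton rows in the example output that follows.

```python
def levenshtein_dist(s1: str, s2: str) -> int:
    """Minimum number of single-character insertions, deletions, and
    substitutions that turn s1 into s2 (case-sensitive)."""
    # prev[j] = distance between the current prefix of s1 and s2[:j]
    prev = list(range(len(s2) + 1))
    for i, a in enumerate(s1, start=1):
        cur = [i]  # transforming s1[:i] into "" takes i deletions
        for j, b in enumerate(s2, start=1):
            cur.append(min(
                prev[j] + 1,               # delete a
                cur[j - 1] + 1,            # insert b
                prev[j - 1] + (a != b),    # substitute (free if equal)
            ))
        prev = cur
    return prev[-1]
```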

Example

The following example queries the Tickit data set to find cities whose names have a Levenshtein distance greater than 0 (excluding exact matches) and less than or equal to 5 when compared with the values "Denver," "Seattle," or "East."

PREFIX util: <http://cambridgesemantics.com/anzograph/utilities#>
PREFIX tickit: <http://anzograph.com/tickit/>
SELECT DISTINCT ?city ?dist
FROM <http://anzograph.com/tickit>
WHERE {
  ?venueid tickit:venuecity ?city .
  VALUES (?to_match) {
    ("Denver") ("Seattle") ("East")
  }
  BIND(util:cpp::levenshtein_dist(?city, ?to_match) as ?dist)
  FILTER(?dist != 0 && ?dist <= 5)
}
ORDER BY ?city
city      | dist
----------+------
Atlanta   |    5
Boston    |    4
Carson    |    4
Dallas    |    5
Dayton    |    4
Dayton    |    5
Detroit   |    5
Frisco    |    5
Glendale  |    5
Hershey   |    5
Houston   |    5
Landover  |    4
Miami     |    4
Newark    |    5
Ottawa    |    5
Saratoga  |    5
Seattle   |    5
Sunrise   |    5
Tampa     |    4
Vancouver |    5
20 rows

damerauLevenshteinDistance

This function calculates the Damerau-Levenshtein distance, a measure of how different two strings are. The distance is the smallest number of single-character insertions, deletions, and substitutions, plus transpositions of two adjacent characters, required to transform the first string into the second string.

Syntax

util:damerauLevenshteinDistance(string1, string2) 
Argument | Type   | Description
---------+--------+----------------------------------------------------
string1  | string | The string that would be transformed into string2.
string2  | string | The string to measure string1 against.

Returns

Type | Description
-----+------------------------------------------------------
int  | The Damerau-Levenshtein distance between the strings.
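Two variants go by this name: the restricted one, also called optimal string alignment (OSA), and the unrestricted one. Which variant util:damerauLevenshteinDistance computes is not stated here. The Python sketch below implements OSA, which adds one adjacent-transposition case to the Levenshtein recurrence; treat it as an illustration, not as AnzoGraph's implementation.

```python
def osa_distance(s1: str, s2: str) -> int:
    """Restricted Damerau-Levenshtein (optimal string alignment) distance."""
    n, m = len(s1), len(s2)
    # d[i][j] = distance between s1[:i] and s2[:j]
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if s1[i - 1] == s2[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
            # Transposition of two adjacent characters, e.g. "ab" -> "ba"
            if (i > 1 and j > 1 and s1[i - 1] == s2[j - 2]
                    and s1[i - 2] == s2[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[n][m]
```

A single swap such as "ab" to "ba" costs 1 here, versus 2 under plain Levenshtein. The two variants differ on inputs such as "CA" and "ABC": OSA gives 3 because it never edits a transposed pair again, while the unrestricted distance is 2 (CA to AC to ABC).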

maskFirstNChars

This function replaces the first N characters of a string with asterisks (*).

Syntax

util:maskFirstNChars(string, number_of_chars) 
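Argument        | Type   | Description
----------------+--------+--------------------------------------------------------------------
string          | string | The string to mask.
number_of_chars | int    | The number of characters to mask from the beginning of the string.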

Returns

Type   | Description
-------+-----------------------------------------
string | The string with the masked characters.

maskLastNChars

This function replaces the last N characters of a string with asterisks (*).

Syntax

util:maskLastNChars(string, number_of_chars) 
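Argument        | Type   | Description
----------------+--------+--------------------------------------------------------------
string          | string | The string to mask.
number_of_chars | int    | The number of characters to mask from the end of the string.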

Returns

Type   | Description
-------+-----------------------------------------
string | The string with the masked characters.
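The Python sketch below shows the intended result of both masking functions. How util:maskFirstNChars and util:maskLastNChars handle a negative N or an N larger than the string length is not documented here; the sketch assumes N is clamped to the range 0 to the string length, and the function names are hypothetical.

```python
def _clamp(n: int, length: int) -> int:
    # Assumption: out-of-range counts are clamped rather than rejected.
    return max(0, min(n, length))

def mask_first_n_chars(s: str, n: int) -> str:
    n = _clamp(n, len(s))
    return "*" * n + s[n:]

def mask_last_n_chars(s: str, n: int) -> str:
    n = _clamp(n, len(s))
    # Slice with len(s) - n, because s[:-0] would return an empty string.
    return s[:len(s) - n] + "*" * n

# mask_first_n_chars("4111222233334444", 12) returns "************4444"
# mask_last_n_chars("555-123-4567", 4) returns "555-123-****"
```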

regex

This function creates a JSON string that includes all of the matches for the specified regular expression.

Syntax

util:regex(string, expression) 
Argument   | Type   | Description
-----------+--------+-----------------------------------------------------
string     | string | The string to match against the regular expression.
expression | string | The regular expression, written in ECMAScript syntax.

Returns

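Type        | Description
------------+------------------------------------------------------------------------------------------
JSON string | A JSON string that contains all of the regular expression matches. Index "0" holds the whole target string.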
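The exact JSON layout that util:regex produces is not specified here. As a hedged illustration of group numbering only, the Python sketch below returns one JSON object per match, keyed by group index. Key "0" holds the text matched by the whole expression, as in ECMAScript and C++ std::regex, and "1", "2", and so on hold the capture groups. The layout and the function name are assumptions, and Python's re module differs from ECMAScript grammar in some details (for example, Python spells named groups (?P<name>...)).

```python
import json
import re

def regex_matches_json(string: str, expression: str) -> str:
    """Hypothetical layout: a JSON array with one object per match,
    keyed by group index ("0" = whole match, "1".. = capture groups)."""
    matches = []
    for m in re.finditer(expression, string):
        groups = {"0": m.group(0)}
        for i, g in enumerate(m.groups(), start=1):
            groups[str(i)] = g  # unmatched optional groups become null
        matches.append(groups)
    return json.dumps(matches)
```

For example, matching "a1 b22" against ([a-z])(\d+) yields two objects, the first with "0" set to "a1", "1" set to "a", and "2" set to "1".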