Utilities Library

The utilities library contains several miscellaneous functions. This topic describes each of the functions.

LCASE: Converts the letters in a string literal to lower case based on the given locale.
UCASE: Converts the letters in a string literal to upper case based on the given locale.
bitap_fuzzy: Performs fuzzy string matching using the Bitap algorithm.
cpp::fuzzy_match: Compares the given string to the specified pattern and returns a score.
cpp::levenshtein_dist: Calculates the Levenshtein distance between two strings.
damerauLevenshteinDistance: Calculates the Damerau-Levenshtein distance between two strings.
maskFirstNChars: Masks the first N characters with asterisks (*).
maskLastNChars: Masks the last N characters with asterisks (*).
regex: Creates a JSON string with all of the matches for the specified regular expression.

The URI for the utilities is <http://cambridgesemantics.com/anzograph/utilities#>. For readability, the syntax for each function below includes the prefix util:, defined as PREFIX util: <http://cambridgesemantics.com/anzograph/utilities#>.

LCASE

This function converts the letters in a string literal to lower case according to the rules of the specified locale.

Syntax

util:LCASE(text, locale)

Argument	Type	Description
text	string	The string literal to convert to lower case.
locale	string	The locale to use for the conversion.

Returns

Type	Description
string	The string with lower case characters.
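
Example

The following sketch calls the function on a literal value, so it does not depend on any data set. The locale identifier "en_US" is an assumption for illustration; this topic does not list the locale names that the function accepts.

```sparql
PREFIX util: <http://cambridgesemantics.com/anzograph/utilities#>
SELECT ?lower
WHERE {
  # "en_US" is an assumed locale identifier, not one confirmed by this topic.
  BIND(util:LCASE("ANZOGRAPH Utilities", "en_US") AS ?lower)
}
```

The locale argument matters for languages whose casing rules differ from English, so the result for the same input text can vary with the locale that is supplied.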

UCASE

This function converts all letters in a string to upper case according to the rules of the specified locale.

Syntax

util:UCASE(text, locale)

Argument	Type	Description
text	string	The string value to convert to upper case.
locale	string	The locale to use for the conversion.

Returns

Type	Description
string	The string with upper case characters.
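
Example

The following sketch calls the function on a literal value, so it does not depend on any data set. It assumes that the function is invoked as util:UCASE, matching the function name in the heading of this section, and that "en_US" is an accepted locale identifier; neither detail is confirmed elsewhere in this topic.

```sparql
PREFIX util: <http://cambridgesemantics.com/anzograph/utilities#>
SELECT ?upper
WHERE {
  # Assumptions: the util:UCASE name and the "en_US" locale identifier.
  BIND(util:UCASE("anzograph utilities", "en_US") AS ?upper)
}
```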

bitap_fuzzy

This function performs fuzzy string matching using the Bitap algorithm. The function evaluates whether the specified text contains a string that is approximately equal to the given pattern, where approximate equality is determined in terms of Hamming distance.

Syntax

util:bitap_fuzzy(pattern, text, k)

Argument	Type	Description
pattern	string	The pattern to match the `text` against.
text	string	The string to match the `pattern` against.
k	int	The number of errors that are allowed (the Hamming distance of k).

Returns

Type	Description
int	The starting index of the first match in the text. `0` means that the match starts at the beginning of the text, and `-1` means that there is no match.
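
Example

The following sketch reuses the Tickit graph and the tickit:venuecity property from the other examples in this topic to look for city names that approximately contain a misspelled pattern. The pattern "Denvar" and the error limit of 1 are illustrative choices, and no result set is shown because the output depends on the loaded data.

```sparql
PREFIX util: <http://cambridgesemantics.com/anzograph/utilities#>
PREFIX tickit: <http://anzograph.com/tickit/>
SELECT DISTINCT ?city ?index
FROM <http://anzograph.com/tickit>
WHERE {
  ?venueid tickit:venuecity ?city .
  # "Denvar" is a deliberately misspelled pattern; k = 1 allows one error.
  BIND(util:bitap_fuzzy("Denvar", ?city, 1) AS ?index)
  # Keep only the cities where a match was found (-1 means no match).
  FILTER(?index != -1)
}
ORDER BY ?city
```

The filter compares against `-1` rather than testing for a positive value because `0` is a valid result: it indicates a match that starts at the beginning of the text.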

cpp::fuzzy_match

This function compares the given string to the specified pattern and returns a score. The matching is modeled after Sublime Text's fuzzy matching.

Syntax

util:cpp::fuzzy_match(pattern, string)

Argument	Type	Description
pattern	string	The pattern to match the `string` against.
string	string	The string to match the `pattern` against.

Returns

Type	Description
int	The match score. If the string does not match the pattern, the function returns `-9999`.

Example

The following example queries the Tickit data set to find the number of city names that are a fuzzy match to the specified VALUES.

PREFIX util: <http://cambridgesemantics.com/anzograph/utilities#>
PREFIX tickit: <http://anzograph.com/tickit/>
SELECT (count(*) as ?totalMatches) 
FROM <http://anzograph.com/tickit>
WHERE {
  ?venueid tickit:venuecity ?city .
  VALUES (?to_match) { 
    ("Denver") ("Seattle") ("East") ("Toronto")
  }
  BIND(util:cpp::fuzzy_match(?city, ?to_match) as ?matched)
  FILTER(?matched > -9999)
}

totalMatches
--------------
10
1 rows

cpp::levenshtein_dist

This function calculates the Levenshtein distance between two strings, which is a measure of their similarity. The distance is the smallest number of insertions, deletions, and substitutions required to transform the first string into the second string.

Syntax

util:cpp::levenshtein_dist(string1, string2)

Argument	Type	Description
string1	string	The string that would be transformed into `string2`.
string2	string	The string to measure `string1` against.

Returns

Type	Description
int	The Levenshtein distance between the strings.

Example

The following example queries the Tickit data set to find cities whose names have a Levenshtein distance that is not equal to 0 and is less than or equal to 5 when compared with the values "Denver," "Seattle," or "East."

PREFIX util: <http://cambridgesemantics.com/anzograph/utilities#>
PREFIX tickit: <http://anzograph.com/tickit/>
SELECT DISTINCT ?city ?dist
FROM <http://anzograph.com/tickit>
WHERE {
  ?venueid tickit:venuecity ?city .
  VALUES (?to_match) {
    ("Denver") ("Seattle") ("East")
  }
  BIND(util:cpp::levenshtein_dist(?city, ?to_match) as ?dist)
  FILTER(?dist != 0 && ?dist <= 5)
}
ORDER BY ?city

city      | dist
----------+------
Atlanta   |    5
Boston    |    4
Carson    |    4
Dallas    |    5
Dayton    |    4
Dayton    |    5
Detroit   |    5
Frisco    |    5
Glendale  |    5
Hershey   |    5
Houston   |    5
Landover  |    4
Miami     |    4
Newark    |    5
Ottawa    |    5
Saratoga  |    5
Seattle   |    5
Sunrise   |    5
Tampa     |    4
Vancouver |    5
20 rows

damerauLevenshteinDistance

This function calculates the Damerau-Levenshtein distance between two strings, which is a measure of their similarity. The distance is the smallest number of insertions, deletions, substitutions, and transpositions of adjacent characters required to transform the first string into the second string.

Syntax

util:damerauLevenshteinDistance(string1, string2)

Argument	Type	Description
string1	string	The string that would be transformed into `string2`.
string2	string	The string to measure `string1` against.

Returns

Type	Description
int	The Damerau-Levenshtein distance between the strings.
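
Example

The following sketch contrasts this function with cpp::levenshtein_dist on a pair of literal strings, so it does not depend on any data set. The strings are illustrative: "Dnever" is "Denver" with the adjacent letters "e" and "n" swapped.

```sparql
PREFIX util: <http://cambridgesemantics.com/anzograph/utilities#>
SELECT ?damerau ?levenshtein
WHERE {
  # The two strings differ by one swap of adjacent characters.
  BIND(util:damerauLevenshteinDistance("Denver", "Dnever") AS ?damerau)
  # The same pair, measured without the transposition operation.
  BIND(util:cpp::levenshtein_dist("Denver", "Dnever") AS ?levenshtein)
}
```

By the definitions in this topic, a swap of two adjacent characters counts as a single edit under Damerau-Levenshtein, while plain Levenshtein needs two substitutions to make the same change. Comparing the two columns is a quick way to check whether a mismatch is a likely typing transposition.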

maskFirstNChars

This function masks the first N characters of a string by replacing each of them with an asterisk (*).

Syntax

util:maskFirstNChars(string, number_of_chars)

Argument	Type	Description
string	string	The string to mask.
number_of_chars	int	The number of characters to mask from the beginning of the string.

Returns

Type	Description
string	The string with the masked characters.
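
Example

The following sketch masks the first six characters of a literal value, so it does not depend on any data set. The sample value is made up for illustration.

```sparql
PREFIX util: <http://cambridgesemantics.com/anzograph/utilities#>
SELECT ?masked
WHERE {
  # Mask the first 6 of the 10 characters in the sample value.
  BIND(util:maskFirstNChars("1234567890", 6) AS ?masked)
}
```

Masking from the beginning is the usual choice when only a trailing portion of an identifier, such as its last four digits, should remain readable.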

maskLastNChars

This function masks the last N characters of a string by replacing each of them with an asterisk (*).

Syntax

util:maskLastNChars(string, number_of_chars)

Argument	Type	Description
string	string	The string to mask.
number_of_chars	int	The number of characters to mask from the end of the string.

Returns

Type	Description
string	The string with the masked characters.
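
Example

The following sketch masks the last four characters of a literal value, so it does not depend on any data set. The sample value is made up for illustration.

```sparql
PREFIX util: <http://cambridgesemantics.com/anzograph/utilities#>
SELECT ?masked
WHERE {
  # Mask the last 4 of the 10 characters in the sample value.
  BIND(util:maskLastNChars("1234567890", 4) AS ?masked)
}
```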

regex

This function creates a JSON string that includes all of the matches for the specified regular expression.

Syntax

util:regex(string, expression)

Argument	Type	Description
string	string	The string to match against the regular expression.
expression	string	The regular expression in ECMAScript grammar.

Returns

Type	Description
JSON string	A JSON string that contains all of the regular expression matches. Index "0" is the whole targeted string.
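
Example

The following sketch applies an expression with three capture groups to a literal date string, so it does not depend on any data set. The input value and the expression are illustrative. The expression uses character classes such as [0-9] rather than \d to avoid backslash escaping inside the SPARQL string literal. The exact layout of the returned JSON is not shown because this topic does not specify it.

```sparql
PREFIX util: <http://cambridgesemantics.com/anzograph/utilities#>
SELECT ?matches
WHERE {
  # Three capture groups: year, month, and day.
  BIND(util:regex("2024-01-15", "([0-9]{4})-([0-9]{2})-([0-9]{2})") AS ?matches)
}
```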