Utilities Library
The utilities library contains several miscellaneous functions. This topic describes each of the functions.
- LCASE: Converts the letters in a string literal to lower case based on the given locale.
- UCASE: Converts the letters in a string literal to upper case based on the given locale.
- bitap_fuzzy: Performs fuzzy string matching using the Bitap algorithm.
- cpp::fuzzy_match: Compares the given string to the specified pattern and returns a score.
- cpp::levenshtein_dist: Calculates the Levenshtein distance between two strings.
- damerauLevenshteinDistance: Calculates the Damerau-Levenshtein distance between two strings.
- maskFirstNChars: Masks the first N characters of a string with asterisks (*).
- maskLastNChars: Masks the last N characters of a string with asterisks (*).
- regex: Creates a JSON string with all of the matches for the specified regular expression.
The URI for the utilities is <http://cambridgesemantics.com/anzograph/utilities#>. For readability, the syntax for each function below includes the prefix util:, defined as PREFIX util: <http://cambridgesemantics.com/anzograph/utilities#>.
LCASE
This function converts the letters in a string literal to lower case according to the rules of the specified locale.
Syntax
util:LCASE(text, locale)
text | string | The string literal to convert to lower case.
locale | string | The locale to use for the conversion.
Returns
string | The string with lower case characters.
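Example
The following query is a minimal sketch of a call to util:LCASE on an inline literal, so it does not depend on any data set. The locale identifier "en_US" is an assumption made for illustration. This topic does not state which locale formats are accepted, so confirm the identifier against your deployment.

```sparql
PREFIX util: <http://cambridgesemantics.com/anzograph/utilities#>

SELECT ?lower
WHERE {
  # "en_US" is an assumed locale identifier, not one confirmed by this topic.
  BIND(util:LCASE("Madison Square GARDEN", "en_US") AS ?lower)
}
```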
UCASE
This function converts all letters in a string to upper case according to the rules of the specified locale.
Syntax
util:UCASE(text, locale)
text | string | The string value to convert to upper case.
locale | string | The locale to use for the conversion.
Returns
string | The string with upper case characters.
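Example
The following query is a minimal sketch that upper-cases a few inline strings. It uses the function name given in this section's heading. The sample strings are hypothetical, and the locale identifier "en_US" is an assumption, since this topic does not list the accepted locale formats.

```sparql
PREFIX util: <http://cambridgesemantics.com/anzograph/utilities#>

SELECT ?city ?upper
WHERE {
  # Hypothetical sample values used only for illustration.
  VALUES (?city) {
    ("Denver") ("Seattle") ("Toronto")
  }
  # "en_US" is an assumed locale identifier, not one confirmed by this topic.
  BIND(util:UCASE(?city, "en_US") AS ?upper)
}
```

The locale argument matters because case rules differ by language. In Turkish, for example, the upper case form of "i" is "İ" rather than "I".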
bitap_fuzzy
This function performs fuzzy string matching using the Bitap algorithm. The function evaluates whether the specified text contains a string that is approximately equal to the given pattern, where approximate equality is determined in terms of Hamming distance.
Syntax
util:bitap_fuzzy(pattern, text, k)
pattern | string | The pattern to match the text against.
text | string | The string to match the pattern against.
k | int | The maximum number of errors allowed, expressed as a Hamming distance.
Returns
int | The starting index of the first match in the text. A value of 0 means the match begins at the start of the text, and -1 means no match was found.
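Example
The following query is a minimal sketch of how util:bitap_fuzzy might be used to keep only the strings that approximately contain a pattern. The sample strings and the misspelled pattern "Denvar" are hypothetical. The FILTER relies on the documented return value of -1 for no match.

```sparql
PREFIX util: <http://cambridgesemantics.com/anzograph/utilities#>

SELECT ?text ?position
WHERE {
  # Hypothetical sample strings used only for illustration.
  VALUES (?text) {
    ("Flights to Denver") ("Denver Broncos") ("Seattle Seahawks")
  }
  # Allow at most one mismatched character (k = 1).
  BIND(util:bitap_fuzzy("Denvar", ?text, 1) AS ?position)
  # -1 signals that no approximate match was found.
  FILTER(?position != -1)
}
```

Because Hamming distance counts only character substitutions, "Denvar" is within distance 1 of "Denver", so the first two sample strings would be expected to match.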
cpp::fuzzy_match
This function, modeled after Sublime Text's fuzzy matching, compares the given string to the specified pattern and returns a score.
Syntax
util:cpp::fuzzy_match(pattern, string)
pattern | string | The pattern to match the string against.
string | string | The string to match the pattern against.
Returns
int | The match score. A meaningful score is returned only for matching strings. If there is no match, the score is -9999.
Example
The following example queries the Tickit data set to find the number of city names that are a fuzzy match for any of the strings listed in the VALUES clause.
PREFIX util: <http://cambridgesemantics.com/anzograph/utilities#>
PREFIX tickit: <http://anzograph.com/tickit/>
SELECT (count(*) as ?totalMatches)
FROM <http://anzograph.com/tickit>
WHERE {
  ?venueid tickit:venuecity ?city .
  VALUES (?to_match) {
    ("Denver") ("Seattle") ("East") ("Toronto")
  }
  BIND(util:cpp::fuzzy_match(?city, ?to_match) as ?matched)
  FILTER(?matched > -9999)
}
totalMatches
--------------
10
1 rows
cpp::levenshtein_dist
This function calculates the Levenshtein distance, a measure of how different two strings are. The distance is the smallest number of insertions, deletions, and substitutions required to transform the first string into the second string.
Syntax
util:cpp::levenshtein_dist(string1, string2)
string1 | string | The string that would be transformed into string2.
string2 | string | The string to measure string1 against.
Returns
int | The Levenshtein distance between the strings.
Example
The following example queries the Tickit data set to find cities whose names have a Levenshtein distance that is not equal to 0 and is less than or equal to 5 when compared with the values "Denver", "Seattle", or "East".
PREFIX util: <http://cambridgesemantics.com/anzograph/utilities#>
PREFIX tickit: <http://anzograph.com/tickit/>
SELECT DISTINCT ?city ?dist
FROM <http://anzograph.com/tickit>
WHERE {
  ?venueid tickit:venuecity ?city .
  VALUES (?to_match) {
    ("Denver") ("Seattle") ("East")
  }
  BIND(util:cpp::levenshtein_dist(?city, ?to_match) as ?dist)
  FILTER(?dist != 0 && ?dist <= 5)
}
ORDER BY ?city
city | dist
----------+------
Atlanta | 5
Boston | 4
Carson | 4
Dallas | 5
Dayton | 4
Dayton | 5
Detroit | 5
Frisco | 5
Glendale | 5
Hershey | 5
Houston | 5
Landover | 4
Miami | 4
Newark | 5
Ottawa | 5
Saratoga | 5
Seattle | 5
Sunrise | 5
Tampa | 4
Vancouver | 5
20 rows
damerauLevenshteinDistance
This function calculates the Damerau-Levenshtein distance, a measure of how different two strings are. The distance is the smallest number of insertions, deletions, substitutions, and transpositions of adjacent characters required to transform the first string into the second string.
Syntax
util:damerauLevenshteinDistance(string1, string2)
string1 | string | The string that would be transformed into string2.
string2 | string | The string to measure string1 against.
Returns
int | The Damerau-Levenshtein distance between the strings.
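Example
The following query is a minimal sketch that contrasts the two distance functions described in this topic. The misspelled strings are hypothetical inputs in which two adjacent characters are swapped.

```sparql
PREFIX util: <http://cambridgesemantics.com/anzograph/utilities#>

SELECT ?string1 ?string2 ?lev ?damerau
WHERE {
  # Hypothetical pairs; each misspelling swaps two adjacent characters.
  VALUES (?string1 ?string2) {
    ("Seattle" "Saettle")
    ("Denver" "Denevr")
  }
  # Plain Levenshtein distance has no transposition operation.
  BIND(util:cpp::levenshtein_dist(?string1, ?string2) AS ?lev)
  # Damerau-Levenshtein distance counts an adjacent swap as one edit.
  BIND(util:damerauLevenshteinDistance(?string1, ?string2) AS ?damerau)
}
```

By the definitions in this topic, each pair differs by a single adjacent transposition, so the Damerau-Levenshtein distance would be expected to be 1 where the Levenshtein distance is 2.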
maskFirstNChars
This function masks the first N characters of a string with asterisks (*).
Syntax
util:maskFirstNChars(string, number_of_chars)
string | string | The string to mask.
number_of_chars | int | The number of characters to mask from the beginning of the string.
Returns
string | The string with the masked characters.
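Example
The following query is a minimal sketch that masks the leading characters of an inline value. The 16-digit number is a hypothetical value chosen for illustration.

```sparql
PREFIX util: <http://cambridgesemantics.com/anzograph/utilities#>

SELECT ?masked
WHERE {
  # Hypothetical account number; mask the first 12 of its 16 characters.
  BIND(util:maskFirstNChars("4111222233334444", 12) AS ?masked)
}
```

Going by the description above, the first 12 characters would be replaced with asterisks, leaving only the last four digits readable.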
maskLastNChars
This function masks the last N characters of a string with asterisks (*).
Syntax
util:maskLastNChars(string, number_of_chars)
string | string | The string to mask.
number_of_chars | int | The number of characters to mask from the end of the string.
Returns
string | The string with the masked characters.
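Example
The following query is a minimal sketch that masks the trailing characters of an inline value. The phone number is a hypothetical value chosen for illustration.

```sparql
PREFIX util: <http://cambridgesemantics.com/anzograph/utilities#>

SELECT ?masked
WHERE {
  # Hypothetical phone number; mask its last 4 characters.
  BIND(util:maskLastNChars("555-867-5309", 4) AS ?masked)
}
```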
regex
This function creates a JSON string that includes all of the matches for the specified regular expression.
Syntax
util:regex(string, expression)
string | string | The string to match against the regular expression.
expression | string | The regular expression in ECMAScript grammar.
Returns
JSON string | A JSON string that contains all of the regular expression matches. Index "0" is the whole targeted string.
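Example
The following query is a minimal sketch that applies an ECMAScript pattern with three capture groups to an inline string. The input string and the pattern are hypothetical. The exact layout of the returned JSON is not shown here because this topic does not specify it beyond the description above.

```sparql
PREFIX util: <http://cambridgesemantics.com/anzograph/utilities#>

SELECT ?matches
WHERE {
  # Backslashes are doubled because SPARQL string literals treat \ as an escape character.
  BIND(util:regex("Event date: 2008-01-25", "(\\d{4})-(\\d{2})-(\\d{2})") AS ?matches)
}
```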