String Functions
This topic describes the Anzo functions that operate on string data types.
Typographical Conventions
This documentation uses the following conventions in function syntax:
CAPS
: Although SPARQL keywords and function names are case-insensitive, they are written in uppercase for readability.
[ argument ]
: Brackets are used to indicate optional arguments. Arguments without brackets are required.
Functions
- BUSINESS_ENTITY_EXCLUDER: Removes suffixes that represent business entities.
- CONCATENATE: Concatenates two or more strings and returns the result as a string.
- CONCATURL: Concatenates two or more strings and returns the result as a URI.
- CONTAINS: Evaluates whether the specified string contains the given pattern.
- ENCODE_FOR_URI: Encodes the specified string as a URI.
- ESCAPEHTML: Escapes the specified string for use in HTML.
- FIND: Returns the position—from left to right—of a string within another string.
- FINDREVERSE: Returns the position—from right to left—of a string within another string.
- GROUP_CONCAT: Concatenates a group of strings into a single string.
- GROUPCONCAT: Concatenates a group of strings into a single string. This function is a customizable version of GROUP_CONCAT.
- LANG: Returns any language tags that are included with strings.
- LANGMATCHES: Evaluates whether a string includes a language tag that matches the specified language range.
- LCASE: Converts the letters in a string literal to lower case.
- LEFT: Returns the specified number of characters starting from the beginning (left side) of the string.
- LEN: Calculates the length (number of characters) in a string.
- LEVENSHTEIN_DIST: Calculates the Levenshtein distance or measure of similarity between two strings.
- LOWER: Converts all letters in a string to lower case.
- MD5: Returns the MD5 checksum of a string as a hexadecimal string.
- MID: Returns the specified number of characters from a string, starting from a given position in the string.
- REGEX: Evaluates whether a string matches the specified regular expression pattern.
- REGEXP_SUBSTR: Searches a string for the specified regular expression pattern and returns the substring that matches the pattern.
- REPLACE: Extends the REGEX function to provide the ability to find a pattern in a string and replace it with another pattern.
- RIGHT: Returns the specified number of characters starting from the end (right side) of the string.
- SEARCH: Uses text search semantics to evaluate whether the specified string matches the given pattern.
- SHA1: Calculates the SHA-1 digest of a string value.
- SHA224: Calculates the SHA-224 digest of a string value.
- SHA256: Calculates the SHA-256 digest of a string value.
- SHA384: Calculates the SHA-384 digest of a string value.
- SHA512: Calculates the SHA-512 digest of a string value.
- STRAFTER: Returns the portion of a string that comes after the specified substring.
- STRBEFORE: Returns the portion of a string that comes before the specified substring.
- STRDT: Constructs a literal value with the specified data type.
- STRENDS: Evaluates whether the specified string ends with the specified substring.
- STRLANG: Constructs a literal value with the specified language tag.
- STRLEN: Calculates the length of a string.
- STRSTARTS: Evaluates whether the specified string starts with the specified substring.
- STRUUID: Returns a string that is the result of generating a Universally Unique Identifier (UUID).
- SUBSTITUTE: Replaces existing text in a string with the specified new text.
- SUBSTR: Returns a substring from a string value.
- TOURI: Casts a string to a URI.
- TRIM: Removes all spaces from a string except for any single spaces between words.
- UCASE: Converts all letters in a string to upper case.
- UPPER: Converts the letters in a string literal to upper case.
BUSINESS_ENTITY_EXCLUDER
This function removes from strings the suffixes that represent business entities.
Syntax
BUSINESS_ENTITY_EXCLUDER(text)
text
|
string |
The string from which you want to remove business entities. |
Returns
string |
The string without the business entity suffix. |
CONCATENATE
This function concatenates two or more strings and returns the result as a string.
Syntax
CONCATENATE(text1, text2 [, textN ])
text1–N
|
string |
The strings that you want to concatenate to form a single string. |
Returns
string |
The concatenated string. |
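As a hedged illustration of the syntax above, the following sketch joins two literals with a space. The variable names and sample values are hypothetical and are supplied inline so that the query needs no stored data; CONCATENATE is called as documented in this topic.

```sparql
SELECT ?full_name
WHERE {
  # Hypothetical sample values, bound inline for illustration only.
  VALUES (?first ?last) { ("Ada" "Lovelace") }
  # Three arguments: text1, text2, and one additional textN.
  BIND(CONCATENATE(?first, " ", ?last) AS ?full_name)
}
```

The same argument pattern applies to CONCATURL, which returns a URI instead of a string.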
CONCATURL
This function concatenates two or more strings and returns the result as a URI.
Syntax
CONCATURL(text1, text2 [, textN ])
text1–N
|
string |
The strings that you want to concatenate to form a URI. |
Returns
URI |
The concatenated string as a URI. |
CONTAINS
This function evaluates whether the specified strings contain the given pattern. Results are grouped under "true" or "false."
Syntax
CONTAINS(text, pattern)
text
|
string |
The string value that you want to check against the specified pattern. |
pattern
|
string |
The string pattern that you want to look for in the supplied text. |
Returns
boolean |
True if the strings contain the pattern and false if they do not. |
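A minimal sketch of CONTAINS used as a filter, assuming that Anzo follows the SPARQL 1.1 semantics, in which the pattern is a plain, case-sensitive substring rather than a regular expression. The company names are hypothetical sample values.

```sparql
SELECT ?name
WHERE {
  # Hypothetical sample values, bound inline for illustration only.
  VALUES ?name { "Acme Holdings" "Globex" "Initech Holdings LLC" }
  # Keep only the rows for which CONTAINS returns true.
  FILTER(CONTAINS(?name, "Holdings"))
}
```

Under those assumptions the query keeps "Acme Holdings" and "Initech Holdings LLC" and drops "Globex".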
ENCODE_FOR_URI
This function encodes the specified string for use in a URI by escaping reserved characters, and returns the encoded string.
Syntax
ENCODE_FOR_URI(text)
text
|
string |
The string value to encode as a URI. |
Returns
string |
The URI-encoded string. |
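The following sketch shows the effect of the encoding on a string that contains a space. It assumes the SPARQL 1.1 behavior, in which characters that are not allowed in a URI are percent-encoded; the input value is only an illustration.

```sparql
SELECT ?encoded
WHERE {
  # Under SPARQL 1.1 semantics the space becomes %20: "Los%20Angeles".
  BIND(ENCODE_FOR_URI("Los Angeles") AS ?encoded)
}
```

Note that the result is still a string. Use a function such as TOURI or CONCATURL if a URI value is needed.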
ESCAPEHTML
This function escapes the specified string for use in HTML.
Syntax
ESCAPEHTML(text)
text
|
string |
The string value to escape for HTML. |
Returns
string |
The string escaped for HTML. |
FIND
This function returns the position—from left to right—of a string within another string.
You can use FINDREVERSE to find the character or substring position from right to left.
Syntax
FIND(find_text, within_text, start_num)
find_text
|
string |
The string to look for in the within_text value. |
within_text
|
string |
The string to search within. |
start_num
|
int |
An integer that indicates the position at which to start looking for find_text. Positions are counted from left to right, starting at the beginning of the within_text value. |
Returns
int |
The character position (from left to right) where the substring starts. |
FINDREVERSE
Similar to FIND, this function returns the position—from right to left—of a string within another string.
Syntax
FINDREVERSE(find_text, within_text, start_num)
find_text
|
string |
The string to look for in the within_text value. |
within_text
|
string |
The string to search within. |
start_num
|
int |
An integer that indicates the position at which to start looking for find_text. Positions are counted from right to left, starting at the end of the within_text value. |
Returns
int |
The character position (from right to left) where the substring starts. |
GROUP_CONCAT
This function concatenates a group of strings into a single string. It is a simplified version of GROUPCONCAT as it takes only one argument.
Syntax
GROUP_CONCAT(text)
text
|
string |
The string property whose values to concatenate into a single string. |
Returns
string |
The concatenated string. |
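A hedged sketch of GROUP_CONCAT as an aggregate, assuming it behaves like the SPARQL 1.1 aggregate of the same name. The department and employee values are hypothetical and are bound inline so that the query is self-contained.

```sparql
SELECT ?dept (GROUP_CONCAT(?name) AS ?names)
WHERE {
  # Hypothetical sample rows: two names in Sales, one in Ops.
  VALUES (?dept ?name) {
    ("Sales" "Ann")
    ("Sales" "Bob")
    ("Ops"   "Cy")
  }
}
# One output row per department, with that department's names concatenated.
GROUP BY ?dept
```

In SPARQL 1.1 the default separator is a single space and the order of the concatenated values is not guaranteed. If you need a custom separator, limits, or a prefix and suffix, GROUPCONCAT is the documented alternative.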
GROUPCONCAT
This function concatenates a group of strings into a single string. Unlike GROUP_CONCAT, this function allows for customization of the separator to use as well as the configuration of limits and options like prefixes and suffixes.
Syntax
GROUPCONCAT(group1, [ group2, ..., groupN, ] group_value_separator, separator, serialize,
row_limit, value_limit, delimit_blanks [, prefix ] [, suffix ] [, max_length ])
group1–N
|
string |
The group(s) of strings to concatenate. |
group_value_separator
|
string |
The separator string to use between the groups of strings if you specify more than one group. |
separator
|
string |
The separator string to use between the values in a concatenated group of strings. |
serialize
|
boolean |
A boolean value that indicates whether returned values should be serialized with the value's data type. |
row_limit
|
int |
An integer that puts a maximum limit on the number of rows to retrieve for a group. |
value_limit
|
int |
An integer that puts a maximum limit on the number of values to retrieve from a group of rows. |
delimit_blanks
|
boolean |
A boolean value that indicates whether to delimit blanks with the separator value. |
prefix
|
string |
Optional string to add as a prefix to the resulting string. |
suffix
|
string |
Optional string to add as a suffix to the resulting string. |
max_length
|
int |
Optional integer that puts a maximum limit on the number of characters the resulting string can have. |
Returns
string |
The concatenated string. |
LANG
This function returns any language tags that are included in the string. The results are grouped by each language tag or by "blank" if a value does not have a language tag.
Syntax
LANG(text)
text
|
string |
The string to search for language tags. |
Returns
string |
The found language tags. |
LANGMATCHES
This function tests whether a string includes a language tag that matches the specified language range.
Syntax
LANGMATCHES(text, language_range)
text
|
string |
The string to evaluate. |
language_range
|
string |
The language range to match against the language tag in the text. |
Example
LANGMATCHES(LANG(?prop),"en")
Returns
boolean |
True if strings include a language tag that matches the range and false if they do not. |
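Expanding on the example expression, the sketch below filters a set of labels down to the English ones. It assumes SPARQL 1.1 language-range matching, in which the range "en" matches tags such as en-US and en-GB. The labels are hypothetical sample values.

```sparql
SELECT ?label
WHERE {
  # Hypothetical labels: two English variants, one French, one untagged.
  VALUES ?label { "color"@en-US "colour"@en-GB "couleur"@fr "untagged" }
  # LANG extracts the tag; LANGMATCHES compares it with the range "en".
  FILTER(LANGMATCHES(LANG(?label), "en"))
}
```

Under those assumptions the query keeps the two English labels and drops the French and untagged values.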
LCASE
This function converts the letters in a string literal to lower case.
Syntax
LCASE(text)
text
|
string |
The string literal to convert to lower case. |
Returns
string |
The string with lower case letters. |
LEFT
This function returns the specified number of characters starting from the beginning (left side) of the string.
Syntax
LEFT(text, num_chars)
text
|
string |
The string from which to return the specified number of characters. |
num_chars
|
int |
An integer that specifies the number of characters to return, starting from the left side of the text. |
Returns
string |
The specified number of characters from the string. |
LEN
This function calculates the length (number of characters) in a string.
Syntax
LEN(text)
text
|
string |
The string for which to calculate the length. |
Returns
int |
The number of characters in the string. |
LEVENSHTEIN_DIST
This function calculates the Levenshtein distance, a measure of the difference between two strings. The distance is the number of single-character edits (insertions, deletions, or substitutions) required to transform the first string into the second string.
Syntax
LEVENSHTEIN_DIST(text1, text2)
text1
|
string |
The string that would be transformed into text2. |
text2
|
string |
The string to measure text1 against. |
Returns
int |
The Levenshtein distance between the strings. |
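To make the edit count concrete, the sketch below compares two well-known sample words. It assumes that the function implements the standard Levenshtein definition, under which these two words are three edits apart; the comments list the three edits.

```sparql
SELECT ?distance
WHERE {
  # kitten -> sitten   (substitute "k" with "s")
  # sitten -> sittin   (substitute "e" with "i")
  # sittin -> sitting  (insert "g" at the end)
  BIND(LEVENSHTEIN_DIST("kitten", "sitting") AS ?distance)
}
```

A smaller distance means the strings are more alike, so a threshold on the result can be used to flag likely misspellings or near-duplicate values.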
LOWER
This function converts all letters in a string to lower case.
Syntax
LOWER(text)
text
|
string |
The string to convert to lower case. |
Returns
string |
The string with lower case letters. |
MD5
This function returns the MD5 checksum of a string as a hexadecimal string.
Syntax
MD5(text)
text
|
string |
The string for which to return the MD5 checksum. |
Returns
string |
The hexadecimal string. |
MID
This function returns the specified number of characters from a string, starting from a given position in the string.
Syntax
MID(text, start_num, num_chars)
text
|
string |
The string from which to return the specified characters. |
start_num
|
int |
An integer that indicates the starting position in the string. |
num_chars
|
int |
An integer that specifies the number of characters to return, starting with the character indicated by start_num. |
Returns
string |
The specified number of characters from the string. |
REGEX
This function tests whether a string matches the specified regular expression pattern.
Syntax
REGEX(text, pattern [, flags ])
text
|
string |
The string to test against the pattern. |
pattern
|
string |
The regular expression pattern to look for in the text. For information about the supported regular expression syntax, see the Regular Expression Syntax section of the W3C XQuery 1.0 and XPath 2.0 Functions and Operators specification. |
flags
|
string |
You can include one or more optional modifier flags that further define the pattern. For information about flags, see the Flags section of the W3C Functions and Operators specification. |
Returns
boolean |
True if the string matches the regular expression pattern and false if it does not. |
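A minimal sketch of REGEX with the optional flags argument. The codes are hypothetical sample values, and the "i" flag is the case-insensitive flag defined in the W3C specification referenced above.

```sparql
SELECT ?code
WHERE {
  # Hypothetical sample values, bound inline for illustration only.
  VALUES ?code { "AB-1234" "ab-5678" "ABC-12" }
  # Two letters, a hyphen, then four digits, anchored at both ends.
  # The "i" flag makes the letter match case-insensitive.
  FILTER(REGEX(?code, "^[a-z]{2}-[0-9]{4}$", "i"))
}
```

The first two values satisfy the pattern. "ABC-12" does not, because it has three letters and only two digits.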
REGEXP_SUBSTR
This function searches a string for the specified regular expression pattern and returns the substring that matches the pattern.
Syntax
REGEXP_SUBSTR(text, pattern [, start_position ] [, nth_appearance ])
text
|
string |
The string to test against the pattern. |
pattern
|
string |
The regular expression pattern to look for in the text. For information about the supported regular expression syntax, see the Regular Expression Syntax section of the W3C XQuery 1.0 and XPath 2.0 Functions and Operators specification. |
start_position
|
int |
An optional integer that specifies the number of characters from the beginning of the string to start searching for matches (the default value is 1). |
nth_appearance
|
int |
An optional integer that specifies which occurrence of the pattern to match (the default value is 1). |
Returns
string |
The substring that matches the regular expression pattern. |
REPLACE
This function extends the REGEX function to provide the ability to find a pattern in a string and replace it with another pattern. The function returns the replaced string.
Syntax
REPLACE(text, pattern, replacement_pattern [, flags ])
text
|
string |
The string to test against the pattern. |
pattern
|
string |
The regular expression pattern to look for in the text. For information about the supported regular expression syntax, see the Regular Expression Syntax section of the W3C XQuery 1.0 and XPath 2.0 Functions and Operators specification. |
replacement_pattern
|
string |
The pattern to substitute for each match of the pattern. |
flags
|
string |
You can include one or more optional modifier flags that further define the pattern. For information about flags, see the Flags section of the W3C Functions and Operators specification. |
Returns
string |
The string that contains the replacement pattern. |
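The sketch below masks the digits in a string, assuming the SPARQL 1.1 behavior in which every match of the pattern is replaced. The phone number is a hypothetical sample value.

```sparql
SELECT ?masked
WHERE {
  # Hypothetical sample value, bound inline for illustration only.
  VALUES ?phone { "555-867-5309" }
  # Each digit matches [0-9] and is replaced with "#".
  # Hyphens do not match the pattern and are left unchanged.
  BIND(REPLACE(?phone, "[0-9]", "#") AS ?masked)
}
```

Under that assumption the result is "###-###-####".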
RIGHT
This function returns the specified number of characters starting from the end (right side) of the string.
Syntax
RIGHT(text, num_chars)
text
|
string |
The string from which to return the specified number of characters. |
num_chars
|
int |
An integer that specifies the number of characters to return, starting from the right side of the text. |
Returns
string |
The specified characters from the string. |
SEARCH
This function uses text search semantics to evaluate whether the specified string matches the given pattern.
Syntax
SEARCH(text, pattern [, required ] [, wildcard ] [, escape ])
text
|
string |
The string to search. |
pattern
|
string |
The search string to look for in the text. Anzo automatically converts the value to a regular expression pattern that uses text search semantics. |
required
|
boolean |
An optional boolean value that indicates whether the text must include all elements of the search pattern to qualify as a match or whether matching just part of the pattern qualifies as a match. |
wildcard
|
boolean |
An optional boolean value that indicates whether or not to add the wildcard character * to the end of the search pattern. |
escape
|
boolean |
An optional boolean value that indicates whether to escape all of the special characters (such as +, -, or |) in the text. |
Returns
boolean |
True if strings match the pattern and false if they do not. |
SHA1
This function calculates the SHA-1 digest of a string.
Syntax
SHA1(text)
text
|
string |
The string for which to calculate the SHA-1 digest. |
Returns
string |
The SHA-1 digest. |
SHA224
This function calculates the SHA-224 digest of a string.
Syntax
SHA224(text)
text
|
string |
The string for which to calculate the SHA-224 digest. |
Returns
string |
The SHA-224 digest. |
SHA256
This function calculates the SHA-256 digest of a string.
Syntax
SHA256(text)
text
|
string |
The string for which to calculate the SHA-256 digest. |
Returns
string |
The SHA-256 digest. |
SHA384
This function calculates the SHA-384 digest of a string.
Syntax
SHA384(text)
text
|
string |
The string for which to calculate the SHA-384 digest. |
Returns
string |
The SHA-384 digest. |
SHA512
This function calculates the SHA-512 digest of a string.
Syntax
SHA512(text)
text
|
string |
The string for which to calculate the SHA-512 digest. |
Returns
string |
The SHA-512 digest. |
STRAFTER
This function returns the portion of a string that comes after the specified substring.
Syntax
STRAFTER(text, substring)
text
|
string |
The string from which to return the characters that follow the substring. |
substring
|
string |
The string to match in the text. The function will return the part of the text that comes after this substring. |
Returns
string |
The part of the string that comes after the substring. |
STRBEFORE
This function returns the portion of a string that comes before the specified substring.
Syntax
STRBEFORE(text, substring)
text
|
string |
The string from which to return the characters that precede the substring. |
substring
|
string |
The string to match in the text. The function will return the part of the text that comes before this substring. |
Returns
string |
The part of the string that comes before the substring. |
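STRBEFORE and STRAFTER are often used together to split a value around a delimiter. The sketch below splits a hypothetical email address at the "@" character, assuming SPARQL 1.1 semantics.

```sparql
SELECT ?email ?user ?domain
WHERE {
  # Hypothetical sample value, bound inline for illustration only.
  VALUES ?email { "jane@example.com" }
  BIND(STRBEFORE(?email, "@") AS ?user)     # "jane"
  BIND(STRAFTER(?email, "@") AS ?domain)    # "example.com"
}
```

In SPARQL 1.1, both functions return an empty string when the substring does not occur in the text, so an empty result can be used to detect values that lack the delimiter.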
STRDT
This function constructs a literal value with the specified data type.
Syntax
STRDT(text, datatype)
text
|
string |
The string to add a data type specification to. |
datatype
|
URI |
The data type URI to add to the text. For example, xsd:integer or <http://www.w3.org/2001/XMLSchema#integer>. |
Returns
string |
The typed literal value. |
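A short sketch of STRDT that turns a plain string into an integer literal. DATATYPE is the standard SPARQL 1.1 accessor and is used here only to show the resulting type; it is not documented in this topic, so treat its availability as an assumption.

```sparql
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

SELECT ?typed ?type
WHERE {
  # Attach the xsd:integer data type to the lexical form "42".
  BIND(STRDT("42", xsd:integer) AS ?typed)
  # Read the data type back to confirm what was attached.
  BIND(DATATYPE(?typed) AS ?type)
}
```

STRLANG follows the same pattern but takes a language tag such as "en" as its second argument instead of a data type URI.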
STRENDS
This function evaluates whether the specified string ends with the specified substring.
Syntax
STRENDS(text, substring)
text
|
string |
The string to search for the substring. |
substring
|
string |
The string to match at the end of the text. The function returns true if the text ends in the specified substring and false if it does not. |
Returns
boolean |
True if strings end with the specified substring and false if they do not. |
STRLANG
This function constructs a literal value with the specified language tag.
Syntax
STRLANG(text, language_tag)
text
|
string |
The string to add the language tag to. |
language_tag
|
string |
The language tag to add to the text. |
Returns
string |
The literal value with the language tag. |
STRLEN
This function calculates the length (in characters) of a string value.
Syntax
STRLEN(text)
text
|
string |
The string for which to return the length. |
Returns
long |
The number of characters in the string. |
STRSTARTS
This function evaluates whether the specified string starts with the specified substring.
Syntax
STRSTARTS(text, substring)
text
|
string |
The string to search for the substring. |
substring
|
string |
The string to match at the beginning of the text. The function returns true if the text starts with the specified substring and false if it does not. |
Returns
boolean |
True if strings begin with the specified substring and false if they do not. |
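STRSTARTS and STRENDS can be combined in one filter to test both ends of a value. The sketch below uses hypothetical location strings and assumes case-sensitive matching as in SPARQL 1.1.

```sparql
SELECT ?location
WHERE {
  # Hypothetical sample values, bound inline for illustration only.
  VALUES ?location {
    "http://example.org/a.ttl"
    "http://example.org/b.csv"
    "ftp://example.org/c.ttl"
  }
  # Keep values that start with "http" and also end with ".ttl".
  FILTER(STRSTARTS(?location, "http") && STRENDS(?location, ".ttl"))
}
```

Only the first value passes both tests. The second fails the STRENDS check and the third fails the STRSTARTS check.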
STRUUID
This function returns a string that is the result of generating a Universally Unique Identifier (UUID).
Syntax
STRUUID()
Returns
string |
The generated UUID as a string. |
SUBSTITUTE
This function replaces existing text in a string with the specified new text.
Syntax
SUBSTITUTE(text, old_text, new_text [, instance_num ])
text
|
string |
The string to substitute text in. |
old_text
|
string |
The string within the text to replace. |
new_text
|
string |
The string to replace the old_text with. |
instance_num
|
int |
An optional integer that specifies the number of old_text instances to replace. |
Returns
string |
The string with the new text. |
SUBSTR
This function returns a substring from a string value.
Syntax
SUBSTR(text, start [, length ])
text
|
string |
The string to find the substring in. |
start
|
int |
An integer that specifies the position of the character in the text at which the substring starts. |
length
|
int |
An optional integer that specifies the total number of characters to include in the substring. If not specified, the substring will end at the end of the text value. |
Returns
string |
The substring of the text. |
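The sketch below shows SUBSTR with and without the optional length argument. It assumes the SPARQL 1.1 convention in which the first character is at position 1; the sample string comes from the SPARQL specification.

```sparql
SELECT ?rest ?one
WHERE {
  # No length: the substring runs to the end of the text.
  BIND(SUBSTR("foobar", 4) AS ?rest)      # "bar"
  # Length 1: only the character at position 4 is returned.
  BIND(SUBSTR("foobar", 4, 1) AS ?one)    # "b"
}
```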
TOURI
This function casts a string literal value to a URI.
Syntax
TOURI(text)
text
|
string |
The string literal to cast to a URI. |
Returns
URI |
The literal value as a URI. |
TRIM
This function removes all spaces from a string except for any single spaces between words.
Syntax
TRIM(text)
text
|
string |
The string to trim. |
Returns
string |
The string with spaces removed. |
UCASE
This function converts all letters in a string to upper case.
Syntax
UCASE(text)
text
|
string |
The string value to convert to upper case. |
Returns
string |
The string with upper case characters. |
UPPER
This function converts all letters in a string literal to upper case.
Syntax
UPPER(text)
text
|
string |
The string literal to convert to upper case. |
Returns
string |
The string with upper case characters. |