String Functions

This topic describes the Graph Lakehouse functions that operate on string data types.

CONCAT: Concatenates a list of strings.
CONTAINS: Evaluates whether the specified string contains the given pattern.
ENCODE_FOR_URI: Encodes the specified string as a URI.
ESCAPEHTML: Escapes the specified string for use in HTML.
FIND: Returns the position—from left to right—of a string within another string.
FINDREVERSE: Returns the position—from right to left—of a string within another string.
GROUP_CONCAT: Concatenates a group of strings into a single string.
LANG: Returns any language tags that are included with strings.
LANGMATCHES: Evaluates whether a string includes a language tag that matches the specified language range.
LCASE: Converts the letters in a string to lower case.
LEFT: Returns the specified number of characters starting from the beginning (left side) of the string.
LEVENSHTEIN_DIST: Calculates the Levenshtein distance or measure of similarity between two strings.
LTRIM_WS: Trims white space from the left side of a string.
REGEX: Evaluates whether a string matches the specified regular expression pattern.
REGEXP_SUBSTR: Searches a string for the specified regular expression pattern and returns the substring that matches the pattern.
REPLACE: Extends the REGEX function to provide the ability to find a pattern in a string and replace it with another pattern.
RIGHT: Returns the specified number of characters starting from the end (right side) of the string.
RPAD: Pads the right side of a string with the specified number of spaces.
RTRIM_WS: Trims white space from the right side of a string
STRAFTER: Returns the portion of a string that comes after the specified substring.
STRBEFORE:Returns the portion of a string that comes before the specified substring.
STRENDS: Evaluates whether the specified string ends with the specified substring.
STRLANG: Constructs a literal value with the specified language tag.
STRLEN: Returns the number of characters in the specified string.
STRSTARTS: Evaluates whether the specified string starts with the specified substring.
STRUUID: Returns a string that is the result of generating a Universally Unique Identifier (UUID).
SUBSTR: Returns a substring from a string value.
TRIM: Removes all spaces from a string except for any single spaces between words.
UCASE: Converts the letters in a string to upper case.
URI: Casts a string to a URI.

Typographical Conventions

The following list describes the conventions used to document function syntax:

CAPS: Although SPARQL is case-insensitive, SPARQL keywords in this section are written in uppercase for readability.
[ argument ]: Brackets indicate an optional argument or keyword.
|: Means OR. Indicates that you can use one or more of the specified options.

CONCAT

This function concatenates two or more strings and returns the result as a string.

Syntax

CONCAT(text1, text2 [, textN ])

Argument	Type	Description
text1–N	string	The strings that you want to concatenate to form a single string.

Returns

Type	Description
string	The concatenated string.

CONTAINS

This function evaluates whether the specified strings contain the given pattern.

Syntax

CONTAINS(text, pattern)

Argument	Type	Description
text	string	The string value that you want to check against the specified pattern.
pattern	string	The string pattern that you want to look for in the supplied text.

Returns

Type	Description
boolean	`True` if the strings contain the pattern and `false` if they do not.

ENCODE_FOR_URI

This function encodes the specified string as a URI and returns a string in URI format.

Syntax

ENCODE_FOR_URI(text)

Argument	Type	Description
text	string	The string value to encode as a URI.

Returns

Type	Description
string	The string as a URI.

Example

PREFIX tickit: <http://anzograph.com/tickit/>
SELECT DISTINCT (ENCODE_FOR_URI(?eventname) as ?event)
FROM <http://anzograph.com/tickit>
WHERE {
   ?s tickit:eventid ?eventid .
   ?eventid tickit:eventname ?eventname .
}
ORDER BY ?event
LIMIT 10

 event
-----------------------------------
.38%20Special
3%20Doors%20Down
70s%20Soul%20Jam
A%20Bronx%20Tale
A%20Catered%20Affair
A%20Chorus%20Line
A%20Christmas%20Carol
A%20Doll%27s%20House
A%20Man%20For%20All%20Seasons
A%20Midsummer%20Night%27s%20Dream
10 rows

ESCAPEHTML

This function escapes the specified string for use in HTML.

Syntax

ESCAPEHTML(text)

Argument	Type	Description
text	string	The string value to escape for HTML.

Returns

Type	Description
string	The string escaped for HTML.

FIND

This function returns the position—from left to right—of a string within another string.

You can use FINDREVERSE to find the character or substring position from right to left.

Syntax

FIND(find_text, within_text, start_num)

Argument	Type	Description
find_text	string	The string to look for in the `within_text`.
within_text	string	The string to search within.
start_num	int	An integer that indicates the position to start from when looking for the `find_text`. The starting position is at the beginning of the `within_text` value and characters are counted from left to right.

Returns

Type	Description
int	The character position (from left to right) where the substring starts.

FINDREVERSE

Similar to FIND, this function returns the position—from right to left—of a string within another string.

Syntax

FINDREVERSE(find_text, within_text, start_num)

Argument	Type	Description
find_text	string	The string to look for in the `within_text` value.
within_text	string	The string to search within.
start_num	int	An integer that indicates the position to start from when looking for the `find_text`. The starting position is the end of the `within_text` value and characters are counted from right to left.

Returns

Type	Description
int	The character position (from right to left) where the substring starts.

GROUP_CONCAT

This function concatenates a group of strings into a single string.

Syntax

GROUP_CONCAT (group ; [ SEPARATOR = "separator_char" ] ; [ ROW_LIMIT = max_rows ] ;
[ PRE = "prefix" ] ; [ VALUE_SERIALIZE = serialize ] ; [ DELIMIT_BLANKS = separate_blanks ] ;
[ MAX_LENGTH = string_length ] ; [ SUFFIX = "suffix" ])

Argument	Type	Description
group	string	The group of strings to concatenate.
separator_char	string	Optional argument that defines the separator to use between the values in returned strings. When SEPARATOR is omitted, Graph Lakehouse separates values with a space.
max_rows	int	Optional argument that puts a maximum limit on the number of rows to retrieve for the group. When ROW_LIMIT is omitted, the default is `unlimited`. Note that Graph Lakehouse performs the GROUP_CONCAT for each slice separately and combines the results from each slice. The ROW_LIMIT is applied to each slice, not the total result. Therefore, the total number of values that are concatenated will be larger than the specified limit, proportional to the number of slices in the cluster.
prefix	string	Optional string to add as a prefix to the resulting string.
serialize	boolean	Optional argument that indicates whether returned values should be serialized with the value's data type. When VALUE_SERIALIZE is omitted, the default is `false`.
separate_blanks	boolean	Optional argument that indicates whether to delimit blanks with the SEPARATOR value. When DELIMIT_BLANKS is omitted, the default is `false`.
string_length	int	Optional argument that limits the resulting strings to a maximum character length. Graph Lakehouse has a 2MB (~2,000,000 characters) limit on the length of strings and displays an error if GROUP_CONCAT returns a string that is longer than 2000000. When MAX_LENGTH is omitted, the default is `unlimited`.
suffix	string	Optional argument that defines a suffix to add to the resulting strings. When SUFFIX is omitted, Graph Lakehouse adds an empty string as the suffix.

Returns

Type	Description
string	The concatenated string.

Example

The query below concatenates the list of friends for 10 people in the sample Tickit data set. Since the GROUP_CONCAT expression includes ROW_LIMIT=2, Graph Lakehouse limits the records to two for each slice (or shard) of data.

SELECT ?person (GROUP_CONCAT(?id;SEPARATOR=",";ROW_LIMIT=2) AS ?friends)
FROM <http://anzograph.com/tickit>
WHERE { 
  ?person <http://anzograph.com/tickit/friend> ?friend .
  BIND(STRAFTER(STR(?friend), "http://anzograph.com/tickit/") as ?id)
}
GROUP BY ?person
ORDER BY ?person
LIMIT 10

 person                                 | friends
----------------------------------------+-------------------------------------------------
http://anzograph.com/tickit/person1     | person2894,person20624,person33618,person47127
http://anzograph.com/tickit/person10    | person3136,person22714,person2509,person24535
http://anzograph.com/tickit/person100   | person42775,person29725,person27334,person24553
http://anzograph.com/tickit/person1000  | person19040,person39066,person2236,person9089
http://anzograph.com/tickit/person10000 | person43706,person37085,person18874,person31270
http://anzograph.com/tickit/person10001 | person3389,person44830,person4720,person307
http://anzograph.com/tickit/person10002 | person46462,person43989,person46491,person31130
http://anzograph.com/tickit/person10003 | person31544,person19595,person23460,person28465
http://anzograph.com/tickit/person10004 | person11070,person19845,person11172,person24252
http://anzograph.com/tickit/person10005 | person33888,person9467,person35761,person47709
10 rows

LANG

This function returns any language tags that are included in the string. The results are grouped by each language tag or by "blank" if a value does not have a language tag.

Syntax

LANG(text)

Argument	Type	Description
text	string	The string to search for language tags.

Returns

Type	Description
string	The found language tags.

LANGMATCHES

This function tests whether a string includes a language tag that matches the specified language range.

Syntax

LANGMATCHES(text, language_range)

Argument	Type	Description
text	string	The string to evaluate.
language_range	string	The language tag to match in the `text`.

Returns

Type	Description
boolean	`True` if strings include a language tag that matches the range and `false` if they do not.

LCASE

This function converts the letters in a string literal to lower case.

To convert the characters in a string according to a specific locale, you can use the LCASE utility extension.

Syntax

LCASE(text)

Argument	Type	Description
text	string	The string literal to convert to lower case.

Returns

Type	Description
string	The string with lower case letters.

LEFT

This function returns the specified number of characters starting from the beginning (left side) of the string.

Syntax

LEFT(text, num_chars)

Argument	Type	Description
text	string	The string from which to return the specified number of characters.
num_chars	int	An integer that specifies the number of characters to return, starting from the left side of the `text`.

Returns

Type	Description
string	The specified number of characters from the string.

LEVENSHTEIN_DIST

This function calculates the Levenshtein distance or measure of similarity between two strings. The distance is the smallest number of insertions, deletions, and/or substitutions required to transform the first string into the second string.

Syntax

LEVENSHTEIN_DIST(text1, text2)

Argument	Type	Description
text1	string	The string that would be transformed into `text2`.
text2	string	The string to measure `text1` against.

Returns

Type	Description
int	The Levenshtein distance between the strings.

LTRIM_WS

This function removes all spaces from the left side of a string.

Syntax

LTRIM_WS(text)

Argument	Type	Description
text	string	The string to trim.

Returns

Type	Description
string	The string with spaces removed.

REGEX

This function tests whether a string matches the specified regular expression pattern.

Syntax

REGEX(text, pattern [, flags ])

Argument	Type	Description
text	string	The string to test against the `pattern`.
pattern	string	The regular expression pattern to look for in the `text`. For information about the supported regular expression syntax, see the Regular Expression Syntax section of the W3C XQuery 1.0 and XPath 2.0 Functions and Operators specification.
flags	string	You can include one or more optional modifier flags that further define the pattern. For information about flags, see the Flags section of the W3C Functions and Operators specification.

Returns

Type	Description
boolean	`True` if the string matches the regular expression pattern and `false` if it does not.

REGEXP_SUBSTR

This function searches a string for the specified regular expression pattern and returns the substring that matches the pattern.

Syntax

REGEXP_SUBSTR(text, pattern [, start_position ] [, nth_appearance ])

Argument	Type	Description
text	string	The string to test against the `pattern`.
pattern	string	The regular expression pattern to look for in the `text`. For information about the supported regular expression syntax, see the Regular Expression Syntax section of the W3C XQuery 1.0 and XPath 2.0 Functions and Operators specification.
start_position	int	An optional integer that specifies the number of characters from the beginning of the string to start searching for matches (the default value is 1).
nth_appearance	int	An optional integer that specifies which occurrence of the pattern to match (the default value is 1).

Returns

Type	Description
string	The substring that matches the regular expression pattern.

REPLACE

This function extends the REGEX function to provide the ability to find a pattern in a string and replace it with another pattern. The function returns the replaced string.

Syntax

REPLACE(text, pattern, replacement_pattern [, flags ])

Argument	Type	Description
text	string	The string to test against the `pattern`.
pattern	string	The regular expression pattern to look for in the `text`. For information about the supported regular expression syntax, see the Regular Expression Syntax section of the W3C XQuery 1.0 and XPath 2.0 Functions and Operators specification.
replacement_pattern	string	The pattern to replace the `pattern` with.
flags	string	You can include one or more optional modifier flags that further define the pattern. For information about flags, see the Flags section of the W3C Functions and Operators specification.

Returns

Type	Description
string	The string that contains the replacement pattern.

RIGHT

This function returns the specified number of characters starting from the end (right side) of the string.

Syntax

RIGHT(text, num_chars)

Argument	Type	Description
text	string	The string from which to return the specified number of characters.
num_chars	int	An integer that specifies the number of characters to return, starting from the right side of the `text`.

Returns

Type	Description
string	The specified characters from the string.

RPAD

This function pads the end (right side) of a string with the number of spaces that you specify.

Syntax

RPAD(text, num_spaces)

Argument	Type	Description
text	string	The string to add the spaces to.
num_spaces	int	An integer that specifies the number of spaces to add to the end of the `text`.

Returns

Type	Description
string	The value with the specified number of spaces.

RTRIM_WS

This function removes all spaces from the right side of a string.

Syntax

RTRIM_WS(text)

Argument	Type	Description
text	string	The string to trim.

Returns

Type	Description
string	The string with spaces removed.

STRAFTER

This function returns the portion of a string that comes after the specified substring.

Syntax

STRAFTER(text, substring)

Argument	Type	Description
text	string	The string from which to return the characters that follow the `substring`.
substring	string	The string to match in the `text`. The function will return the part of the text that comes after this substring.

Returns

Type	Description
string	The part of the string that comes after the substring.

Example

The following example query uses STRAFTER to return only the unique portion of each event ID in the sample Tickit data set. The query uses BIND to convert the event URIs to strings and bind them to the ?str_event variable.

SELECT (STRAFTER(?str_event, "event") AS ?event_number) ?name
FROM <http://anzograph.com/tickit>
WHERE {
  ?event <http://anzograph.com/tickit/eventname> ?name .
  BIND (STR(?event) AS ?str_event)
}
ORDER BY ?event_number

event_number | name
-------------+---------------------------------
1            | Gotterdammerung
10           | Rigoletto
100          | Siegfried
1000         | Gypsy
1001         | Chicago
1002         | The King and I
1003         | Pal Joey
1004         | Grease
...
8798 rows

STRBEFORE

This function returns the portion of a string that comes before the specified substring.

Syntax

STRAFTER(text, substring)

Argument	Type	Description
text	string	The string from which to return the characters that precede the `substring`.
substring	string	The string to match in the `text`. The function will return the part of the text that comes before this substring.

Returns

Type	Description
string	The part of the string that comes before the substring.

STRENDS

This function evaluates whether the specified string ends with the specified substring.

Syntax

STRENDS(text, substring)

Argument	Type	Description
text	string	The string to search for the `substring`.
substring	string	The string to match at the end of `text`. The function returns true if the text ends in the specified substring and false if it does not.

Returns

Type	Description
boolean	`True` if strings end with the specified substring and `false` if they do not.

STRLANG

This function constructs a literal value with the specified language tag.

Syntax

STRLANG(text, language_tag)

Argument	Type	Description
text	string	The string to add the language tag to.
language_tag	string	The language tag to add to the `text`.

Returns

Type	Description
string	The literal value with the language tag.

STRLEN

This function calculates the length (in characters) of a string value.

Syntax

STRLEN(text)

Argument	Type	Description
text	string	The string for which to return the length.

Returns

Type	Description
long	The number of characters in the string.

STRSTARTS

This function evaluates whether the specified string starts with the specified substring.

Syntax

STRENDS(text, substring)

Argument	Type	Description
text	string	The string to search for the `substring`.
substring	string	The string to match at the beginning of `text`. The function returns true if the text starts with the specified substring and false if it does not.

Returns

Type	Description
boolean	`True` if strings begin with the specified substring and `false` if they do not.

STRUUID

This function returns a string that is the result of generating a Universally Unique Identifier (UUID).

Syntax

STRUUID()

Returns

Type	Description
string	The UUID.

SUBSTR

This function returns a substring from a string value.

Syntax

SUBSTR(text, start [, length ])

Argument	Type	Description
text	string	The string to find the substring in.
start	int	An integer that specifies the number of the character in the `text` that should be the start of the substring.
length	int	An optional integer that specifies the total number of characters to include in the substring. If not specified, the substring will end at the end of the `text` value.

Returns

Type	Description
string	The substring.

TRIM

This function removes all spaces from a string except for any single spaces between words.

Syntax

TRIM(text)

Argument	Type	Description
text	string	The string to trim.

Returns

Type	Description
string	The string with spaces removed.

UCASE

This function converts all letters in a string to upper case.

To convert the characters in a string according to a specific locale, you can use the UCASE utility extension.

Syntax

UPPER(text)

Argument	Type	Description
text	string	The string value to convert to upper case.

Returns

Type	Description
string	The string with upper case characters.

URI

This function casts the specified string to a URI.

Syntax

URI(value)

Argument	Type	Description
value	string	The value to convert to a URI.

Returns

Type	Description
URI	The value as a URI.