String Functions

This topic describes the Graph Lakehouse functions that operate on string data types.

  • CONCAT: Concatenates a list of strings.
  • CONTAINS: Evaluates whether the specified string contains the given pattern.
  • ENCODE_FOR_URI: Encodes the specified string as a URI.
  • ESCAPEHTML: Escapes the specified string for use in HTML.
  • FIND: Returns the position—from left to right—of a string within another string.
  • FINDREVERSE: Returns the position—from right to left—of a string within another string.
  • GROUP_CONCAT: Concatenates a group of strings into a single string.
  • LANG: Returns any language tags that are included with strings.
  • LANGMATCHES: Evaluates whether a string includes a language tag that matches the specified language range.
  • LCASE: Converts the letters in a string to lower case.
  • LEFT: Returns the specified number of characters starting from the beginning (left side) of the string.
  • LEVENSHTEIN_DIST: Calculates the Levenshtein distance or measure of similarity between two strings.
  • LTRIM_WS: Trims white space from the left side of a string.
  • REGEX: Evaluates whether a string matches the specified regular expression pattern.
  • REGEXP_SUBSTR: Searches a string for the specified regular expression pattern and returns the substring that matches the pattern.
  • REPLACE: Extends the REGEX function to provide the ability to find a pattern in a string and replace it with another pattern.
  • RIGHT: Returns the specified number of characters starting from the end (right side) of the string.
  • RPAD: Pads the right side of a string with the specified number of spaces.
  • RTRIM_WS: Trims white space from the right side of a string
  • STRAFTER: Returns the portion of a string that comes after the specified substring.
  • STRBEFORE:Returns the portion of a string that comes before the specified substring.
  • STRENDS: Evaluates whether the specified string ends with the specified substring.
  • STRLANG: Constructs a literal value with the specified language tag.
  • STRLEN: Returns the number of characters in the specified string.
  • STRSTARTS: Evaluates whether the specified string starts with the specified substring.
  • STRUUID: Returns a string that is the result of generating a Universally Unique Identifier (UUID).
  • SUBSTR: Returns a substring from a string value.
  • TRIM: Removes all spaces from a string except for any single spaces between words.
  • UCASE: Converts the letters in a string to upper case.
  • URI: Casts a string to a URI.

Typographical Conventions

The following list describes the conventions used to document function syntax:

  • CAPS: Although SPARQL is case-insensitive, SPARQL keywords in this section are written in uppercase for readability.
  • [ argument ]: Brackets indicate an optional argument or keyword.
  • |: Means OR. Indicates that you can use one or more of the specified options.

CONCAT

This function concatenates two or more strings and returns the result as a string.

Syntax

CONCAT(text1, text2 [, textN ])
Argument Type Description
text1–N string The strings that you want to concatenate to form a single string.

Returns

Type Description
string The concatenated string.

CONTAINS

This function evaluates whether the specified strings contain the given pattern.

Syntax

CONTAINS(text, pattern)
Argument Type Description
text string The string value that you want to check against the specified pattern.
pattern string The string pattern that you want to look for in the supplied text.

Returns

Type Description
boolean True if the strings contain the pattern and false if they do not.

ENCODE_FOR_URI

This function encodes the specified string as a URI and returns a string in URI format.

Syntax

ENCODE_FOR_URI(text)
Argument Type Description
text string The string value to encode as a URI.

Returns

Type Description
string The string as a URI.

Example

PREFIX tickit: <http://anzograph.com/tickit/>
SELECT DISTINCT (ENCODE_FOR_URI(?eventname) as ?event)
FROM <http://anzograph.com/tickit>
WHERE {
   ?s tickit:eventid ?eventid .
   ?eventid tickit:eventname ?eventname .
}
ORDER BY ?event
LIMIT 10
 event
-----------------------------------
.38%20Special
3%20Doors%20Down
70s%20Soul%20Jam
A%20Bronx%20Tale
A%20Catered%20Affair
A%20Chorus%20Line
A%20Christmas%20Carol
A%20Doll%27s%20House
A%20Man%20For%20All%20Seasons
A%20Midsummer%20Night%27s%20Dream
10 rows

ESCAPEHTML

This function escapes the specified string for use in HTML.

Syntax

ESCAPEHTML(text)
Argument Type Description
text string The string value to escape for HTML.

Returns

Type Description
string The string escaped for HTML.

FIND

This function returns the position—from left to right—of a string within another string.

You can use FINDREVERSE to find the character or substring position from right to left.

Syntax

FIND(find_text, within_text, start_num)
Argument Type Description
find_text string The string to look for in the within_text.
within_text string The string to search within.
start_num int An integer that indicates the position to start from when looking for the find_text. The starting position is at the beginning of the within_text value and characters are counted from left to right.

Returns

Type Description
int The character position (from left to right) where the substring starts.

FINDREVERSE

Similar to FIND, this function returns the position—from right to left—of a string within another string.

Syntax

FINDREVERSE(find_text, within_text, start_num)
Argument Type Description
find_text string The string to look for in the within_text value.
within_text string The string to search within.
start_num int An integer that indicates the position to start from when looking for the find_text. The starting position is the end of the within_text value and characters are counted from right to left.

Returns

Type Description
int The character position (from right to left) where the substring starts.

GROUP_CONCAT

This function concatenates a group of strings into a single string.

Syntax

GROUP_CONCAT (group ; [ SEPARATOR = "separator_char" ] ; [ ROW_LIMIT = max_rows ] ;
[ PRE = "prefix" ] ; [ VALUE_SERIALIZE = serialize ] ; [ DELIMIT_BLANKS = separate_blanks ] ;
[ MAX_LENGTH = string_length ] ; [ SUFFIX = "suffix" ])
Argument Type Description
group string The group of strings to concatenate.
separator_char string Optional argument that defines the separator to use between the values in returned strings. When SEPARATOR is omitted, Graph Lakehouse separates values with a space.
max_rows int Optional argument that puts a maximum limit on the number of rows to retrieve for the group. When ROW_LIMIT is omitted, the default is unlimited. Note that Graph Lakehouse performs the GROUP_CONCAT for each slice separately and combines the results from each slice. The ROW_LIMIT is applied to each slice, not the total result. Therefore, the total number of values that are concatenated will be larger than the specified limit, proportional to the number of slices in the cluster.
prefix string Optional string to add as a prefix to the resulting string.
serialize boolean Optional argument that indicates whether returned values should be serialized with the value's data type. When VALUE_SERIALIZE is omitted, the default is false.
separate_blanks boolean Optional argument that indicates whether to delimit blanks with the SEPARATOR value. When DELIMIT_BLANKS is omitted, the default is false.
string_length int Optional argument that limits the resulting strings to a maximum character length. Graph Lakehouse has a 2MB (~2,000,000 characters) limit on the length of strings and displays an error if GROUP_CONCAT returns a string that is longer than 2000000. When MAX_LENGTH is omitted, the default is unlimited.
suffix string Optional argument that defines a suffix to add to the resulting strings. When SUFFIX is omitted, Graph Lakehouse adds an empty string as the suffix.

Returns

Type Description
string The concatenated string.

Example

The query below concatenates the list of friends for 10 people in the sample Tickit data set. Since the GROUP_CONCAT expression includes ROW_LIMIT=2, Graph Lakehouse limits the records to two for each slice (or shard) of data.

SELECT ?person (GROUP_CONCAT(?id;SEPARATOR=",";ROW_LIMIT=2) AS ?friends)
FROM <http://anzograph.com/tickit>
WHERE { 
  ?person <http://anzograph.com/tickit/friend> ?friend .
  BIND(STRAFTER(STR(?friend), "http://anzograph.com/tickit/") as ?id)
}
GROUP BY ?person
ORDER BY ?person
LIMIT 10
 person                                 | friends
----------------------------------------+-------------------------------------------------
http://anzograph.com/tickit/person1     | person2894,person20624,person33618,person47127
http://anzograph.com/tickit/person10    | person3136,person22714,person2509,person24535
http://anzograph.com/tickit/person100   | person42775,person29725,person27334,person24553
http://anzograph.com/tickit/person1000  | person19040,person39066,person2236,person9089
http://anzograph.com/tickit/person10000 | person43706,person37085,person18874,person31270
http://anzograph.com/tickit/person10001 | person3389,person44830,person4720,person307
http://anzograph.com/tickit/person10002 | person46462,person43989,person46491,person31130
http://anzograph.com/tickit/person10003 | person31544,person19595,person23460,person28465
http://anzograph.com/tickit/person10004 | person11070,person19845,person11172,person24252
http://anzograph.com/tickit/person10005 | person33888,person9467,person35761,person47709
10 rows

LANG

This function returns any language tags that are included in the string. The results are grouped by each language tag or by "blank" if a value does not have a language tag.

Syntax

LANG(text)
Argument Type Description
text string The string to search for language tags.

Returns

Type Description
string The found language tags.

LANGMATCHES

This function tests whether a string includes a language tag that matches the specified language range.

Syntax

LANGMATCHES(text, language_range)
Argument Type Description
text string The string to evaluate.
language_range string The language tag to match in the text.

Returns

Type Description
boolean True if strings include a language tag that matches the range and false if they do not.

LCASE

This function converts the letters in a string literal to lower case.

To convert the characters in a string according to a specific locale, you can use the LCASE utility extension.

Syntax

LCASE(text)
Argument Type Description
text string The string literal to convert to lower case.

Returns

Type Description
string The string with lower case letters.

LEFT

This function returns the specified number of characters starting from the beginning (left side) of the string.

Syntax

LEFT(text, num_chars)
Argument Type Description
text string The string from which to return the specified number of characters.
num_chars int An integer that specifies the number of characters to return, starting from the left side of the text.

Returns

Type Description
string The specified number of characters from the string.

LEVENSHTEIN_DIST

This function calculates the Levenshtein distance or measure of similarity between two strings. The distance is the smallest number of insertions, deletions, and/or substitutions required to transform the first string into the second string.

Syntax

LEVENSHTEIN_DIST(text1, text2)
Argument Type Description
text1 string The string that would be transformed into text2.
text2 string The string to measure text1 against.

Returns

Type Description
int The Levenshtein distance between the strings.

LTRIM_WS

This function removes all spaces from the left side of a string.

Syntax

LTRIM_WS(text)
Argument Type Description
text string The string to trim.

Returns

Type Description
string The string with spaces removed.

REGEX

This function tests whether a string matches the specified regular expression pattern.

Syntax

REGEX(text, pattern [, flags ])
Argument Type Description
text string The string to test against the pattern.
pattern string The regular expression pattern to look for in the text. For information about the supported regular expression syntax, see the Regular Expression Syntax section of the W3C XQuery 1.0 and XPath 2.0 Functions and Operators specification.
flags string You can include one or more optional modifier flags that further define the pattern. For information about flags, see the Flags section of the W3C Functions and Operators specification.

Returns

Type Description
boolean True if the string matches the regular expression pattern and false if it does not.

REGEXP_SUBSTR

This function searches a string for the specified regular expression pattern and returns the substring that matches the pattern.

Syntax

REGEXP_SUBSTR(text, pattern [, start_position ] [, nth_appearance ])
Argument Type Description
text string The string to test against the pattern.
pattern string The regular expression pattern to look for in the text. For information about the supported regular expression syntax, see the Regular Expression Syntax section of the W3C XQuery 1.0 and XPath 2.0 Functions and Operators specification.
start_position int An optional integer that specifies the number of characters from the beginning of the string to start searching for matches (the default value is 1).
nth_appearance int An optional integer that specifies which occurrence of the pattern to match (the default value is 1).

Returns

Type Description
string The substring that matches the regular expression pattern.

REPLACE

This function extends the REGEX function to provide the ability to find a pattern in a string and replace it with another pattern. The function returns the replaced string.

Syntax

REPLACE(text, pattern, replacement_pattern [, flags ])
Argument Type Description
text string The string to test against the pattern.
pattern string The regular expression pattern to look for in the text. For information about the supported regular expression syntax, see the Regular Expression Syntax section of the W3C XQuery 1.0 and XPath 2.0 Functions and Operators specification.
replacement_pattern string The pattern to replace the pattern with.
flags string You can include one or more optional modifier flags that further define the pattern. For information about flags, see the Flags section of the W3C Functions and Operators specification.

Returns

Type Description
string The string that contains the replacement pattern.

RIGHT

This function returns the specified number of characters starting from the end (right side) of the string.

Syntax

RIGHT(text, num_chars)
Argument Type Description
text string The string from which to return the specified number of characters.
num_chars int An integer that specifies the number of characters to return, starting from the right side of the text.

Returns

Type Description
string The specified characters from the string.

RPAD

This function pads the end (right side) of a string with the number of spaces that you specify.

Syntax

RPAD(text, num_spaces)
Argument Type Description
text string The string to add the spaces to.
num_spaces int An integer that specifies the number of spaces to add to the end of the text.

Returns

Type Description
string The value with the specified number of spaces.

RTRIM_WS

This function removes all spaces from the right side of a string.

Syntax

RTRIM_WS(text)
Argument Type Description
text string The string to trim.

Returns

Type Description
string The string with spaces removed.

STRAFTER

This function returns the portion of a string that comes after the specified substring.

Syntax

STRAFTER(text, substring)
Argument Type Description
text string The string from which to return the characters that follow the substring.
substring string The string to match in the text. The function will return the part of the text that comes after this substring.

Returns

Type Description
string The part of the string that comes after the substring.

Example

The following example query uses STRAFTER to return only the unique portion of each event ID in the sample Tickit data set. The query uses BIND to convert the event URIs to strings and bind them to the ?str_event variable.

SELECT (STRAFTER(?str_event, "event") AS ?event_number) ?name
FROM <http://anzograph.com/tickit>
WHERE {
  ?event <http://anzograph.com/tickit/eventname> ?name .
  BIND (STR(?event) AS ?str_event)
}
ORDER BY ?event_number
event_number | name
-------------+---------------------------------
1            | Gotterdammerung
10           | Rigoletto
100          | Siegfried
1000         | Gypsy
1001         | Chicago
1002         | The King and I
1003         | Pal Joey
1004         | Grease
...
8798 rows

STRBEFORE

This function returns the portion of a string that comes before the specified substring.

Syntax

STRAFTER(text, substring)
Argument Type Description
text string The string from which to return the characters that precede the substring.
substring string The string to match in the text. The function will return the part of the text that comes before this substring.

Returns

Type Description
string The part of the string that comes before the substring.

STRENDS

This function evaluates whether the specified string ends with the specified substring.

Syntax

STRENDS(text, substring)
Argument Type Description
text string The string to search for the substring.
substring string The string to match at the end of text. The function returns true if the text ends in the specified substring and false if it does not.

Returns

Type Description
boolean True if strings end with the specified substring and false if they do not.

STRLANG

This function constructs a literal value with the specified language tag.

Syntax

STRLANG(text, language_tag)
Argument Type Description
text string The string to add the language tag to.
language_tag string The language tag to add to the text.

Returns

Type Description
string The literal value with the language tag.

STRLEN

This function calculates the length (in characters) of a string value.

Syntax

STRLEN(text)
Argument Type Description
text string The string for which to return the length.

Returns

Type Description
long The number of characters in the string.

STRSTARTS

This function evaluates whether the specified string starts with the specified substring.

Syntax

STRENDS(text, substring)
Argument Type Description
text string The string to search for the substring.
substring string The string to match at the beginning of text. The function returns true if the text starts with the specified substring and false if it does not.

Returns

Type Description
boolean True if strings begin with the specified substring and false if they do not.

STRUUID

This function returns a string that is the result of generating a Universally Unique Identifier (UUID).

Syntax

STRUUID()

Returns

Type Description
string The UUID.

SUBSTR

This function returns a substring from a string value.

Syntax

SUBSTR(text, start [, length ])
Argument Type Description
text string The string to find the substring in.
start int An integer that specifies the number of the character in the text that should be the start of the substring.
length int An optional integer that specifies the total number of characters to include in the substring. If not specified, the substring will end at the end of the text value.

Returns

Type Description
string The substring.

TRIM

This function removes all spaces from a string except for any single spaces between words.

Syntax

TRIM(text)
Argument Type Description
text string The string to trim.

Returns

Type Description
string The string with spaces removed.

UCASE

This function converts all letters in a string to upper case.

To convert the characters in a string according to a specific locale, you can use the UCASE utility extension.

Syntax

UPPER(text)
Argument Type Description
text string The string value to convert to upper case.

Returns

Type Description
string The string with upper case characters.

URI

This function casts the specified string to a URI.

Syntax

URI(value)
Argument Type Description
value string The value to convert to a URI.

Returns

Type Description
URI The value as a URI.