7.9. String Functions and Operators#
String Operators#
The || operator performs concatenation.
The LIKE statement can be used for pattern matching and is documented in
Pattern Comparison: LIKE.
String Functions#
Note
These functions assume that the input strings contain valid UTF-8 encoded
Unicode code points. There are no explicit checks for valid UTF-8 and
the functions may return incorrect results on invalid UTF-8.
Invalid UTF-8 data can be corrected with from_utf8().
Additionally, the functions operate on Unicode code points and not user visible characters (or grapheme clusters). Some languages combine multiple code points into a single user-perceived character, the basic unit of a writing system for a language, but the functions will treat each code point as a separate unit.
The lower() and upper() functions do not perform
locale-sensitive, context-sensitive, or one-to-many mappings required for
some languages. Specifically, this will return incorrect results for
Lithuanian, Turkish and Azeri.
-
chr(n) → varchar# Returns the Unicode code point
nas a single character string.
-
codepoint(string) → integer# Returns the Unicode code point of the only character of
string.
-
concat(string1, ..., stringN) → varchar# Returns the concatenation of
string1,string2,...,stringN. This function provides the same functionality as the SQL-standard concatenation operator (||).
-
hamming_distance(string1, string2) → bigint# Returns the Hamming distance of
string1andstring2, i.e. the number of positions at which the corresponding characters are different. Note that the two strings must have the same length.
-
length(string) → bigint# Returns the length of
stringin characters.
-
levenshtein_distance(string1, string2) → bigint# Returns the Levenshtein edit distance of
string1andstring2, i.e. the minimum number of single-character edits (insertions, deletions or substitutions) needed to changestring1intostring2.
-
lower(string) → varchar# Converts
stringto lowercase.
-
lpad(string, size, padstring) → varchar# Left pads
stringtosizecharacters withpadstring. Ifsizeis less than the length ofstring, the result is truncated tosizecharacters.sizemust not be negative andpadstringmust be non-empty.
-
ltrim(string) → varchar# Removes leading whitespace from
string.
-
replace(string, search) → varchar# Removes all instances of
searchfromstring.
-
replace(string, search, replace) → varchar Replaces all instances of
searchwithreplaceinstring.
-
reverse(string) → varchar# Returns
stringwith the characters in reverse order.
-
rpad(string, size, padstring) → varchar# Right pads
stringtosizecharacters withpadstring. Ifsizeis less than the length ofstring, the result is truncated tosizecharacters.sizemust not be negative andpadstringmust be non-empty.
-
rtrim(string) → varchar# Removes trailing whitespace from
string.
-
split(string, delimiter) -> array(varchar)# Splits
stringondelimiterand returns an array.
-
split(string, delimiter, limit) -> array(varchar) Splits
stringondelimiterand returns an array of size at mostlimit. The last element in the array always contain everything left in thestring.limitmust be a positive number.
-
split_part(string, delimiter, index) → varchar# Splits
stringondelimiterand returns the fieldindex. Field indexes start with1. If the index is larger than than the number of fields, then null is returned.
-
split_to_map(string, entryDelimiter, keyValueDelimiter) → map<varchar, varchar># Splits
stringbyentryDelimiterandkeyValueDelimiterand returns a map.entryDelimitersplitsstringinto key-value pairs.keyValueDelimitersplits each pair into key and value.
-
split_to_multimap(string, entryDelimiter, keyValueDelimiter) -> map(varchar, array(varchar))# Splits
stringbyentryDelimiterandkeyValueDelimiterand returns a map containing an array of values for each unique key.entryDelimitersplitsstringinto key-value pairs.keyValueDelimitersplits each pair into key and value. The values for each key will be in the same order as they appeared instring.
-
strpos(string, substring) → bigint# Returns the starting position of the first instance of
substringinstring. Positions start with1. If not found,0is returned.
-
strpos(string, substring, instance) → bigint Returns the position of the N-th
instanceofsubstringinstring. Wheninstanceis a negative number the search will start from the end ofstring. Positions start with1. If not found,0is returned.
-
position(substring IN string) → bigint# Returns the starting position of the first instance of
substringinstring. Positions start with1. If not found,0is returned.
-
starts_with(string, substring) → boolean# Tests whether
substringis a prefix ofstring.
-
substr(string, start) → varchar# Returns the rest of
stringfrom the starting positionstart. Positions start with1. A negative starting position is interpreted as being relative to the end of the string.
-
substr(string, start, length) → varchar Returns a substring from
stringof lengthlengthfrom the starting positionstart. Positions start with1. A negative starting position is interpreted as being relative to the end of the string.
-
trim(string) → varchar# Removes leading and trailing whitespace from
string.
-
upper(string) → varchar# Converts
stringto uppercase.
-
word_stem(word) → varchar# Returns the stem of
wordin the English language.
-
word_stem(word, lang) → varchar Returns the stem of
wordin thelanglanguage.
Unicode Functions#
-
normalize(string) → varchar# Transforms
stringwith NFC normalization form.
-
normalize(string, form) → varchar Transforms
stringwith the specified normalization form.formmust be be one of the following keywords:Form Description NFDCanonical Decomposition NFCCanonical Decomposition, followed by Canonical Composition NFKDCompatibility Decomposition NFKCCompatibility Decomposition, followed by Canonical Composition Note
This SQL-standard function has special syntax and requires specifying
formas a keyword, not as a string.
-
to_utf8(string) → varbinary# Encodes
stringinto a UTF-8 varbinary representation.
-
from_utf8(binary) → varchar# Decodes a UTF-8 encoded string from
binary. Invalid UTF-8 sequences are replaced with the Unicode replacement characterU+FFFD.
-
from_utf8(binary, replace) → varchar Decodes a UTF-8 encoded string from
binary. Invalid UTF-8 sequences are replaced withreplace. The replacement stringreplacemust either be a single character or empty (in which case invalid characters are removed).