public class CharSuffixArray extends Object
CharSuffixArray implements a suffix array of characters.
Given a string of characters cs, the corresponding
suffix array is an array of int values of length equal to
cs.length(). The suffix array contains each integer between 0
(inclusive) and the length of cs (exclusive) exactly once. The suffix
array is sorted so that an index m appears before n
only if the string running from index m to the end of
cs (i.e., cs.substring(m)) is less
than the string running from index n to the end of cs,
using ordinary Java String comparison.
For example, consider the string "abracadabra".
Here's the string itself, with its corresponding indexes:

    abracadabra
    01234567890
    0         1

The suffixes and their starting indexes are

| Char Array Index | Suffix |
|---|---|
| 0 | abracadabra |
| 1 | bracadabra |
| 2 | racadabra |
| 3 | acadabra |
| 4 | cadabra |
| 5 | adabra |
| 6 | dabra |
| 7 | abra |
| 8 | bra |
| 9 | ra |
| 10 | a |

The suffix array sorts the char array indexes based on the sort order of the corresponding suffixes as strings.

| Suffix Index | Value | Suffix |
|---|---|---|
| 0 | 10 | a |
| 1 | 7 | abra |
| 2 | 0 | abracadabra |
| 3 | 3 | acadabra |
| 4 | 5 | adabra |
| 5 | 8 | bra |
| 6 | 1 | bracadabra |
| 7 | 4 | cadabra |
| 8 | 6 | dabra |
| 9 | 9 | ra |
| 10 | 2 | racadabra |

Thus the suffix array itself for "abracadabra" is the int[]-type array

suffixArray("abracadabra") = { 10, 7, 0, 3, 5, 8, 1, 4, 6, 9, 2 }
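The definition above can be checked with a few lines of standalone Java. The following is a minimal sketch, not the implementation used by CharSuffixArray: the class name NaiveSuffixArray and its static suffixArray method are hypothetical, and the construction simply sorts the start indexes by comparing whole suffixes with ordinary String comparison, which is easy to read but slow for long texts.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical helper class for illustration; not part of the documented API.
class NaiveSuffixArray {

    // Sorts the start indexes 0..cs.length()-1 by the natural String order
    // of the suffixes that begin at those indexes.
    static int[] suffixArray(String cs) {
        Integer[] indexes = new Integer[cs.length()];
        for (int i = 0; i < indexes.length; ++i)
            indexes[i] = i;
        Arrays.sort(indexes, Comparator.comparing((Integer i) -> cs.substring(i)));
        int[] result = new int[indexes.length];
        for (int i = 0; i < result.length; ++i)
            result[i] = indexes[i];
        return result;
    }

    public static void main(String[] args) {
        // prints [10, 7, 0, 3, 5, 8, 1, 4, 6, 9, 2]
        System.out.println(Arrays.toString(suffixArray("abracadabra")));
    }
}
```

The printed array agrees with the suffix array worked out by hand for "abracadabra".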
The string "abracadabra" has two
instances of the substring "br", corresponding to the
suffixes "bracadabra" starting at index 1 in the original
string and "bra" starting at index 8 in the original
string. Note that these two suffixes are adjacent in the suffix
array, occupying indexes 5 and 6 (in reverse order of their positions
in the string, because suffix "bra" sorts before "bracadabra" as a string).
The method prefixMatches(int) returns all spans in
the suffix array whose suffixes match up to a specified number of characters.
For instance, to find all substrings of length 3 that occur more than once
using suffix array sa, the method call sa.prefixMatches(3) returns a list of spans.
Each span is an integer array of type int[] running from
start position (inclusive) to end position (exclusive). Here the list
contains the elements {1,3} and {5,7}, indicating first
that the suffixes at positions 1 and 2, namely "abra" and
"abracadabra", start with the same three characters, and
second that the suffixes at positions 5 and 6, "bra" and
"bracadabra", start with the same three characters. Thus
we have found all substrings of length 3 that occur more than once,
namely "abr" and "bra", along with their positions.
By using the suffix array itself, the positions in the underlying
string may be retrieved. For instance, the suffixes at positions
1 and 2 in the suffix array start at positions 7 and 0 in the underlying string.
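The span computation described for prefixMatches(int) can be sketched in standalone Java. This is an illustration under stated assumptions rather than the library's own code: the class PrefixMatchSketch and its static methods are hypothetical, the suffix array is built naively by sorting suffixes, and spans are found by scanning adjacent suffix array entries and grouping runs whose neighbors share at least the requested number of leading characters.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper class for illustration; not part of the documented API.
class PrefixMatchSketch {

    // Naive suffix array: sort start indexes by the suffixes they begin.
    static int[] suffixArray(String cs) {
        Integer[] indexes = new Integer[cs.length()];
        for (int i = 0; i < indexes.length; ++i)
            indexes[i] = i;
        Arrays.sort(indexes, Comparator.comparing((Integer i) -> cs.substring(i)));
        int[] result = new int[indexes.length];
        for (int i = 0; i < result.length; ++i)
            result[i] = indexes[i];
        return result;
    }

    // Number of leading characters shared by the suffixes starting at m and n.
    static int commonPrefixLength(String cs, int m, int n) {
        int k = 0;
        while (m + k < cs.length() && n + k < cs.length()
               && cs.charAt(m + k) == cs.charAt(n + k))
            ++k;
        return k;
    }

    // Returns maximal spans {start, end} (end exclusive) of suffix array
    // positions whose suffixes share at least minMatchLength leading characters.
    static List<int[]> prefixMatches(String cs, int[] sa, int minMatchLength) {
        List<int[]> spans = new ArrayList<>();
        int start = 0;
        for (int i = 1; i <= sa.length; ++i) {
            boolean extend = i < sa.length
                && commonPrefixLength(cs, sa[i - 1], sa[i]) >= minMatchLength;
            if (!extend) {
                if (i - start > 1)          // keep only spans with 2+ suffixes
                    spans.add(new int[] { start, i });
                start = i;
            }
        }
        return spans;
    }

    public static void main(String[] args) {
        String cs = "abracadabra";
        int[] sa = suffixArray(cs);
        for (int[] span : prefixMatches(cs, sa, 3)) {
            // Map suffix array positions back to positions in the string.
            int[] positions = Arrays.copyOfRange(sa, span[0], span[1]);
            String shared = cs.substring(positions[0], positions[0] + 3);
            System.out.println(shared + " at " + Arrays.toString(positions));
        }
        // prints:
        // abr at [7, 0]
        // bra at [8, 1]
    }
}
```

Because the suffixes are sorted, any suffixes sharing a prefix sit next to each other, so a single pass over adjacent pairs is enough to find every span.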
| Modifier and Type | Field and Description |
|---|---|
| static char | SEPARATOR: A special separator character, used to mark the boundaries of documents within the character array. |
| Constructor and Description |
|---|
| CharSuffixArray(String text): Construct a suffix array from the specified string, with no bound on suffix length. |
| CharSuffixArray(String text, int maxSuffixLength): Construct a suffix array from the specified string, bounding comparisons for sorting by the specified maximum suffix length. |
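One way to picture what bounding comparisons by a maximum suffix length means is the standalone sketch below. It is an assumption-laden illustration, not the constructor's actual algorithm: the class BoundedSuffixSortSketch and its methods are hypothetical, and the treatment of suffixes that are equal up to the bound (left in ascending index order, because Arrays.sort on objects is stable) is a choice made here, not something the documentation specifies.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical helper class for illustration; not part of the documented API.
class BoundedSuffixSortSketch {

    // Suffix starting at index i, truncated to at most maxLength characters.
    static String boundedSuffix(String cs, int i, int maxLength) {
        // Use long arithmetic so a very large maxLength cannot overflow.
        int end = (int) Math.min((long) i + maxLength, cs.length());
        return cs.substring(i, end);
    }

    // Sorts start indexes by comparing only the first maxSuffixLength
    // characters of each suffix; ties keep ascending index order.
    static int[] suffixArray(String cs, int maxSuffixLength) {
        Integer[] indexes = new Integer[cs.length()];
        for (int i = 0; i < indexes.length; ++i)
            indexes[i] = i;
        Arrays.sort(indexes,
            Comparator.comparing((Integer i) -> boundedSuffix(cs, i, maxSuffixLength)));
        int[] result = new int[indexes.length];
        for (int i = 0; i < result.length; ++i)
            result[i] = indexes[i];
        return result;
    }

    public static void main(String[] args) {
        // Bounded at 3 characters: "abr" at 0 and 7 tie, as do "bra" at 1 and 8.
        // prints [10, 0, 7, 3, 5, 1, 8, 4, 6, 9, 2]
        System.out.println(Arrays.toString(suffixArray("abracadabra", 3)));
        // Effectively unbounded.
        // prints [10, 7, 0, 3, 5, 8, 1, 4, 6, 9, 2]
        System.out.println(Arrays.toString(suffixArray("abracadabra", Integer.MAX_VALUE)));
    }
}
```

In this sketch the bounded ordering is only reliable for prefixes up to the bound, which matches the note that the bounded constructor is appropriate when no later operation looks at longer suffixes.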
| Modifier and Type | Method and Description |
|---|---|
| int | maxSuffixLength(): Returns the maximum suffix length for this character suffix array. |
| List<int[]> | prefixMatches(int minMatchLength): Returns a list of maximal spans of suffix array indexes which refer to suffixes that share a prefix of at least the specified minimum match length. |
| String | suffix(int csIndex, int maxLength): Returns the string that starts at position csIndex in the underlying character array and runs to the end of the character array or up to the specified maximum length. |
| int | suffixArray(int idx): Return the value of the suffix array at the specified index. |
| int | suffixArrayLength(): Return the number of entries in this suffix array. |
| String | text(): Returns the underlying array of characters for this class. |
public static char SEPARATOR

public CharSuffixArray(String text)
text - Underlying characters making up suffix array.

public CharSuffixArray(String text, int maxSuffixLength)
This constructor is appropriate if no operations will be subsequently performed on suffixes longer than the maximum specified length.
text - Underlying text for suffix array.
maxSuffixLength - Maximum suffix length for comparison.

public String text()

public int maxSuffixLength()

public int suffixArray(int idx)
idx - Index into suffix array.

public int suffixArrayLength()

public String suffix(int csIndex, int maxLength)
Returns the string that starts at position csIndex in the underlying character array and runs to the end of the character array or up to the specified maximum length.
csIndex - Starting index in underlying array of characters.
maxLength - Maximum length of returned string.

public List<int[]> prefixMatches(int minMatchLength)
minMatchLength - Minimum number of characters required to match.

Copyright © 2016 Alias-i, Inc. All rights reserved.