public class Levenshtein extends Object

Uses StringUtils.getLevenshteinDistance(CharSequence, CharSequence)
to calculate the edit distance between two strings. Provides helper
methods to traverse a collection of strings and select the ones most similar
to a given input string.

| Constructor and Description |
|---|
| Levenshtein() |

| Modifier and Type | Method and Description |
|---|---|
| static <T extends CharSequence> T | findMinimum(Collection<T> ss, CharSequence t) Searches the given collection of strings and returns the string that has the lowest Levenshtein distance to a given second string t. |
| static <T extends CharSequence> Collection<T> | findMinimum(Collection<T> ss, CharSequence t, int n, int threshold) Searches the given collection of strings and returns a collection of at most n strings that have the lowest Levenshtein distance to a given string t. |
| static <T extends CharSequence> Collection<T> | findSimilar(Collection<T> ss, CharSequence t) Searches the given collection of strings and returns a collection of strings similar to a given string t. |

public static <T extends CharSequence> T findMinimum(Collection<T> ss, CharSequence t)

Searches the given collection of strings and returns the string that
has the lowest Levenshtein distance to a given second string t.
If the collection contains multiple strings with the same distance to
t, only the first one is returned.

Type Parameters:
- T - the type of the strings in the given collection

Parameters:
- ss - the collection to search
- t - the second string

public static <T extends CharSequence> Collection<T> findMinimum(Collection<T> ss, CharSequence t, int n, int threshold)

Searches the given collection of strings and returns a collection of at
most n strings that have the lowest Levenshtein distance
to a given string t. The returned collection is sorted by distance,
with the closest string in the first position.

Type Parameters:
- T - the type of the strings in the given collection

Parameters:
- ss - the collection to search
- t - the string to compare to
- n - the maximum number of strings to return
- threshold - a threshold for individual item distances. Only items
with a distance below this threshold are included in the result.

public static <T extends CharSequence> Collection<T> findSimilar(Collection<T> ss, CharSequence t)

Searches the given collection of strings and returns a collection of
strings similar to a given string t. Uses reasonable default
values for human-readable strings. The returned collection is
sorted by similarity, with the best match in the first position.

Type Parameters:
- T - the type of the strings in the given collection

Parameters:
- ss - the collection to search
- t - the string to compare to
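
The documented behaviour of the two findMinimum overloads can be illustrated with a minimal, self-contained sketch. This is not the class's actual source: LevenshteinSketch and its distance method are hypothetical stand-ins (the real class delegates to StringUtils.getLevenshteinDistance), and returning null for an empty collection and treating a negative n as zero are assumptions. findSimilar is left out because its default values are not stated here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration only; names and edge-case handling are assumptions.
final class LevenshteinSketch {

    // Stand-in for StringUtils.getLevenshteinDistance: two-row dynamic programming.
    static int distance(CharSequence a, CharSequence b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning "" into the first j chars of b takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning the first i chars of a into "" takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Returns the first string with the lowest distance to t (null if ss is empty: assumption).
    static <T extends CharSequence> T findMinimum(Collection<T> ss, CharSequence t) {
        T best = null;
        int bestDistance = Integer.MAX_VALUE;
        for (T s : ss) {
            int d = distance(s, t);
            if (d < bestDistance) { // strict '<' keeps the first string on ties
                best = s;
                bestDistance = d;
            }
        }
        return best;
    }

    // Returns at most n strings with distance below threshold, closest first.
    static <T extends CharSequence> List<T> findMinimum(
            Collection<T> ss, CharSequence t, int n, int threshold) {
        List<T> candidates = new ArrayList<>();
        for (T s : ss) {
            if (distance(s, t) < threshold) {
                candidates.add(s);
            }
        }
        // List.sort is stable, so equally distant strings keep their input order.
        candidates.sort(Comparator.comparingInt((T s) -> distance(s, t)));
        int limit = Math.max(0, Math.min(n, candidates.size()));
        return new ArrayList<>(candidates.subList(0, limit));
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("kitten", "sitting", "mitten", "bitten", "zebra");
        System.out.println(findMinimum(words, "fitten"));       // kitten
        System.out.println(findMinimum(words, "fitten", 2, 3)); // [kitten, mitten]
    }
}
```

In this sketch "sitting" has distance 3 to "fitten" and is therefore excluded by a threshold of 3, since only distances strictly below the threshold qualify. The sketch recomputes distances during sorting for brevity; caching them per string would avoid the repeated work on large collections.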