Class Levenshtein


  • public class Levenshtein
    extends Object
    Uses LevenshteinDistance.apply(CharSequence, CharSequence) to calculate the edit distance between two strings. Also provides static helper methods that search a collection of strings and select the ones most similar to a given input string.
    Author:
    Michel Kraemer
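The class delegates the distance computation to LevenshteinDistance.apply. That name matches org.apache.commons.text.similarity.LevenshteinDistance in Apache Commons Text, but this page does not say which library is used, so treat that as an assumption. The following is a minimal, self-contained sketch of what such a call computes: the classic dynamic-programming recurrence counting the fewest single-character insertions, deletions, and substitutions. EditDistanceSketch is a hypothetical name and is not part of this API.

```java
// Hypothetical stand-in for LevenshteinDistance.apply (not the library's code)
final class EditDistanceSketch {
    private EditDistanceSketch() {}

    // Classic Levenshtein DP keeping only two rows of the (|s|+1) x (|t|+1) table
    static int distance(CharSequence s, CharSequence t) {
        int m = s.length();
        int n = t.length();
        int[] prev = new int[n + 1];
        int[] curr = new int[n + 1];
        // Row 0: turning the empty prefix of s into t[0..j) takes j insertions
        for (int j = 0; j <= n; j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= m; i++) {
            // Column 0: turning s[0..i) into the empty string takes i deletions
            curr[0] = i;
            for (int j = 1; j <= n; j++) {
                int cost = s.charAt(i - 1) == t.charAt(j - 1) ? 0 : 1;
                int insert = curr[j - 1] + 1;
                int delete = prev[j] + 1;
                int substitute = prev[j - 1] + cost;
                curr[j] = Math.min(Math.min(insert, delete), substitute);
            }
            // The current row becomes the previous row for the next iteration
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[n];
    }
}
```

For example, EditDistanceSketch.distance("kitten", "sitting") is 3 (two substitutions and one insertion). Keeping only two rows reduces memory to O(|t|) while time stays O(|s|·|t|).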
    • Constructor Detail

      • Levenshtein

        public Levenshtein()
    • Method Detail

      • findMinimum

        public static <T extends CharSequence> T findMinimum(Collection<T> ss,
                                                             CharSequence t)
        Searches the given collection of strings and returns the string that has the lowest Levenshtein distance to a given second string t. If the collection contains multiple strings with the same distance to t, only the first one will be returned.
        Type Parameters:
        T - the type of the strings in the given collection
        Parameters:
        ss - the collection to search
        t - the second string
        Returns:
        the string with the lowest Levenshtein distance
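The tie-breaking rule described above can be illustrated with a minimal linear-scan sketch. It assumes that "first" means first in the collection's iteration order and that an empty collection yields null; the page does not say what happens in that case. FindMinimumSketch is a hypothetical class, and its distance helper stands in for LevenshteinDistance.apply.

```java
import java.util.Collection;

// Hypothetical sketch of the single-result findMinimum (not the library's code)
final class FindMinimumSketch {
    private FindMinimumSketch() {}

    // Compact two-row Levenshtein DP standing in for LevenshteinDistance.apply
    static int distance(CharSequence s, CharSequence t) {
        int[] prev = new int[t.length() + 1];
        int[] curr = new int[t.length() + 1];
        for (int j = 0; j <= t.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= s.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= t.length(); j++) {
                int cost = s.charAt(i - 1) == t.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[t.length()];
    }

    // Returns the earliest element of ss with the lowest distance to t, or null if ss is empty
    static <T extends CharSequence> T findMinimum(Collection<T> ss, CharSequence t) {
        T best = null;
        int bestDistance = Integer.MAX_VALUE;
        for (T s : ss) {
            int d = distance(s, t);
            // Strict '<' keeps the first string seen when several share the minimum
            if (d < bestDistance) {
                best = s;
                bestDistance = d;
            }
        }
        return best;
    }
}
```

With the candidates "apple", "apply", and "maple" and t = "appla", both "apple" and "apply" are at distance 1, so the sketch returns "apple" because it comes first.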
      • findMinimum

        public static <T extends CharSequence> Collection<T> findMinimum(Collection<T> ss,
                                                                         CharSequence t,
                                                                         int n,
                                                                         int threshold)
        Searches the given collection of strings and returns a collection of at most n strings that have the lowest Levenshtein distance to a given string t. The returned collection is sorted by ascending distance, so the string with the lowest distance comes first.
        Type Parameters:
        T - the type of the strings in the given collection
        Parameters:
        ss - the collection to search
        t - the string to compare to
        n - the maximum number of strings to return
        threshold - a threshold for individual item distances. Only items with a distance below this threshold will be included in the result.
        Returns:
        the strings with the lowest Levenshtein distance
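A sketch of this variant follows. It rests on several assumptions: a string is kept only if its distance is strictly below threshold, as the parameter description says; strings with equal distance keep their iteration order; and the result is a List. Beyond the descriptions above, none of this is confirmed. FindTopNSketch and its nested Scored class are hypothetical names, and the distance helper stands in for LevenshteinDistance.apply.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

// Hypothetical sketch of findMinimum(ss, t, n, threshold) (not the library's code)
final class FindTopNSketch {
    private FindTopNSketch() {}

    // Compact two-row Levenshtein DP standing in for LevenshteinDistance.apply
    static int distance(CharSequence s, CharSequence t) {
        int[] prev = new int[t.length() + 1];
        int[] curr = new int[t.length() + 1];
        for (int j = 0; j <= t.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= s.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= t.length(); j++) {
                int cost = s.charAt(i - 1) == t.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[t.length()];
    }

    // A candidate paired with its precomputed distance to t
    private static final class Scored<T> {
        final T value;
        final int distance;

        Scored(T value, int distance) {
            this.value = value;
            this.distance = distance;
        }
    }

    static <T extends CharSequence> List<T> findMinimum(Collection<T> ss, CharSequence t,
            int n, int threshold) {
        List<Scored<T>> scored = new ArrayList<>();
        for (T s : ss) {
            int d = distance(s, t);
            // Keep only strings whose distance is strictly below the threshold
            if (d < threshold) {
                scored.add(new Scored<>(s, d));
            }
        }
        // List.sort is stable, so equal distances keep their iteration order
        scored.sort((a, b) -> Integer.compare(a.distance, b.distance));
        // Take at most n of the closest strings
        List<T> result = new ArrayList<>();
        for (int i = 0; i < Math.min(n, scored.size()); i++) {
            result.add(scored.get(i).value);
        }
        return result;
    }
}
```

With the candidates "apple", "apply", "maple", and "banana", t = "appla", n = 10, and threshold = 4, the sketch returns [apple, apply, maple]. "banana" is dropped because its distance is 4, which is not below the threshold. Scoring each string once keeps the quadratic distance computation out of the comparator. For large collections, a bounded priority queue of size n would avoid sorting every match.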
      • findSimilar

        public static <T extends CharSequence> Collection<T> findSimilar(Collection<T> ss,
                                                                         CharSequence t)
        Searches the given collection of strings and returns a collection of strings similar to a given string t. Uses default settings suited to human-readable strings. The returned collection is sorted by similarity, with the best match at the first position.
        Type Parameters:
        T - the type of the strings in the given collection
        Parameters:
        ss - the collection to search
        t - the string to compare to
        Returns:
        a collection with similar strings