Class StringHelper


  • public class StringHelper
    extends Object
    This class offers some methods useful when working with (arrays of) strings.

    String comparison

    • stringLengthComparatorAscending - for use in Collections.sort(List, Comparator)
    • stringLengthComparatorDescending - for use in Collections.sort(List, Comparator)
    • equalsIgnorePlural(String, String)

    Morphology

    • toSingularForm(String) - returns the singular form of a word, if applicable
    • toSingularForms(String[]) - returns the singular forms for an array of words, if applicable
    • equalsIgnorePlural(String, String)

    Orthography

    To sort into categories:

    • escapeString(String) - escapes regular expression meta-characters (precedes them with \\)
    • reduceToFirstCharacters(String[], int) - reduces each element in an array to its first characters, up to the specified length
    • concatenate(String[], String[]) - concatenates two string arrays
    • splitAndSort - tokenize, sort alphabetically, then de-tokenize again
    • removeStrings
    • removeStringsIgnoreCase
    • joinStringArray
    • joinStringList
    • joinStringSet
    • joinIntegerSet
    • tokenizeToLowerCase
    Author:
    Joerg Hakenberg, Conrad Plake
    • Field Detail

      • stringLengthComparatorAscending

        public static Comparator<String> stringLengthComparatorAscending
        A comparator that sorts strings by length, in ascending order (shortest first).
      • stringLengthComparatorDescending

        public static Comparator<String> stringLengthComparatorDescending
        A comparator that sorts strings by length, in descending order (longest first).
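
        The two comparators above can be approximated with the standard library. The following is a minimal sketch, not the class's actual code: LengthComparatorSketch, ASCENDING, DESCENDING and sortedCopy are hypothetical names introduced only for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

final class LengthComparatorSketch {
    // Hypothetical stand-ins for the two documented fields.
    static final Comparator<String> ASCENDING = Comparator.comparingInt(String::length);
    static final Comparator<String> DESCENDING = ASCENDING.reversed();

    // Returns a sorted copy so the caller's list is left untouched.
    static List<String> sortedCopy(List<String> words, Comparator<String> order) {
        List<String> copy = new ArrayList<>(words);
        Collections.sort(copy, order); // stable: equal-length strings keep their order
        return copy;
    }
}
```

        Because Collections.sort is stable, strings of equal length keep their original relative order in this sketch.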
      • stopWords

        public static String[] stopWords
        An array of stop words.
    • Method Detail

      • escapeString

        public static String escapeString​(String string)
        Escapes certain characters in a string so that it can be used inside a regular expression. Inserts \\ before any brackets, '*', '+', and '-'.
        Parameters:
        string -
        Returns:
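
        The escaping described above could look roughly like the sketch below. It assumes that "brackets" means ( ) [ ] { }, which the description does not spell out; EscapeSketch and escape are hypothetical names, not part of StringHelper.

```java
final class EscapeSketch {
    // Assumption: "brackets" covers round, square and curly brackets.
    private static final String META = "()[]{}*+-";

    static String escape(String s) {
        StringBuilder out = new StringBuilder(s.length() + 8);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (META.indexOf(c) >= 0) {
                out.append('\\'); // a single backslash in the resulting string
            }
            out.append(c);
        }
        return out.toString();
    }
}
```

        When the whole string should be treated as a literal, the standard java.util.regex.Pattern.quote(String) is an alternative that covers all meta-characters.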
      • reduceToFirstCharacters

        public static String[] reduceToFirstCharacters​(String[] array,
                                                       int suffixLength)
        Goes through an array of strings and reduces every element to its first suffixLength characters. Strings shorter than suffixLength remain intact.

        reduceToFirstCharacters({"foo","bar","n"}, 2) would result in {"fo","ba","n"}.
        Parameters:
        array -
        suffixLength -
        Returns:
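
        A minimal sketch of this truncation follows. It assumes a new array is returned and the input is left unmodified, which the description does not state; PrefixSketch and firstCharacters are hypothetical names.

```java
final class PrefixSketch {
    static String[] firstCharacters(String[] array, int length) {
        String[] result = new String[array.length];
        for (int i = 0; i < array.length; i++) {
            String s = array[i];
            // Strings that are already short enough are kept as they are.
            result[i] = s.length() <= length ? s : s.substring(0, length);
        }
        return result;
    }
}
```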
      • concatenate

        public static String[] concatenate​(String[] array1,
                                           String[] array2)
        Concatenates two string arrays into one.

        The new array will have length array1.length + array2.length and contain, from elements 0 to array1.length-1, the elements of the original array1 first, then, starting from array1.length, the elements of original array2.
        Parameters:
        array1 -
        array2 -
        Returns:
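
        The layout described above (array1 first, then array2 starting at index array1.length) can be sketched with standard array utilities. ConcatSketch is a hypothetical name; this is an illustration, not the class's own code.

```java
import java.util.Arrays;

final class ConcatSketch {
    static String[] concatenate(String[] array1, String[] array2) {
        // Copy array1 into a larger array, leaving room for array2 at the end.
        String[] result = Arrays.copyOf(array1, array1.length + array2.length);
        System.arraycopy(array2, 0, result, array1.length, array2.length);
        return result;
    }
}
```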
      • toSingularForm

        public static String toSingularForm​(String assertedNoun)
        Checks whether a word, assertedNoun, is a plural form and replaces it with its singular form if applicable.
        No word category check is performed: all words are changed, not only nouns, so verbs are affected as well (encodes=>encode)!
        Parameters:
        assertedNoun -
        Returns:
      • toSingularForms

        public static String[] toSingularForms​(String[] array)
        Checks an array of words for plural forms. Replaces every plural form with its singular form.
        No word category check is performed: all words are changed, not only nouns, so verbs are affected as well (encodes=>encode)!
        Parameters:
        array -
        Returns:
      • equalsIgnorePlural

        public static boolean equalsIgnorePlural​(String one,
                                                 String two)
        Simple method for testing if two strings are equal even if they vary in an ending 's'.
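
        One way to read this description is sketched below. It assumes a case-sensitive comparison and that only a single trailing 's' is tolerated; neither detail is confirmed by the description. PluralSketch is a hypothetical name.

```java
final class PluralSketch {
    static boolean equalsIgnorePlural(String one, String two) {
        // Equal as given, or equal once one side gains a trailing 's'.
        return one.equals(two)
                || one.equals(two + "s")
                || two.equals(one + "s");
    }
}
```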
      • splitAndSort

        public static String splitAndSort​(String string)
        Splits a string into tokens at whitespaces and sorts them alphabetically. All tokens are then again glued together to a single string separated by whitespaces.
        Parameters:
        string -
        Returns:
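
        The tokenize, sort, re-join steps can be sketched as follows. The sketch assumes natural (case-sensitive) String ordering and a single space as the output separator; SplitSortSketch is a hypothetical name.

```java
import java.util.Arrays;

final class SplitSortSketch {
    static String splitAndSort(String string) {
        String trimmed = string.trim();
        if (trimmed.isEmpty()) {
            return "";
        }
        String[] tokens = trimmed.split("\\s+"); // runs of whitespace count as one separator
        Arrays.sort(tokens);                     // natural order: uppercase sorts before lowercase
        return String.join(" ", tokens);
    }
}
```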
      • removeStrings

        public static String[] removeStrings​(String[] strings,
                                             String[] unwantedStrings)
        Removes all tokens in strings that are also in unwantedStrings. Could be used as a stop word filter.
        Parameters:
        strings -
        unwantedStrings -
        Returns:
        String[]
      • removeStringsIgnoreCase

        public static String[] removeStringsIgnoreCase​(String[] strings,
                                                       String[] unwantedStrings)
        Removes all tokens in strings that are also in unwantedStrings, ignoring case. Could be used as a stop word filter.
        Parameters:
        strings -
        unwantedStrings -
        Returns:
        String[]
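
        Both filters can be illustrated with one sketch that takes the case handling as a flag. It assumes that the order of the remaining tokens is preserved and that kept tokens retain their original case; RemoveStringsSketch and its ignoreCase parameter are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

final class RemoveStringsSketch {
    static String[] remove(String[] strings, String[] unwanted, boolean ignoreCase) {
        // Build a lookup set of unwanted tokens, lower-cased when case is ignored.
        Set<String> blocked = new HashSet<>();
        for (String u : unwanted) {
            blocked.add(ignoreCase ? u.toLowerCase(Locale.ROOT) : u);
        }
        List<String> kept = new ArrayList<>();
        for (String s : strings) {
            String key = ignoreCase ? s.toLowerCase(Locale.ROOT) : s;
            if (!blocked.contains(key)) {
                kept.add(s); // keep the token in its original case
            }
        }
        return kept.toArray(new String[0]);
    }
}
```

        With the tokens {"The", "role", "of", "p53"} and the unwanted strings {"the", "of"}, the case-sensitive variant keeps "The", while the case-insensitive variant removes it.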
      • joinStringArray

        public static String joinStringArray​(String[] array,
                                             String delimiter)
        Joins the entries in a String array into a single String. All entries get separated by the specified delimiter.
        Parameters:
        array -
        delimiter -
        Returns:
        joined array
      • joinStringList

        public static String joinStringList​(List<String> list,
                                            String delimiter)
        Joins the entries in a String list into a single String. All entries get separated by the specified delimiter.
        Parameters:
        list -
        delimiter -
        Returns:
        joined list
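
        Joining an array or a list with a delimiter is also available in the standard library through String.join (Java 8 and later). The sketch below shows the equivalent calls; JoinSketch is a hypothetical wrapper, not part of StringHelper.

```java
import java.util.List;

final class JoinSketch {
    static String join(String[] array, String delimiter) {
        return String.join(delimiter, array); // varargs overload accepts a String[]
    }

    static String join(List<String> list, String delimiter) {
        return String.join(delimiter, list);  // Iterable overload
    }
}
```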
      • joinStringSet

        public static String joinStringSet​(Set<String> set,
                                           String delimiter)
        Puts all elements of a set into a single delimited string. Note that elements in the set are unordered!
      • joinIntegerSet

        public static String joinIntegerSet​(Set<Integer> set,
                                            String delimiter)
        Puts all elements of a set into a single delimited string. Note that elements in the set are unordered!
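
        Because set iteration order is not guaranteed, callers who need reproducible output may want to sort before joining. The sketch below shows one way to do that for both String and Integer sets; joinSorted is a hypothetical helper, not a method of this class.

```java
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;

final class JoinSetSketch {
    static <T extends Comparable<T>> String joinSorted(Set<T> set, String delimiter) {
        // Copying into a TreeSet imposes the elements' natural order.
        return new TreeSet<>(set).stream()
                .map(Object::toString)
                .collect(Collectors.joining(delimiter));
    }
}
```

        Integers are ordered numerically here (2 before 10), since they are sorted before being converted to strings.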
      • tokenizeToLowerCase

        public static String[] tokenizeToLowerCase​(String text)
        Splits a given text into tokens, all transformed into lower case.
        Parameters:
        text -
        Returns:
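
        The description does not say where tokens are split. The sketch below assumes splitting at whitespace only and locale-independent lower-casing; both are assumptions, and TokenizeSketch is a hypothetical name.

```java
import java.util.Locale;

final class TokenizeSketch {
    static String[] tokenizeToLowerCase(String text) {
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return new String[0];
        }
        // Assumption: tokens are separated by runs of whitespace.
        return trimmed.toLowerCase(Locale.ROOT).split("\\s+");
    }
}
```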
      • main

        public static void main​(String[] args)