Package de.julielab.genemapper.filtering
Class StringHelper
- java.lang.Object
-
- de.julielab.genemapper.filtering.StringHelper
-
public class StringHelper extends Object
This class offers some methods useful when working with (arrays of) strings.
String comparison
- stringLengthComparatorAscending - for use in Collections.sort(List, Comparator)
- stringLengthComparatorDescending - for use in Collections.sort(List, Comparator)
- equalsIgnorePlural(String, String)
Morphology
- toSingularForm(String) - returns the singular form of a word, if applicable
- toSingularForms(String[]) - returns the singular forms for an array of words, if applicable
- equalsIgnorePlural(String, String)
Orthography
To sort into categories:
- escapeString(String) - escapes regular expression meta-characters (precedes them with \\)
- reduceToFirstCharacters(String[], int) - reduces each element in an array to its prefix of the specified length
- concatenate(String[], String[]) - concatenates two string arrays
- splitAndSort - tokenize, sort alphabetically, then de-tokenize again
- removeStrings
- removeStringsIgnoreCase
- joinStringArray
- joinStringList
- joinStringSet
- joinIntegerSet
- tokenizeToLowerCase
- Author:
- Joerg Hakenberg, Conrad Plake
-
-
Field Summary
Fields
Modifier and Type | Field | Description
static String[] | stopWords | stop words
static Comparator<String> | stringLengthComparatorAscending | A comparator that sorts strings by length, in ascending order (shortest first).
static Comparator<String> | stringLengthComparatorDescending | A comparator that sorts strings by length, in descending order (longest first).
-
Method Summary
Modifier and Type | Method | Description
static String[] | concatenate(String[] array1, String[] array2) | Concatenates two string arrays into one. The new array will have length array1.length + array2.length and contain, from elements 0 to array1.length-1, the elements of the original array1 first, then, starting from array1.length, the elements of original array2.
static boolean | equalsIgnorePlural(String one, String two) | Simple method for testing if two strings are equal even if they vary in an ending 's'.
static String | escapeString(String string) | Escapes certain characters in a string so that it can be used inside a regular expression.
static String | joinIntegerSet(Set<Integer> set, String delimiter) | Puts all elements of a set into a single delimited string.
static String | joinStringArray(String[] array, String delimiter) | Joins the entries in a String array into a single String.
static String | joinStringList(List<String> list, String delimiter) | Joins the entries in a String list into a single String.
static String | joinStringSet(Set<String> set, String delimiter) | Puts all elements of a set into a single delimited string.
static void | main(String[] args) |
static String[] | reduceToFirstCharacters(String[] array, int suffixLength) | Goes through an array of strings and reduces every element to its first suffixLength characters.
static String[] | removeStrings(String[] strings, String[] unwantedStrings) | Removes all tokens in strings that are also in unwantedStrings.
static String[] | removeStringsIgnoreCase(String[] strings, String[] unwantedStrings) | Removes all tokens in strings that are also in unwantedStrings.
static String | splitAndSort(String string) | Splits a string into tokens at whitespace and sorts them alphabetically.
static String[] | tokenizeToLowerCase(String text) | Splits a given text into tokens, all transformed into lower case.
static String | toSingularForm(String assertedNoun) | Checks a word, assertedNoun, for a plural form.
static String[] | toSingularForms(String[] array) | Checks an array of words for plural forms.
-
-
-
Field Detail
-
stringLengthComparatorAscending
public static Comparator<String> stringLengthComparatorAscending
A comparator that sorts strings by length, in ascending order (shortest first).
-
stringLengthComparatorDescending
public static Comparator<String> stringLengthComparatorDescending
A comparator that sorts strings by length, in descending order (longest first).
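The two comparator fields can be pictured with the following minimal sketch. It does not use StringHelper itself: the class LengthSortSketch, the constants ASCENDING and DESCENDING, and the helper sortedCopy are hypothetical stand-ins built on the standard library, assuming the fields compare by String.length() only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

final class LengthSortSketch {
    // Hypothetical stand-ins for the two documented comparator fields.
    static final Comparator<String> ASCENDING = Comparator.comparingInt(String::length);
    static final Comparator<String> DESCENDING = ASCENDING.reversed();

    // Returns a sorted copy so that the caller's list is left untouched.
    static List<String> sortedCopy(List<String> input, Comparator<String> order) {
        List<String> copy = new ArrayList<>(input);
        // Collections.sort is stable: strings of equal length keep their relative order.
        Collections.sort(copy, order);
        return copy;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("interleukin", "p53", "BRCA1");
        System.out.println(sortedCopy(names, ASCENDING));  // [p53, BRCA1, interleukin]
        System.out.println(sortedCopy(names, DESCENDING)); // [interleukin, BRCA1, p53]
    }
}
```

Sorting longest first is a common choice when longer names should be matched before shorter names they contain.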
-
stopWords
public static String[] stopWords
stop words
-
-
Method Detail
-
escapeString
public static String escapeString(String string)
Escapes certain characters in a string so that it can be used inside a regular expression. Inserts \\ before any brackets, '*', '+', and '-'.
- Parameters:
string -
- Returns:
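The escaping rule described above could look roughly like the sketch below. The class EscapeSketch and its method escape are hypothetical, and the character set is an assumption: "brackets" is read here as (), [] and {}, which may differ from what escapeString actually covers.

```java
final class EscapeSketch {
    // Assumption: "brackets" means (), [] and {}; the real method may use another set.
    private static final String META = "()[]{}*+-";

    static String escape(String s) {
        StringBuilder out = new StringBuilder(s.length() + 8);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (META.indexOf(c) >= 0) {
                out.append('\\'); // a single backslash in the resulting string
            }
            out.append(c);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String escaped = escape("IL-2 (alpha)");
        System.out.println(escaped);                         // IL\-2 \(alpha\)
        System.out.println("IL-2 (alpha)".matches(escaped)); // true
    }
}
```

When the whole string should be treated literally, java.util.regex.Pattern.quote(String) from the standard library is an alternative that covers every meta-character.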
-
reduceToFirstCharacters
public static String[] reduceToFirstCharacters(String[] array, int suffixLength)
Goes through an array of strings and reduces every element to its first suffixLength characters. Strings shorter than suffixLength remain intact.
reduceToFirstCharacters({"foo","bar","n"}, 2) would result in {"fo","ba","n"}.
- Parameters:
array -
suffixLength -
- Returns:
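The behaviour in the example above can be reproduced with a short sketch. PrefixSketch and firstCharacters are hypothetical names; whether the real method returns a new array or modifies its argument is not stated, so this version assumes a new array.

```java
import java.util.Arrays;

final class PrefixSketch {
    // Keeps at most `length` leading characters of each element; shorter strings pass through.
    static String[] firstCharacters(String[] array, int length) {
        String[] result = new String[array.length];
        for (int i = 0; i < array.length; i++) {
            String s = array[i];
            result[i] = s.length() > length ? s.substring(0, length) : s;
        }
        return result;
    }

    public static void main(String[] args) {
        String[] reduced = firstCharacters(new String[] {"foo", "bar", "n"}, 2);
        System.out.println(Arrays.toString(reduced)); // [fo, ba, n]
    }
}
```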
-
concatenate
public static String[] concatenate(String[] array1, String[] array2)
Concatenates two string arrays into one.
The new array will have length array1.length + array2.length and contain, from elements 0 to array1.length-1, the elements of the original array1 first, then, starting from array1.length, the elements of original array2.
- Parameters:
array1 -
array2 -
- Returns:
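The index layout described above matches what a copy-then-append approach produces. The sketch below is one possible way to get that layout, not necessarily how concatenate is implemented; ConcatSketch and concat are hypothetical names.

```java
import java.util.Arrays;

final class ConcatSketch {
    static String[] concat(String[] array1, String[] array2) {
        // Copy array1 into a larger array; indices 0 .. array1.length-1 are filled.
        String[] result = Arrays.copyOf(array1, array1.length + array2.length);
        // Append array2 starting at index array1.length.
        System.arraycopy(array2, 0, result, array1.length, array2.length);
        return result;
    }

    public static void main(String[] args) {
        String[] joined = concat(new String[] {"a", "b"}, new String[] {"c"});
        System.out.println(Arrays.toString(joined)); // [a, b, c]
    }
}
```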
-
toSingularForm
public static String toSingularForm(String assertedNoun)
Checks a word, assertedNoun, for a plural form. Replaces it with its singular form if applicable.
No word category check is performed: all words are changed, not only nouns, so verbs are affected as well (encodes => encode)!
- Parameters:
assertedNoun -
- Returns:
-
toSingularForms
public static String[] toSingularForms(String[] array)
Checks an array of words for plural forms. Replaces every plural form with its singular form.
No word category check is performed: all words are changed, not only nouns, so verbs are affected as well (encodes => encode)!
- Parameters:
array -
- Returns:
-
equalsIgnorePlural
public static boolean equalsIgnorePlural(String one, String two)
Simple method for testing if two strings are equal even if they vary in an ending 's'.
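One plausible reading of this comparison is sketched below. It is an assumption, not the documented algorithm: the hypothetical helper stripFinalS drops a single trailing 's' from each word before comparing, which ignores irregular plurals and also treats words such as "class" and "clas" as equal.

```java
final class PluralCompareSketch {
    // Assumption: "vary in an ending 's'" means at most one trailing 's' differs.
    static String stripFinalS(String word) {
        return word.endsWith("s") ? word.substring(0, word.length() - 1) : word;
    }

    static boolean equalsIgnoringFinalS(String one, String two) {
        return stripFinalS(one).equals(stripFinalS(two));
    }

    public static void main(String[] args) {
        System.out.println(equalsIgnoringFinalS("kinase", "kinases")); // true
        System.out.println(equalsIgnoringFinalS("gene", "genome"));    // false
    }
}
```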
-
splitAndSort
public static String splitAndSort(String string)
Splits a string into tokens at whitespace and sorts them alphabetically. All tokens are then joined again into a single string, separated by whitespace.
- Parameters:
string -
- Returns:
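The tokenize, sort, and re-join steps could be sketched as follows. SplitSortSketch is a hypothetical class; the sketch assumes a single space as the output separator and Java's natural String ordering (which places uppercase letters before lowercase ones), neither of which is confirmed for the real method.

```java
import java.util.Arrays;

final class SplitSortSketch {
    static String splitAndSort(String string) {
        String[] tokens = string.trim().split("\\s+"); // split at runs of whitespace
        Arrays.sort(tokens);                           // natural (code point) order
        return String.join(" ", tokens);               // assumption: single-space separator
    }

    public static void main(String[] args) {
        System.out.println(splitAndSort("tumor necrosis factor alpha")); // alpha factor necrosis tumor
    }
}
```

Normalizing token order this way lets two names that differ only in word order compare as equal strings.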
-
removeStrings
public static String[] removeStrings(String[] strings, String[] unwantedStrings)
Removes all tokens in strings that are also in unwantedStrings. Could be used as a stop word filter.
- Parameters:
strings -
unwantedStrings -
- Returns:
- String[]
-
removeStringsIgnoreCase
public static String[] removeStringsIgnoreCase(String[] strings, String[] unwantedStrings)
Removes all tokens in strings that are also in unwantedStrings, ignoring case. Could be used as a stop word filter.
- Parameters:
strings -
unwantedStrings -
- Returns:
- String[]
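Both removal methods can be illustrated by one sketch with a flag for case handling. StopWordFilterSketch and remove are hypothetical names, and the sketch assumes that the order of the remaining tokens is preserved and that case-insensitive matching means comparing lower-cased forms.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

final class StopWordFilterSketch {
    static String[] remove(String[] tokens, String[] unwanted, boolean ignoreCase) {
        // Put the unwanted strings into a set for constant-time lookups.
        Set<String> blocked = new HashSet<>();
        for (String u : unwanted) {
            blocked.add(ignoreCase ? u.toLowerCase(Locale.ROOT) : u);
        }
        List<String> kept = new ArrayList<>();
        for (String t : tokens) {
            String key = ignoreCase ? t.toLowerCase(Locale.ROOT) : t;
            if (!blocked.contains(key)) {
                kept.add(t); // keep the token in its original spelling
            }
        }
        return kept.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String[] tokens = {"The", "role", "of", "p53"};
        String[] stop = {"the", "of"};
        System.out.println(Arrays.toString(remove(tokens, stop, false))); // [The, role, p53]
        System.out.println(Arrays.toString(remove(tokens, stop, true)));  // [role, p53]
    }
}
```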
-
joinStringArray
public static String joinStringArray(String[] array, String delimiter)
Joins the entries in a String array into a single String. All entries get separated by the specified delimiter.
- Parameters:
array -
delimiter -
- Returns:
- joined array
-
joinStringList
public static String joinStringList(List<String> list, String delimiter)
Joins the entries in a String list into a single String. All entries get separated by the specified delimiter.
- Parameters:
list -
delimiter -
- Returns:
- joined list
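On Java 8 and later, the same kind of joining for arrays and lists is available through String.join in the standard library. The sketch below shows that equivalent; JoinSketch is a hypothetical class and makes no claim about how joinStringArray or joinStringList are implemented.

```java
import java.util.Arrays;
import java.util.List;

final class JoinSketch {
    static String join(String[] array, String delimiter) {
        return String.join(delimiter, array); // varargs overload accepts a String[]
    }

    static String join(List<String> list, String delimiter) {
        return String.join(delimiter, list);  // Iterable overload
    }

    public static void main(String[] args) {
        System.out.println(join(new String[] {"IL", "2", "receptor"}, "-")); // IL-2-receptor
        System.out.println(join(Arrays.asList("a", "b", "c"), ", "));        // a, b, c
    }
}
```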
-
joinStringSet
public static String joinStringSet(Set<String> set, String delimiter)
Puts all elements of a set into a single delimited string. Note that elements in the set are unordered!
-
joinIntegerSet
public static String joinIntegerSet(Set<Integer> set, String delimiter)
Puts all elements of a set into a single delimited string. Note that elements in the set are unordered!
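Because set iteration order is not guaranteed, the joined string can differ between runs or set implementations. When a reproducible result is needed, one option is to sort before joining, as in this sketch; SetJoinSketch and joinSorted are hypothetical and not part of StringHelper.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;

final class SetJoinSketch {
    static String joinSorted(Set<Integer> set, String delimiter) {
        // Copy into a TreeSet so that iteration follows ascending numeric order.
        return new TreeSet<Integer>(set).stream()
                .map(n -> String.valueOf(n))
                .collect(Collectors.joining(delimiter));
    }

    public static void main(String[] args) {
        Set<Integer> ids = new HashSet<>(Arrays.asList(7157, 672, 3558));
        System.out.println(joinSorted(ids, ",")); // 672,3558,7157
    }
}
```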
-
tokenizeToLowerCase
public static String[] tokenizeToLowerCase(String text)
Splits a given text into tokens, all transformed into lower case.
- Parameters:
text -
- Returns:
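The token boundaries used by this method are not documented. As a rough sketch under the assumption that tokens are separated by whitespace only, lower-casing and splitting could look like this; TokenizeSketch is a hypothetical class, and the real method may also split at punctuation.

```java
import java.util.Arrays;
import java.util.Locale;

final class TokenizeSketch {
    static String[] tokenizeToLowerCase(String text) {
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return new String[0]; // avoid returning a single empty token
        }
        // Assumption: whitespace is the only token boundary.
        return trimmed.toLowerCase(Locale.ROOT).split("\\s+");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(tokenizeToLowerCase("TNF Alpha  Receptor"))); // [tnf, alpha, receptor]
    }
}
```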
-
main
public static void main(String[] args)
-
-