Class UnicodeTokenizer


  • public class UnicodeTokenizer
    extends java.lang.Object
    Tokenizes text according to Unicode word boundaries and strips off non-word characters.
    • Method Summary

      Modifier and Type Method Description
      static java.lang.String[] tokenize(java.lang.CharSequence text)
      Tokenizes the text and returns an array of tokens.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Constructor Detail

      • UnicodeTokenizer

        public UnicodeTokenizer()
    • Method Detail

      • tokenize

        public static java.lang.String[] tokenize(java.lang.CharSequence text)
        Tokenizes the text and returns an array of tokens.
        Parameters:
        text - The text to tokenize
        Returns:
        An array containing the tokens found in the text
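
        The following is a minimal sketch of what tokenizing on Unicode word boundaries and dropping non-word segments can look like. It does not call UnicodeTokenizer and is not its implementation: the class name TokenizeSketch, the use of java.text.BreakIterator, and the rule "keep a segment only if it contains a letter or digit" are all assumptions made for illustration, so the actual tokens returned by UnicodeTokenizer.tokenize may differ.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical stand-in; NOT the UnicodeTokenizer implementation.
final class TokenizeSketch {

    // Splits text at word boundaries and keeps only word-like segments.
    static String[] tokenize(CharSequence text) {
        String s = text.toString();
        BreakIterator it = BreakIterator.getWordInstance(Locale.ROOT);
        it.setText(s);

        List<String> tokens = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String segment = s.substring(start, end);
            // Assumed filter: a segment counts as a word if it has any letter or digit,
            // which discards whitespace and punctuation-only segments.
            if (segment.codePoints().anyMatch(Character::isLetterOrDigit)) {
                tokens.add(segment);
            }
        }
        return tokens.toArray(new String[0]);
    }

    public static void main(String[] args) {
        for (String token : tokenize("Hello, world! 42 times.")) {
            System.out.println(token);
        }
    }
}
```

        With the real class, the call has the same shape as in the sketch: UnicodeTokenizer.tokenize(text) is static, accepts any CharSequence (for example a String or StringBuilder), and returns a String[].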