Class BigramTokenizer

  • All Implemented Interfaces:
    Tokenizer

    public final class BigramTokenizer
    extends java.lang.Object
    implements Tokenizer
    Advanced tokenizer that lowercases, adds start and end tags, deduplicates tokens and builds bigrams.
    Author:
    thomas.jungblut
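    The class description names four steps but does not give their exact order or the tag strings it uses. The following is a minimal sketch under stated assumptions, not the library's implementation. It splits on whitespace, uses hypothetical `<START>`/`<END>` tags, joins each bigram with a single space, and deduplicates the finished bigrams while keeping their first occurrence.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public final class BigramTokenizerSketch {

  // Hypothetical boundary tags; the real class may use different markers.
  static final String START_TAG = "<START>";
  static final String END_TAG = "<END>";

  public static String[] tokenize(String toTokenize) {
    // Step 1: lowercase (Locale.ROOT avoids locale-specific case rules).
    String lower = toTokenize.toLowerCase(Locale.ROOT);

    // Step 2: split on whitespace (an assumption) and wrap in start/end tags.
    List<String> tokens = new ArrayList<>();
    tokens.add(START_TAG);
    for (String t : lower.trim().split("\\s+")) {
      if (!t.isEmpty()) { // "".split(...) yields one empty string; skip it
        tokens.add(t);
      }
    }
    tokens.add(END_TAG);

    // Steps 3 and 4: build bigrams from adjacent tokens; the LinkedHashSet
    // drops duplicate bigrams while preserving first-seen order.
    Set<String> bigrams = new LinkedHashSet<>();
    for (int i = 0; i + 1 < tokens.size(); i++) {
      bigrams.add(tokens.get(i) + " " + tokens.get(i + 1));
    }
    return bigrams.toArray(new String[0]);
  }

  public static void main(String[] args) {
    // Prints: <START> the | the cat | cat saw | saw the | cat <END>
    System.out.println(String.join(" | ", tokenize("The cat saw the cat")));
  }
}
```

    In this sketch, deduplication runs after the bigrams are built. Removing duplicate words first would pair words that were never adjacent in the input. The repeated bigram "the cat" therefore appears only once in the result.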
    • Constructor Summary

      Constructors 
      Constructor Description
      BigramTokenizer()  
    • Method Summary

      Modifier and Type Method Description
      java.lang.String[] tokenize(java.lang.String toTokenize)
      Tokenizes the given String into an array of Strings.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Constructor Detail

      • BigramTokenizer

        public BigramTokenizer()
    • Method Detail

      • tokenize

        public java.lang.String[] tokenize(java.lang.String toTokenize)
        Description copied from interface: Tokenizer
        Tokenizes the given String into an array of Strings.
        Specified by:
        tokenize in interface Tokenizer
        Parameters:
        toTokenize - the string to tokenize.
        Returns:
        the array of tokens.
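        Because tokenize is specified by the Tokenizer interface, callers can depend on the interface instead of on BigramTokenizer directly. The following is a hedged sketch of such a caller that counts how many documents contain each token. So that the file compiles on its own, it declares a hypothetical stand-in `Tokenizer` with the documented signature. It also uses a hypothetical unigram tokenizer, `LOWERCASE_DISTINCT`. In real code you would pass `new BigramTokenizer()` instead.

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

public final class TokenizerUsageSketch {

  // Hypothetical stand-in with the same signature as the documented
  // Tokenizer interface, so this file compiles without the library.
  interface Tokenizer {
    String[] tokenize(String toTokenize);
  }

  // Hypothetical unigram tokenizer: lowercase, split on whitespace, drop duplicates.
  static final Tokenizer LOWERCASE_DISTINCT = s ->
      Arrays.stream(s.toLowerCase(Locale.ROOT).trim().split("\\s+"))
          .filter(t -> !t.isEmpty())
          .distinct()
          .toArray(String[]::new);

  // Counts, for each token, how many documents it was returned for. This is a
  // document frequency only if tokenize() already removes duplicates per
  // document, as BigramTokenizer's description says it does.
  static Map<String, Integer> documentFrequencies(Tokenizer tokenizer, String... documents) {
    Map<String, Integer> df = new TreeMap<>();
    for (String doc : documents) {
      for (String token : tokenizer.tokenize(doc)) {
        df.merge(token, 1, Integer::sum);
      }
    }
    return df;
  }

  public static void main(String[] args) {
    // Prints: {cat=1, dog=1, saw=1, the=2}
    System.out.println(documentFrequencies(LOWERCASE_DISTINCT, "the cat saw the cat", "The dog"));
  }
}
```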