public class CharacterTokenizerFactory extends Object implements Serializable, TokenizerFactory
A CharacterTokenizerFactory treats each
non-whitespace character in the input as a distinct token. This
factory is useful for handling languages such as Chinese, whose
writing system includes thousands of characters and presents a
difficult tokenization problem for standard tokenizers.
Because the tokenizer factory is thread safe and immutable, the
recommended usage is through the static singleton instance INSTANCE.
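The behavior described above can be illustrated with a small standalone sketch. It does not use LingPipe and is not the library's implementation. The class name CharacterTokenSketch and its tokenize helper are hypothetical, and the sketch assumes that "whitespace" means Character.isWhitespace and that each Java char (UTF-16 code unit) counts as one character.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; hypothetical names, not LingPipe code.
public class CharacterTokenSketch {

    // Returns one token per non-whitespace char in the slice
    // ch[start] .. ch[start + length - 1].
    public static List<String> tokenize(char[] ch, int start, int length) {
        List<String> tokens = new ArrayList<>();
        int end = start + length;
        for (int i = start; i < end; i++) {
            // Assumption: whitespace is whatever Character.isWhitespace accepts.
            if (!Character.isWhitespace(ch[i])) {
                tokens.add(String.valueOf(ch[i]));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Two two-character Chinese words separated by a space,
        // written as Unicode escapes to avoid source-encoding issues.
        char[] text = "\u4e2d\u6587 \u5206\u8bcd".toCharArray();
        // Each non-whitespace character becomes its own token.
        System.out.println(tokenize(text, 0, text.length));
    }
}
```

With this input the sketch yields four single-character tokens and drops the space. Passing a narrower start and length restricts tokenization to that slice of the array, mirroring the parameters of tokenizer(char[], int, int).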
| Modifier and Type | Field and Description |
|---|---|
| static TokenizerFactory | INSTANCE - An instance of a character tokenizer factory, which may be used wherever a character tokenizer factory is needed. |
| Modifier and Type | Method and Description |
|---|---|
| Tokenizer | tokenizer(char[] ch, int start, int length) - Returns a character tokenizer for the specified character array slice. |
| String | toString() - Returns a string representation of this tokenizer factory, which is just its name. |
public static final TokenizerFactory INSTANCE
public Tokenizer tokenizer(char[] ch, int start, int length)
Specified by: tokenizer in interface TokenizerFactory
Parameters:
ch - Characters to tokenize.
start - Index of first character to tokenize.
length - Number of characters to tokenize.
Copyright © 2019 Alias-i, Inc. All rights reserved.