public class LineTokenizerFactory extends RegExTokenizerFactory
A LineTokenizerFactory treats each line of its input as
a token. The whitespace separating tokens consists solely of line
terminators. This is useful for decoders that work at the line level.
Line terminators are as defined in Pattern,
and include all of the Windows, Unix, and Macintosh standards, as well
as some Unicode extensions.
Whitespaces will be either empty strings or strings representing one or more newlines.
Tokens may consist entirely of whitespace characters if whitespace is the only thing on a line, but tokens will never contain sequences representing newlines. Tokens will always consist of at least one character.
| Input String | Tokens | Whitespaces |
|---|---|---|
| "" | {} | { "" } |
| "abc" | { "abc" } | { "", "" } |
| "abc\ndef" | { "abc", "def" } | { "", "\n", "" } |
| "abc\r\ndef" | { "abc", "def" } | { "", "\r\n", "" } |
| " abc\n def \n" | { " abc", " def " } | { "", "\n", "\n" } |
| " \n" | { " " } | { "", "\n" } |
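The rows above can be reproduced with plain java.util.regex. The following is a minimal sketch, not the LingPipe implementation: the class LineSplitSketch and its methods tokens and whitespaces are hypothetical names introduced only for illustration. It assumes tokens are the successive matches of ".+" and whitespaces are the text before, between, and after those matches.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch only; a hypothetical class, not part of LingPipe.
class LineSplitSketch {

    // Default flags, so '.' stops at every line terminator.
    private static final Pattern LINE = Pattern.compile(".+");

    // Returns each non-empty line of the input as a token.
    static List<String> tokens(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = LINE.matcher(input);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    // Returns the text before the first token, between tokens,
    // and after the last token (always one more entry than tokens).
    static List<String> whitespaces(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = LINE.matcher(input);
        int last = 0;
        while (m.find()) {
            result.add(input.substring(last, m.start()));
            last = m.end();
        }
        result.add(input.substring(last));
        return result;
    }

    public static void main(String[] args) {
        String input = "abc\r\ndef";
        System.out.println(tokens(input).size());      // number of tokens
        System.out.println(whitespaces(input).size()); // number of whitespaces
    }
}
```

In this sketch the whitespace list always has exactly one more element than the token list, which matches every row of the table.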
A line tokenizer factory may be serialized. Upon
deserialization, the resulting object will be the singleton
INSTANCE.
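One common way to obtain this behavior in Java is a readResolve method that swaps the freshly deserialized copy for the singleton. This page does not say which mechanism LineTokenizerFactory actually uses, so the following is only a sketch of the general idiom. SingletonFactorySketch and roundTrip are hypothetical names, not part of LingPipe.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical stand-in class; an assumption for illustration, not LingPipe code.
class SingletonFactorySketch implements Serializable {
    private static final long serialVersionUID = 1L;

    static final SingletonFactorySketch INSTANCE = new SingletonFactorySketch();

    private SingletonFactorySketch() { }

    // Invoked by Java serialization after the object is read;
    // the returned singleton replaces the newly created copy.
    private Object readResolve() {
        return INSTANCE;
    }

    // Serializes and then deserializes an object in memory.
    static Object roundTrip(Object obj) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(obj);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Identity comparison: the deserialized object is the singleton itself.
        System.out.println(roundTrip(INSTANCE) == INSTANCE); // prints "true"
    }
}
```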
This tokenizer factory is nothing more than a convenience
wrapper around a very simple RegExTokenizerFactory, with
the simplest possible regular expression:
RegExTokenizerFactory(".+")
Because the regular expression tokenizer factory takes the
default regular expression flags (see Pattern),
the period (.) matches any character except a line terminator.
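The effect of the default flags can be checked directly with java.util.regex.Pattern. The sketch below uses only standard library calls. DotLineTerminatorSketch and its two helper methods are hypothetical names for illustration. It contrasts the default behavior of the period with Pattern.DOTALL, under which the period also matches line terminators.

```java
import java.util.regex.Pattern;

// Illustrative sketch only; a hypothetical class, not part of LingPipe.
class DotLineTerminatorSketch {

    // True if ".+" with default flags matches the entire input,
    // that is, the input is non-empty and contains no line terminator.
    static boolean isSingleLine(String s) {
        return Pattern.compile(".+").matcher(s).matches();
    }

    // Same expression with DOTALL, where '.' also matches line terminators.
    static boolean matchesWithDotAll(String s) {
        return Pattern.compile(".+", Pattern.DOTALL).matcher(s).matches();
    }

    public static void main(String[] args) {
        // Line terminators recognized by Pattern include LF, CR, CRLF,
        // next-line (U+0085), and the Unicode line and paragraph separators.
        String[] samples = {
            "abc", "a\nb", "a\rb", "a\r\nb", "a\u0085b", "a\u2028b", "a\u2029b"
        };
        for (String s : samples) {
            System.out.println(isSingleLine(s) + " " + matchesWithDotAll(s));
        }
    }
}
```

Only the first sample is a single line under the default flags; with DOTALL every sample matches in full.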
| Modifier and Type | Field and Description |
|---|---|
| static LineTokenizerFactory | INSTANCE: A reusable instance of this class. |
| Modifier and Type | Method and Description |
|---|---|
| String | toString(): Returns a string representation of this factory, consisting of its name. |
Methods inherited from class RegExTokenizerFactory: pattern, tokenizer
public static final LineTokenizerFactory INSTANCE
public String toString()
Overrides: toString in class RegExTokenizerFactory
Copyright © 2019 Alias-i, Inc. All rights reserved.