| Interface | Description |
|---|---|
| Detokenizer | A Detokenizer merges tokens back to their untokenized representation. |
| TokenContextGenerator | Interface for TokenizerME context generators. |
| Tokenizer | The interface for tokenizers, which segment a string into its tokens. |
| TokenizerEvaluationMonitor | |

| Class | Description |
|---|---|
| DefaultTokenContextGenerator | Generates events for maxent decisions for tokenization. |
| DetokenizationDictionary | |
| DictionaryDetokenizer | A rule-based detokenizer. |
| SimpleTokenizer | Performs tokenization using character classes. |
| TokenizerCrossValidator | |
| TokenizerEvaluator | The TokenizerEvaluator measures the performance of the given Tokenizer with the provided reference TokenSamples. |
| TokenizerFactory | The factory that provides Tokenizer default implementations and resources. |
| TokenizerME | A Tokenizer for converting raw text into separated tokens. |
| TokenizerModel | The TokenizerModel is the model used by a learnable Tokenizer. |
| TokenizerStream | The TokenizerStream uses a tokenizer to tokenize the input string and outputs TokenSamples. |
| TokenSample | A TokenSample is text with token spans. |
| TokenSampleStream | A stream filter that reads string-encoded samples and creates TokenSamples from them. |
| TokSpanEventStream | Reads the TokenSamples from the given Iterator and converts them into Events that the maxent library can use for training. |
| WhitespaceTokenizer | This tokenizer uses whitespace to tokenize the input text. |
| WhitespaceTokenStream | This stream formats TokenSamples into whitespace-separated token strings. |

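
The difference between whitespace tokenization and character-class tokenization can be illustrated with a small standalone sketch. It deliberately does not call any class from this package: the class name `CharClassTokenizerSketch`, the `CharClass` enum, and both methods are hypothetical, and the choice to make every punctuation character its own token is a simplification made for the example, not a statement about how SimpleTokenizer behaves.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; this is NOT the source of WhitespaceTokenizer or SimpleTokenizer.
final class CharClassTokenizerSketch {

    // Hypothetical character classes chosen for this example.
    enum CharClass { WHITESPACE, ALPHABETIC, NUMERIC, OTHER }

    static CharClass classify(char c) {
        if (Character.isWhitespace(c)) return CharClass.WHITESPACE;
        if (Character.isLetter(c)) return CharClass.ALPHABETIC;
        if (Character.isDigit(c)) return CharClass.NUMERIC;
        return CharClass.OTHER;
    }

    // Whitespace tokenization: tokens are the maximal runs of non-whitespace characters.
    static String[] whitespaceTokenize(String text) {
        String trimmed = text.trim();
        return trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }

    // Character-class tokenization: a new token starts whenever the class changes.
    // Assumption of this sketch: each OTHER character (punctuation) is its own token.
    static String[] charClassTokenize(String text) {
        List<String> tokens = new ArrayList<>();
        int start = -1; // start index of the token being built, -1 if none
        CharClass previous = CharClass.WHITESPACE;
        for (int i = 0; i < text.length(); i++) {
            CharClass current = classify(text.charAt(i));
            boolean boundary = current != previous || current == CharClass.OTHER;
            if (boundary) {
                if (start >= 0) tokens.add(text.substring(start, i));
                start = (current == CharClass.WHITESPACE) ? -1 : i;
            }
            previous = current;
        }
        if (start >= 0) tokens.add(text.substring(start));
        return tokens.toArray(new String[0]);
    }

    public static void main(String[] args) {
        System.out.println(String.join("|", whitespaceTokenize("Hello, world!")));
        System.out.println(String.join("|", charClassTokenize("Hello, world!")));
    }
}
```

In this sketch the whitespace variant keeps punctuation attached to the neighbouring word, while the character-class variant separates it, which is the usual reason to prefer one over the other.
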
| Enum | Description |
|---|---|
| DetokenizationDictionary.Operation | |
| Detokenizer.DetokenizationOperation | This enum contains an operation for every token to merge the tokens together to their detokenized form. |

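
The idea of assigning one merge operation to every token can be sketched without the library. Everything below is hypothetical: the class `DetokenizeSketch`, the local `Op` enum and its three constants, and the `detokenize` method are defined only for this example and are not taken from this page, so the actual enum constants and Detokenizer methods may differ.

```java
// Illustrative sketch only; this is NOT the Detokenizer API of this package.
final class DetokenizeSketch {

    // Hypothetical per-token operations, defined locally for the example.
    enum Op { MERGE_TO_LEFT, MERGE_TO_RIGHT, NO_OPERATION }

    // Joins tokens with single spaces, omitting the space wherever a token
    // merges to its left neighbour or the previous token merges to its right.
    static String detokenize(String[] tokens, Op[] ops) {
        if (tokens.length != ops.length) {
            throw new IllegalArgumentException("one operation per token is required");
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < tokens.length; i++) {
            if (i > 0) {
                boolean attachesLeft = ops[i] == Op.MERGE_TO_LEFT;
                boolean previousAttachesRight = ops[i - 1] == Op.MERGE_TO_RIGHT;
                if (!attachesLeft && !previousAttachesRight) {
                    sb.append(' ');
                }
            }
            sb.append(tokens[i]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] tokens = {"He", "said", ":", "(", "yes", ")", "."};
        Op[] ops = {
            Op.NO_OPERATION, Op.NO_OPERATION, Op.MERGE_TO_LEFT, Op.MERGE_TO_RIGHT,
            Op.NO_OPERATION, Op.MERGE_TO_LEFT, Op.MERGE_TO_LEFT
        };
        System.out.println(detokenize(tokens, ops));
    }
}
```

A rule-based detokenizer would typically look up the operation for each token in a dictionary; here the operations are passed in directly to keep the sketch short.
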
This package provides the TokenizerME, the WhitespaceTokenizer, and the SimpleTokenizer, which is a character class tokenizer.

Copyright © 2017 The Apache Software Foundation. All rights reserved.