| Interface Summary | |
|---|---|
| Detokenizer | A Detokenizer merges tokens back to their untokenized representation. |
| TokenContextGenerator | Interface for TokenizerME context generators. |
| Tokenizer | The interface for tokenizers, which segment a string into its tokens. |
| TokenizerEvaluationMonitor | |
| Class Summary | |
|---|---|
| DefaultTokenContextGenerator | Generate events for maxent decisions for tokenization. |
| DetokenizationDictionary | |
| DictionaryDetokenizer | A rule-based detokenizer. |
| SimpleTokenizer | Performs tokenization using character classes. |
| TokenizerCrossValidator | |
| TokenizerEvaluator | The TokenizerEvaluator measures the performance of the given Tokenizer with the provided reference TokenSamples. |
| TokenizerFactory | The factory that provides Tokenizer default implementations and resources. |
| TokenizerME | A Tokenizer for converting raw text into separated tokens. |
| TokenizerModel | The TokenizerModel is the model used by a learnable Tokenizer. |
| TokenizerStream | The TokenizerStream uses a tokenizer to tokenize the input string and output TokenSamples. |
| TokenSample | A TokenSample is text with token spans. |
| TokenSampleStream | This class is a stream filter which reads in string encoded samples and creates TokenSamples out of them. |
| TokSpanEventStream | This class reads the TokenSamples from the given Iterator and converts the TokenSamples into Events which can be used by the maxent library for training. |
| WhitespaceTokenizer | This tokenizer uses whitespace to tokenize the input text. |
| WhitespaceTokenStream | This stream formats TokenSamples into whitespace-separated token strings. |
| Enum Summary | |
|---|---|
| DetokenizationDictionary.Operation | |
| Detokenizer.DetokenizationOperation | This enum contains an operation for every token to merge the tokens together to their detokenized form. |
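
The idea of assigning one merge operation to every token can be sketched in plain Java. This is only an illustration of the concept, not the library's code: the class `DetokenizeSketch`, the enum `MergeOp`, and its constants are hypothetical stand-ins and are not the members of `Detokenizer.DetokenizationOperation`.

```java
// Illustrative sketch only: hypothetical names, not the OpenNLP implementation.
class DetokenizeSketch {

    // Hypothetical per-token operations describing how a token attaches to its neighbours.
    enum MergeOp { NONE, ATTACH_LEFT, ATTACH_RIGHT, ATTACH_BOTH }

    // Joins tokens with single spaces, omitting the space wherever either
    // neighbour's operation asks for the two tokens to be merged.
    static String detokenize(String[] tokens, MergeOp[] ops) {
        if (tokens.length != ops.length) {
            throw new IllegalArgumentException("one operation per token is required");
        }
        StringBuilder text = new StringBuilder();
        for (int i = 0; i < tokens.length; i++) {
            if (i > 0) {
                boolean currentAttachesLeft =
                        ops[i] == MergeOp.ATTACH_LEFT || ops[i] == MergeOp.ATTACH_BOTH;
                boolean previousAttachesRight =
                        ops[i - 1] == MergeOp.ATTACH_RIGHT || ops[i - 1] == MergeOp.ATTACH_BOTH;
                if (!currentAttachesLeft && !previousAttachesRight) {
                    text.append(' ');
                }
            }
            text.append(tokens[i]);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        String[] tokens = {"He", "said", "(", "hello", ")", "."};
        MergeOp[] ops = {
            MergeOp.NONE, MergeOp.NONE, MergeOp.ATTACH_RIGHT,
            MergeOp.NONE, MergeOp.ATTACH_LEFT, MergeOp.ATTACH_LEFT
        };
        System.out.println(detokenize(tokens, ops)); // prints: He said (hello).
    }
}
```

In this sketch the operations are supplied by hand. A rule-based detokenizer would presumably look them up per token, for example from a dictionary of punctuation rules.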
Contains classes related to finding tokens or words in a string. All
tokenizers implement the Tokenizer interface. Currently there are the
learnable TokenizerME, the WhitespaceTokenizer, and
the SimpleTokenizer, which is a character class tokenizer.
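
To make the difference between whitespace tokenization and character class tokenization concrete, here is a minimal, self-contained Java sketch. It does not use the OpenNLP classes and is not their implementation: `TokenizeSketch`, `CharClass`, and both methods are hypothetical names, and the choice of four character classes is an assumption made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: hypothetical names, not the OpenNLP implementation.
class TokenizeSketch {

    // Assumed character classes for the character-class strategy.
    enum CharClass { WHITESPACE, ALPHABETIC, NUMERIC, OTHER }

    static CharClass classify(char c) {
        if (Character.isWhitespace(c)) return CharClass.WHITESPACE;
        if (Character.isLetter(c)) return CharClass.ALPHABETIC;
        if (Character.isDigit(c)) return CharClass.NUMERIC;
        return CharClass.OTHER;
    }

    // Whitespace strategy: tokens are the maximal runs of non-whitespace characters.
    static String[] whitespaceTokenize(String text) {
        String trimmed = text.trim();
        if (trimmed.isEmpty()) return new String[0];
        return trimmed.split("\\s+");
    }

    // Character-class strategy: a token ends whenever the character class changes;
    // every OTHER character (e.g. punctuation) becomes a token of its own.
    static String[] charClassTokenize(String text) {
        List<String> tokens = new ArrayList<>();
        int start = -1; // start index of the open token, or -1 if none is open
        CharClass previous = CharClass.WHITESPACE;
        for (int i = 0; i < text.length(); i++) {
            CharClass current = classify(text.charAt(i));
            boolean boundary = current != previous || current == CharClass.OTHER;
            if (boundary) {
                if (start >= 0) tokens.add(text.substring(start, i)); // close the open token
                start = (current == CharClass.WHITESPACE) ? -1 : i;   // open a new one, if any
            }
            previous = current;
        }
        if (start >= 0) tokens.add(text.substring(start));
        return tokens.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String text = "Hello, world!";
        System.out.println(String.join(" | ", whitespaceTokenize(text))); // prints: Hello, | world!
        System.out.println(String.join(" | ", charClassTokenize(text)));  // prints: Hello | , | world | !
    }
}
```

The whitespace strategy leaves punctuation attached to the neighbouring word, while the character class strategy separates it. A learnable tokenizer would instead decide each candidate split point with a trained model.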