public class TokenChunker extends Object implements Chunker, Serializable
TokenChunker provides an implementation of the Chunker interface based on an underlying tokenizer factory.
The chunkings produced will have one chunk per token produced by the underlying tokenizer factory, with start and end positions as determined by the tokenizer's start and end position methods. The type of each chunk will be the actual string yield of the token, which, in the case of modifying tokenizers like stemmers, will not necessarily be the same as the underlying text span.
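The behavior described above can be illustrated with a small, self-contained sketch. It deliberately does not use LingPipe classes: `TokenChunkSketch`, `SimpleChunk`, `tokenChunks`, and the lowercasing whitespace tokenizer are hypothetical stand-ins invented for illustration only. The sketch produces one chunk per token and sets the chunk type to the (possibly modified) token string rather than to the text the chunk spans.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TokenChunkSketch {

    // Hypothetical stand-in for a chunk: character offsets plus a type string.
    public static final class SimpleChunk {
        public final int start;
        public final int end;
        public final String type;

        public SimpleChunk(int start, int end, String type) {
            this.start = start;
            this.end = end;
            this.type = type;
        }
    }

    // Hypothetical "modifying" tokenizer: splits on whitespace and lowercases
    // each token, so the token string can differ from the text it spans.
    public static List<SimpleChunk> tokenChunks(CharSequence text) {
        List<SimpleChunk> chunks = new ArrayList<>();
        Matcher m = Pattern.compile("\\S+").matcher(text);
        while (m.find()) {
            String token = m.group().toLowerCase(Locale.ROOT);
            // One chunk per token; the chunk type is the token's string yield.
            chunks.add(new SimpleChunk(m.start(), m.end(), token));
        }
        return chunks;
    }

    public static void main(String[] args) {
        String text = "John Smith runs";
        for (SimpleChunk c : tokenChunks(text)) {
            System.out.println(c.start + "-" + c.end
                + " type=" + c.type
                + " span=" + text.substring(c.start, c.end));
        }
    }
}
```

In this sketch the first chunk covers the span "John" but carries the type "john", which mirrors how a modifying tokenizer can make the chunk type differ from the underlying text.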
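If the underlying tokenizer factory cannot be serialized, attempting to serialize a token chunker throws a java.io.NotSerializableException. The object read back in will be an instance of TokenChunker constructed with the reconstituted tokenizer factory.
| Constructor and Description |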
|---|
| TokenChunker(TokenizerFactory tokenizerFactory): Construct a chunker from the specified tokenizer factory. |
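The serialization note for this class is consistent with standard Java serialization, where writing an object whose state includes a non-serializable component fails with java.io.NotSerializableException. The following JDK-only sketch shows that general behavior. The names `SerializationSketch`, `Factory`, `PlainFactory`, `SerializableFactory`, and `Wrapper` are hypothetical stand-ins, and the sketch makes no claim about how TokenChunker implements serialization internally.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class SerializationSketch {

    // Hypothetical stand-in for a tokenizer factory type.
    public interface Factory { }

    // A factory that does NOT implement Serializable.
    public static class PlainFactory implements Factory {
        public PlainFactory() { }
    }

    // A factory that does implement Serializable.
    public static class SerializableFactory implements Factory, Serializable {
        private static final long serialVersionUID = 1L;
        public SerializableFactory() { }
    }

    // Hypothetical stand-in for a chunker that wraps a factory.
    public static class Wrapper implements Serializable {
        private static final long serialVersionUID = 1L;
        public final Factory factory;
        public Wrapper(Factory factory) { this.factory = factory; }
    }

    // Serialize to an in-memory buffer and read the object back.
    public static Object roundTrip(Object obj)
            throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        // Serializable factory: the wrapper is restored with its factory.
        Wrapper restored = (Wrapper) roundTrip(new Wrapper(new SerializableFactory()));
        System.out.println("restored factory: "
            + restored.factory.getClass().getSimpleName());

        // Non-serializable factory: writing the wrapper fails.
        try {
            roundTrip(new Wrapper(new PlainFactory()));
        } catch (NotSerializableException e) {
            System.out.println("not serializable: " + e.getMessage());
        }
    }
}
```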
| Modifier and Type | Method and Description |
|---|---|
| Chunking | chunk(char[] cs, int start, int end): Return the chunking produced by tokenizing the specified character array slice. |
| Chunking | chunk(CharSequence cSeq): Return the chunking produced by tokenizing the specified character sequence. |
| TokenizerFactory | tokenizerFactory(): Return the tokenizer factory for this token chunker. |
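public TokenChunker(TokenizerFactory tokenizerFactory)
Construct a chunker from the specified tokenizer factory.
tokenizerFactory - Tokenizer factory for this chunker.
public TokenizerFactory tokenizerFactory()
Return the tokenizer factory for this token chunker.
public Chunking chunk(CharSequence cSeq)
Return the chunking produced by tokenizing the specified character sequence.
public Chunking chunk(char[] cs, int start, int end)
Return the chunking produced by tokenizing the specified character array slice.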
Copyright © 2016 Alias-i, Inc. All rights reserved.