public class TokenShapeChunker extends Object implements Chunker
A TokenShapeChunker uses a named-entity
TokenShapeDecoder and a tokenizer factory to implement
entity detection through the chunk.Chunker interface.
A named-entity chunker is constructed from a tokenizer factory and
decoder. The tokenizer factory creates the tokens that are sent to
the decoder. The chunks have types derived from the named-entity
types found.
The tokens and whitespaces returned by the tokenizer are concatenated to form the underlying text slice of the chunks returned by the chunker. Thus a tokenizer that removes or rewrites tokens, such as the stop list tokenizer or Porter stemmer tokenizer, will create a character slice that does not match the input. A whitespace-normalizing tokenizer filter can be used, for example, to produce normalized text as the basis of the chunks.
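The effect of concatenating tokens and whitespaces can be illustrated with a minimal, self-contained sketch. It does not use the LingPipe API: the class `SliceReconstructionSketch`, the `Piece` type, and the `tokenize` and `rebuild` methods are hypothetical names introduced only for illustration, and pairing each token with the whitespace that precedes it is an assumption about how the slice is assembled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative sketch only; none of these names belong to LingPipe.
public class SliceReconstructionSketch {

    /** A token together with the whitespace run that preceded it. */
    public static final class Piece {
        public final String whitespace;
        public final String token;

        public Piece(String whitespace, String token) {
            this.whitespace = whitespace;
            this.token = token;
        }
    }

    /**
     * Splits text into (whitespace, token) pieces. Tokens found in
     * stopWords are dropped together with their preceding whitespace,
     * mimicking what a stop list filter does to the token stream.
     */
    public static List<Piece> tokenize(String text, Set<String> stopWords) {
        List<Piece> pieces = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            int wsStart = i;
            while (i < text.length() && Character.isWhitespace(text.charAt(i))) {
                i++;
            }
            int tokStart = i;
            while (i < text.length() && !Character.isWhitespace(text.charAt(i))) {
                i++;
            }
            String whitespace = text.substring(wsStart, tokStart);
            String token = text.substring(tokStart, i);
            if (!stopWords.contains(token)) {
                pieces.add(new Piece(whitespace, token));
            }
        }
        return pieces;
    }

    /**
     * Concatenates whitespaces and tokens into one text slice. When
     * normalize is true, every non-empty whitespace run becomes one space.
     */
    public static String rebuild(List<Piece> pieces, boolean normalize) {
        StringBuilder sb = new StringBuilder();
        for (Piece p : pieces) {
            boolean collapse = normalize && !p.whitespace.isEmpty();
            sb.append(collapse ? " " : p.whitespace).append(p.token);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String input = "the  cat sat on the mat";
        Set<String> none = new HashSet<>();
        Set<String> stops = new HashSet<>(Arrays.asList("the", "on"));

        // Lossless tokenization: the rebuilt slice equals the input.
        System.out.println("[" + rebuild(tokenize(input, none), false) + "]");
        // Stop list filtering: the rebuilt slice no longer matches the input.
        System.out.println("[" + rebuild(tokenize(input, stops), false) + "]");
        // Whitespace normalization: runs of spaces collapse to one space.
        System.out.println("[" + rebuild(tokenize(input, none), true) + "]");
    }
}
```

With the stop words "the" and "on", the rebuilt slice is "  cat sat mat", so chunk offsets computed against it would not line up with the original input. This is why the choice of tokenizer matters when chunk positions are later mapped back to the source text.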
| Modifier and Type | Method and Description |
|---|---|
| Chunking | chunk(char[] cs, int start, int end) Returns the set of named-entity chunks derived from the underlying decoder over the tokenization of the specified character slice. |
| Chunking | chunk(CharSequence cSeq) Returns the set of named-entity chunks derived from the underlying decoder over the tokenization of the specified character sequence. |
| void | setLog2Beam(double beamWidth) Sets the log (base 2) beam width for the decoder. |
public Chunking chunk(CharSequence cSeq)
For more information on return results, see chunk(char[],int,int).
public Chunking chunk(char[] cs, int start, int end)
public void setLog2Beam(double beamWidth)
Parameters:
beamWidth - Width of beam.
Throws:
IllegalArgumentException - If the beam width is not positive.

Copyright © 2016 Alias-i, Inc. All rights reserved.