Type Parameters:
B - the type of the underlying n-best chunker being rescored
O - the type of the process language model for non-entities
C - the type of the sequence language model for entities

public class AbstractCharLmRescoringChunker<B extends NBestChunker,O extends LanguageModel.Process,C extends LanguageModel.Sequence> extends RescoringChunker<B>
AbstractCharLmRescoringChunker provides the basic
character language-model rescoring model used by the trainable
CharLmRescoringChunker and its compiled version.
The exact model used is most easily described through an example. Consider the sentence "John J. Smith lives in Washington." with "John J. Smith" as a person-type chunk and "Washington" as a location-type chunk. The probability of this analysis derives from alternating chunk/non-chunk spans, starting and ending with non-chunk spans.
POUT(CPER|CBOS) * PPER(John J. Smith) * POUT( lives in CLOC|CPER) * PLOC(Washington) * POUT(.CEOS|CLOC)

Note that the chunk models PPER and PLOC are bounded models, and thus predict
the first letter given the fact that it's the first letter, and
also encode an end-of-string probability to model the end. See
NGramBoundaryLM for more information on bounded models.
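To make the role of the boundary concrete, the following is a minimal sketch of how a bounded estimate could be assembled from per-character conditional estimates. It is not the NGramBoundaryLM implementation: the class `BoundedModelSketch`, the `BOUNDARY` character, and the `condLog2Prob` callback are hypothetical names introduced here for illustration only.

```java
import java.util.function.ToDoubleBiFunction;

class BoundedModelSketch {
    // Hypothetical boundary marker; assumed never to occur in real input.
    static final char BOUNDARY = '\uFFFF';

    // Sums log2 conditional estimates for each character of s and for one
    // final boundary character. The first character is predicted with the
    // boundary as its context, and the final boundary plays the part of the
    // end-of-string probability.
    static double log2Estimate(String s,
                               ToDoubleBiFunction<String, Character> condLog2Prob) {
        String padded = BOUNDARY + s + BOUNDARY;
        double sum = 0.0;
        for (int i = 1; i < padded.length(); i++) {
            String context = padded.substring(0, i);
            sum += condLog2Prob.applyAsDouble(context, padded.charAt(i));
        }
        return sum;
    }
}
```

With a toy estimator that is uniform over 26 letters plus the boundary, a two-letter string contributes three factors: one per letter and one for the end boundary.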
The non-chunk POUT model is a process
language model, but uses distinguished characters in much the same
way as the bounded models do internally. In particular, we have
distinguished characters for each type
(e.g. CPER), and for
begin-of-sentence and end-of-sentence markers
(e.g. CBOS). These must be chosen
so as not to conflict with any input characters in training or
decoding. With this encoding, the non-chunk model bears the brunt
of the burden in predicting types. To start, it conditions the
text it generates on the previous type, encoded as a character. To
end, it generates the next chunk type, also encoded as a character.
This allows the models to be sensitive to the fact that phrases
like lives in (including the spaces on either side) are
conditioned on following a person. The following chunk type,
location, is generated conditional on following
CPER lives in. The only constraint
on the length of these dependencies is the length of the n-gram
models (and the size of the chunk/non-chunk spans).
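The requirement that the distinguished characters not conflict with input characters can be checked mechanically. The sketch below is one possible way to build a type-to-character mapping of the kind passed to the constructor. The choice of Unicode private-use code points, the class `TypeCharsSketch`, and its constants and methods are assumptions made for illustration, not the encoding this library actually uses.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class TypeCharsSketch {
    // Assumption: private-use code points do not occur in the input text.
    static final char BOS_CHAR = '\uE000';        // hypothetical begin-of-sentence marker
    static final char EOS_CHAR = '\uE001';        // hypothetical end-of-sentence marker
    static final char FIRST_TYPE_CHAR = '\uE002'; // first character handed out to a type

    // Assigns one distinct character to each distinct chunk type, in order.
    static Map<String, Character> assign(List<String> chunkTypes) {
        Map<String, Character> typeToChar = new LinkedHashMap<>();
        char next = FIRST_TYPE_CHAR;
        for (String type : chunkTypes) {
            if (!typeToChar.containsKey(type)) {
                typeToChar.put(type, next++);
            }
        }
        return typeToChar;
    }

    // Returns true if the text contains any character reserved by the encoding.
    static boolean conflictsWith(Map<String, Character> typeToChar, CharSequence text) {
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == BOS_CHAR || c == EOS_CHAR || typeToChar.containsValue(c)) {
                return true;
            }
        }
        return false;
    }
}
```

Running `conflictsWith` over training and decoding text before use would flag any input that collides with a reserved character.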
The resulting model generates a properly normalized probability distribution over chunkings.
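The alternation of non-chunk and chunk spans in the example can be traced with a short sketch. It only lists the factors that would be multiplied and does not call any language model. The class `SpanFactorsSketch`, its nested `Span` type, and the textual factor descriptions are hypothetical; chunks are assumed to be sorted and non-overlapping.

```java
import java.util.ArrayList;
import java.util.List;

class SpanFactorsSketch {
    // Hypothetical chunk span: [start, end) character offsets plus a type label.
    static final class Span {
        final int start;
        final int end;
        final String type;
        Span(int start, int end, String type) {
            this.start = start;
            this.end = end;
            this.type = type;
        }
    }

    // Lists the factors for a chunking: a non-chunk span before each chunk,
    // the chunk itself, and a closing non-chunk span that generates EOS.
    static List<String> factors(String text, List<Span> chunks) {
        List<String> out = new ArrayList<>();
        String prevType = "BOS";
        int pos = 0;
        for (Span c : chunks) {
            out.add("OUT: generate [" + text.substring(pos, c.start)
                    + "] then C_" + c.type + ", given C_" + prevType);
            out.add(c.type + ": generate [" + text.substring(c.start, c.end) + "]");
            prevType = c.type;
            pos = c.end;
        }
        out.add("OUT: generate [" + text.substring(pos)
                + "] then C_EOS, given C_" + prevType);
        return out;
    }

    public static void main(String[] args) {
        String text = "John J. Smith lives in Washington.";
        int loc = text.indexOf("Washington");
        List<Span> chunks = List.of(
            new Span(0, 13, "PER"),
            new Span(loc, loc + 10, "LOC"));
        factors(text, chunks).forEach(System.out::println);
    }
}
```

For the example sentence this yields five factors, three from the non-chunk model and one from each chunk model, matching the product given in the example.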
The tag BOS is reserved for use by the system
for encoding document start/end positions. See HmmChunker
for more information.
| Constructor and Description |
|---|
| `AbstractCharLmRescoringChunker(B baseNBestChunker, int numChunkingsRescored, O outLM, Map<String,Character> typeToChar, Map<String,C> typeToLM)` Construct a rescoring chunker based on the specified underlying chunker, with the specified number of underlying chunkings rescored, based on the models and type encodings provided in the last three arguments. |

| Modifier and Type | Method and Description |
|---|---|
| `C` | `chunkLM(String chunkType)` Returns the sequence language model for chunks of the specified type. |
| `O` | `outLM()` Returns the process language model for non-chunks. |
| `double` | `rescore(Chunking chunking)` Performs rescoring of the base chunking output using character language models. |
| `char` | `typeToChar(String chunkType)` Returns the character used to encode the specified type in the model. |

Methods inherited from class RescoringChunker:
baseChunker, chunk, chunk, nBest, nBestChunks, numChunkingsRescored, setNumChunkingsRescored

public AbstractCharLmRescoringChunker(B baseNBestChunker, int numChunkingsRescored, O outLM, Map<String,Character> typeToChar, Map<String,C> typeToLM)
Parameters:
baseNBestChunker - Underlying chunker to rescore.
numChunkingsRescored - Number of underlying chunkings rescored by this chunker.
outLM - The process language model for non-chunks.
typeToChar - A mapping from chunk types to the characters that encode them.
typeToLM - A mapping from chunk types to the language models used to model them.

public char typeToChar(String chunkType)
Parameters:
chunkType - Type of chunk.
Throws:
IllegalArgumentException - If the specified chunk type does not exist.

public O outLM()

public C chunkLM(String chunkType)
Parameters:
chunkType - Type of chunk.

public double rescore(Chunking chunking)
rescore in class RescoringChunker<B extends NBestChunker>
Parameters:
chunking - Chunking being rescored.

Copyright © 2019 Alias-i, Inc. All rights reserved.