public abstract class RescoringChunker<B extends NBestChunker> extends Object implements NBestChunker, ConfidenceChunker

Type Parameters:
B - the type of the underlying n-best chunker
A RescoringChunker provides first-best, n-best, and confidence
chunking by rescoring n-best chunkings derived from a contained
chunker.
Concrete subclasses must implement the abstract method rescore(Chunking), which returns a score for a chunking. There
are no restrictions on how this score is computed; most typically,
it will come from a longer-distance or higher-order model than the
contained chunker, and will provide more accurate results.
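As a sketch of how a subclass might plug a higher-order model into this pattern, the Java below uses simplified, hypothetical stand-in types (SimpleChunk, SimpleChunking, Rescorer) rather than the real Chunk and Chunking interfaces. The toy subclass scores a chunking with a bigram model over its sequence of chunk types and returns a log (base 2) probability. The bigram table, floor probability, and BOS/EOS boundary symbols are illustrative assumptions, not part of the documented API.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class RescoringSketch {

    // Hypothetical stand-in for a chunk: a [start, end) span with a type label.
    record SimpleChunk(int start, int end, String type) { }

    // Hypothetical stand-in for a chunking: one analysis's chunks in text order.
    record SimpleChunking(List<SimpleChunk> chunks) { }

    // Skeleton of the pattern: subclasses supply only the scoring function.
    abstract static class Rescorer {
        /** Returns a log (base 2) joint probability estimate for the chunking. */
        abstract double rescore(SimpleChunking chunking);
    }

    // Toy higher-order model: bigram probabilities over the chunk-type sequence,
    // padded with hypothetical "BOS" and "EOS" boundary symbols.
    static class ChunkTypeBigramRescorer extends Rescorer {
        private final Map<String, Double> bigramProb = new HashMap<>();
        private final double floorProb;

        ChunkTypeBigramRescorer(double floorProb) {
            this.floorProb = floorProb;
        }

        void setProb(String prev, String next, double prob) {
            bigramProb.put(prev + "->" + next, prob);
        }

        @Override
        double rescore(SimpleChunking chunking) {
            double log2Prob = 0.0;
            String prev = "BOS";
            for (SimpleChunk chunk : chunking.chunks()) {
                log2Prob += log2(prob(prev, chunk.type()));
                prev = chunk.type();
            }
            return log2Prob + log2(prob(prev, "EOS"));
        }

        // Unseen bigrams fall back to the floor probability.
        private double prob(String prev, String next) {
            return bigramProb.getOrDefault(prev + "->" + next, floorProb);
        }

        private static double log2(double p) {
            return Math.log(p) / Math.log(2.0);
        }
    }

    public static void main(String[] args) {
        ChunkTypeBigramRescorer rescorer = new ChunkTypeBigramRescorer(1.0 / 1024.0);
        rescorer.setProb("BOS", "PER", 0.5);
        rescorer.setProb("PER", "LOC", 0.25);
        rescorer.setProb("LOC", "EOS", 0.5);
        SimpleChunking chunking = new SimpleChunking(List.of(
            new SimpleChunk(0, 5, "PER"), new SimpleChunk(9, 15, "LOC")));
        // Score is log2(0.5 * 0.25 * 0.5).
        System.out.println(rescorer.rescore(chunking));
    }
}
```

The model sees the whole sequence of chunk types at once, which is the kind of dependency a left-to-right base chunker may not capture; any other scoring function could be substituted in rescore.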
The n-best chunker works by generating the top analyses from the
contained chunker. The number of such analyses considered is
determined in the constructor for this class. Each analysis is
rescored with rescore(Chunking) and placed in a bounded priority
queue whose bound is the maximum specified in the call to
nBest(char[],int,int,int).
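The procedure above can be sketched generically in Java. This is an illustration, not the library's code: hypotheses are plain objects supplied by an Iterator, the rescoring function is passed as a ToDoubleFunction, and the Scored record and rescoreNBest method are hypothetical names. A min-heap keyed on score plays the role of the bounded priority queue, so the worst retained hypothesis is evicted whenever the bound is exceeded.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.function.ToDoubleFunction;

class NBestRescoringSketch {

    // Hypothetical analogue of a scored object: an item paired with its score.
    record Scored<T>(T item, double score) { }

    /**
     * Rescores at most numRescored base hypotheses and returns the
     * maxNBest highest-scoring ones, best first.
     */
    static <T> List<Scored<T>> rescoreNBest(Iterator<T> baseHypotheses,
                                            int numRescored,
                                            int maxNBest,
                                            ToDoubleFunction<T> rescore) {
        // Min-heap on score: the head is the worst hypothesis kept so far.
        PriorityQueue<Scored<T>> queue =
            new PriorityQueue<>(Comparator.comparingDouble((Scored<T> s) -> s.score()));
        for (int i = 0; i < numRescored && baseHypotheses.hasNext(); ++i) {
            T hypothesis = baseHypotheses.next();
            queue.offer(new Scored<>(hypothesis, rescore.applyAsDouble(hypothesis)));
            if (queue.size() > maxNBest) {
                queue.poll(); // evict the current worst to respect the bound
            }
        }
        List<Scored<T>> result = new ArrayList<>(queue);
        result.sort(Comparator.comparingDouble((Scored<T> s) -> s.score()).reversed());
        return result;
    }

    public static void main(String[] args) {
        // Toy "chunkings" as strings, rescored by length for illustration.
        List<String> base = List.of("a", "bbbb", "cc", "ddd", "eeeee");
        List<Scored<String>> top =
            rescoreNBest(base.iterator(), 4, 2, s -> (double) s.length());
        for (Scored<String> scored : top) {
            System.out.println(scored.item() + " " + scored.score());
        }
    }
}
```

Bounding the heap at maxNBest keeps memory at O(maxNBest) and time at O(numRescored · log maxNBest), instead of storing and sorting every rescored hypothesis.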
The first-best chunker methods chunk(CharSequence) and
chunk(char[],int,int) operate by choosing the top-scoring
chunking from the rescored chunkings of the contained chunker. The
number of chunkings from the contained chunker that are rescored is
determined in the constructor. This is more memory- and
time-efficient than running n-best chunking.
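One way a first-best pass can avoid a queue altogether is to remember only the best hypothesis seen so far. The Java sketch below illustrates that idea under stated assumptions: hypotheses are generic objects from an Iterator, the rescorer is a ToDoubleFunction, and rescoreFirstBest is a hypothetical name, not a library method.

```java
import java.util.Iterator;
import java.util.List;
import java.util.function.ToDoubleFunction;

class FirstBestRescoringSketch {

    /**
     * Returns the highest-scoring of at most numRescored base hypotheses,
     * or null if none are available. Only the current best is retained,
     * so memory use does not grow with numRescored.
     */
    static <T> T rescoreFirstBest(Iterator<T> baseHypotheses,
                                  int numRescored,
                                  ToDoubleFunction<T> rescore) {
        T best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (int i = 0; i < numRescored && baseHypotheses.hasNext(); ++i) {
            T hypothesis = baseHypotheses.next();
            double score = rescore.applyAsDouble(hypothesis);
            // Strict comparison: on ties the earlier base hypothesis wins.
            if (best == null || score > bestScore) {
                best = hypothesis;
                bestScore = score;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<String> base = List.of("a", "bbbb", "cc", "eeeee");
        // Only the first three hypotheses are rescored, so "eeeee" is never seen.
        System.out.println(rescoreFirstBest(base.iterator(), 3, s -> (double) s.length()));
    }
}
```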
The nBestChunks(char[],int,int,int) method is implemented
by walking over the n-best analyses generated by nBest(char[],int,int,int), with the maximum n-best for full analyses
set to the value of numChunkingsRescored(), which may be
changed using setNumChunkingsRescored(int). For each
analysis, its chunks are extracted and each chunk's weight is
incremented by the analysis's n-best weight. Normalization is carried
out by dividing by the total probability mass in the returned n-best list.
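The accumulation and normalization just described can be sketched in Java. The sketch assumes each analysis carries a log (base 2) score, as the rescore(Chunking) documentation asks, and reads an analysis's weight as its linear mass 2^score. Subtracting the maximum score before exponentiating is a numerical-stability step added here; it cancels in the division. The Span, Analysis, and ScoredSpan records and the nBestChunks method are hypothetical stand-ins, not the library's types.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

class ChunkConfidenceSketch {

    // Hypothetical chunk key: a [start, end) span with a type label.
    record Span(int start, int end, String type) { }

    // One whole analysis from the n-best list: its chunks and its log2 score.
    record Analysis(Set<Span> chunks, double log2Score) { }

    // A chunk paired with its normalized confidence estimate.
    record ScoredSpan(Span span, double prob) { }

    static List<ScoredSpan> nBestChunks(List<Analysis> nBest, int maxNBest) {
        if (nBest.isEmpty()) {
            return List.of();
        }
        // Shift by the maximum log2 score to avoid underflow in 2^score.
        double maxLog2 = Double.NEGATIVE_INFINITY;
        for (Analysis analysis : nBest) {
            maxLog2 = Math.max(maxLog2, analysis.log2Score());
        }

        Map<Span, Double> chunkMass = new HashMap<>();
        double totalMass = 0.0;
        for (Analysis analysis : nBest) {
            double weight = Math.pow(2.0, analysis.log2Score() - maxLog2);
            totalMass += weight;
            // Every chunk in this analysis has the analysis's weight added.
            for (Span chunk : analysis.chunks()) {
                chunkMass.merge(chunk, weight, Double::sum);
            }
        }

        // Divide by the total mass of the n-best list, then sort best first.
        List<ScoredSpan> result = new ArrayList<>();
        for (Map.Entry<Span, Double> entry : chunkMass.entrySet()) {
            result.add(new ScoredSpan(entry.getKey(), entry.getValue() / totalMass));
        }
        result.sort((a, b) -> Double.compare(b.prob(), a.prob()));
        int limit = Math.max(0, Math.min(maxNBest, result.size()));
        return new ArrayList<>(result.subList(0, limit));
    }

    public static void main(String[] args) {
        Span per = new Span(0, 5, "PER");
        Span loc = new Span(10, 15, "LOC");
        Span org = new Span(0, 5, "ORG");
        List<Analysis> nBest = List.of(
            new Analysis(Set.of(per), -1.0),
            new Analysis(Set.of(per, loc), -2.0),
            new Analysis(Set.of(org), -3.0));
        for (ScoredSpan scored : nBestChunks(nBest, 3)) {
            System.out.println(scored);
        }
    }
}
```

With the sample list in main, the shifted weights are 1, 0.5, and 0.25 (total 1.75), so PER receives 1.5/1.75 ≈ 0.857, LOC ≈ 0.286, and ORG ≈ 0.143. Because a chunk can occur in several analyses and an analysis contains several chunks, these per-chunk estimates need not sum to one.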
| Constructor and Description |
|---|
| RescoringChunker(B chunker, int numChunkingsRescored): Construct a rescoring chunker that contains the specified base chunker and considers the specified number of chunkings for rescoring. |
| Modifier and Type | Method and Description |
|---|---|
| B | baseChunker(): The base chunker that generates hypotheses to rescore. |
| Chunking | chunk(char[] cs, int start, int end): Returns the first-best chunking for the specified character slice. |
| Chunking | chunk(CharSequence cSeq): Returns the first-best chunking for the specified character sequence. |
| Iterator<ScoredObject<Chunking>> | nBest(char[] cs, int start, int end, int maxNBest): Returns the n-best chunkings of the specified character slice. |
| Iterator<Chunk> | nBestChunks(char[] cs, int start, int end, int maxNBest): Returns the n-best chunks for the specified character slice up to the specified maximum number of chunks. |
| int | numChunkingsRescored(): Return the number of chunkings to generate from the base chunker for rescoring. |
| abstract double | rescore(Chunking chunking): Returns the score for a chunking. |
| void | setNumChunkingsRescored(int numChunkingsRescored): Set the number of base chunkings to rescore. |
public RescoringChunker(B chunker, int numChunkingsRescored)
Parameters:
chunker - Base n-best chunker.
numChunkingsRescored - Number of chunkings generated by the base chunker to rescore.

public abstract double rescore(Chunking chunking)
The rescoring should be a log (base 2) joint probability
estimate for the specified chunking. For the simple
whole-analysis rescoring method nBest(char[],int,int,int), this is
not checked, and any values may be used in practice. For the n-best
chunk method nBestChunks(char[],int,int,int), the scores are
treated as log probabilities but renormalized in order to
compute conditional chunk probability estimates.
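Written out, the renormalization above amounts to the following, with notation introduced here only for illustration: s(c) is the log (base 2) score that rescore assigns to chunking c of input x, N is the returned n-best list, and k ranges over individual chunks.

$$
\hat{P}(c \mid x) = \frac{2^{s(c)}}{\sum_{c' \in N} 2^{s(c')}},
\qquad
\hat{P}(k \mid x) = \sum_{c \in N \,:\, k \in c} \hat{P}(c \mid x)
$$

Adding the same constant to every s(c) leaves both quantities unchanged, and because the denominator sums only over N, the estimates are conditioned on the n-best list. Ranking in nBest(char[],int,int,int) depends only on the order of the scores, whereas this formula exponentiates them, which is why only the chunk method relies on the scores being log probabilities.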
Parameters:
chunking - Chunking to rescore.

public B baseChunker()
public int numChunkingsRescored()
public void setNumChunkingsRescored(int numChunkingsRescored)
Parameters:
numChunkingsRescored - Number of base chunkings to rescore.

public Chunking chunk(CharSequence cSeq)
public Chunking chunk(char[] cs, int start, int end)
public Iterator<ScoredObject<Chunking>> nBest(char[] cs, int start, int end, int maxNBest)
Specified by:
nBest in interface NBestChunker
Parameters:
cs - Underlying character array.
start - Index of first character to analyze.
end - Index of one past the last character to analyze.
maxNBest - The maximum number of results to return.

public Iterator<Chunk> nBestChunks(char[] cs, int start, int end, int maxNBest)
See the class documentation above for implementation details.
Specified by:
nBestChunks in interface ConfidenceChunker
Parameters:
cs - Underlying characters.
start - Index of first character in slice.
end - Index of one past last character in slice.
maxNBest - Maximum number of chunks to return.

Copyright © 2019 Alias-i, Inc. All rights reserved.