public class HmmCharLmEstimator extends AbstractHmmEstimator
HmmCharLmEstimator employs a maximum a posteriori
transition estimator and a bounded character language model
emission estimator.
The emission language models are instances of NGramBoundaryLM. As such, they explicitly model start-of-token
(prefix), end-of-token (suffix), and basic token-shape features.
The language model parameters are the usual ones: n-gram length,
interpolation ratio (controls amount of smoothing), and number of
characters (controls final smoothing).
The initial state and final state estimators are multinomial distributions, as is the conditional estimator of the next state given a previous state. The default behavior is to use maximum likelihood estimates with no smoothing for initial state, final state, and transition likelihoods in the model. That is, the estimated likelihood of a state being an initial state is proportional to its training data frequency, with the actual likelihood being the training data frequency divided by the total training data frequency across tags.
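As a minimal illustration of the unsmoothed estimate described above, the following sketch divides each start-state count by the total count across states. The class StartStateMle and its method are hypothetical helpers written for this example; they are not part of the library, whose internal bookkeeping may differ.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper (not a library class): unsmoothed start-state estimates. */
class StartStateMle {

    /**
     * Returns count(state) / total for every state in the map.
     * A state that never started a training sequence has no entry,
     * which corresponds to a probability of zero under this estimate.
     */
    static Map<String, Double> estimate(Map<String, Integer> startCounts) {
        long total = 0;
        for (int count : startCounts.values()) {
            total += count;
        }
        Map<String, Double> probs = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : startCounts.entrySet()) {
            // Guard against an empty training set to avoid dividing by zero.
            probs.put(e.getKey(), total == 0 ? 0.0 : (double) e.getValue() / total);
        }
        return probs;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        counts.put("DET", 6);
        counts.put("NOUN", 3);
        counts.put("VERB", 1);
        System.out.println(estimate(counts));
    }
}
```

The same ratio applies to final states, and to transitions when the counts are restricted to a single source state.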
With the constructor HmmCharLmEstimator(int,int,double,boolean), a flag may be
specified to use smoothing for states. The smoothing used is
add-one smoothing, also called Laplace smoothing. For each state,
it adds one to the count for that state being an initial state and
for that state being a final state. For each pair of states, it
adds one to the transition count (including the self-transition,
which is counted only once). This smoothing is
equivalent to putting an alpha=1 uniform Dirichlet
prior on the initial state, final state, and conditional next
state estimators, with the resulting estimates being the maximum a
posteriori estimates.
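The effect of add-one smoothing on transition estimates can be sketched with the textbook formula P(target | source) = (count + 1) / (rowTotal + K), where K is the number of states. The class AddOneTransitions below is a hypothetical, self-contained illustration of that formula only; it makes no claim about how the library stores or normalizes its counts.

```java
/** Hypothetical helper (not a library class): add-one smoothed transitions. */
class AddOneTransitions {

    /**
     * counts[i][j] is the number of observed transitions from state i
     * to state j. Every pair, including the self-transition i to i,
     * receives exactly one pseudo-count.
     */
    static double[][] smooth(int[][] counts) {
        int k = counts.length;
        double[][] probs = new double[k][k];
        for (int i = 0; i < k; i++) {
            long rowTotal = 0;
            for (int j = 0; j < k; j++) {
                rowTotal += counts[i][j];
            }
            for (int j = 0; j < k; j++) {
                // One pseudo-count per target state adds k to the denominator.
                probs[i][j] = (counts[i][j] + 1.0) / (rowTotal + k);
            }
        }
        return probs;
    }

    public static void main(String[] args) {
        int[][] counts = { {2, 0}, {0, 0} };
        double[][] probs = smooth(counts);
        System.out.println(probs[0][0] + " " + probs[0][1]);
        System.out.println(probs[1][0] + " " + probs[1][1]);
    }
}
```

Note that a source state with no observed transitions ends up with a uniform distribution over targets, which is the practical benefit of smoothing for sparsely observed states.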
In the real world, corpora are noisy or incomplete. As of
version 3.4.0, this estimator accepts taggings with
null categories. If a category is null,
its emission is not trained, nor are the transitions to it,
transitions from it, start states involving it, or end states
involving it.
The estimator will also accept inputs with null emissions. In the case of a null emission or null category, the emission model will not be trained for that particular token/category pair.
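The null-handling rules above can be pictured with a small counting sketch. NullTolerantCounts is a hypothetical class written for this illustration, using plain maps in place of the estimator's real data structures; it only shows which events are skipped when a category or an emission is null.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper (not a library class): counting that skips nulls. */
class NullTolerantCounts {
    final Map<String, Integer> start = new HashMap<>();
    final Map<String, Integer> end = new HashMap<>();
    final Map<String, Integer> transit = new HashMap<>();
    final Map<String, Integer> emit = new HashMap<>();

    /** Trains on one tagged sequence; tokens[i] is tagged with tags[i]. */
    void train(String[] tokens, String[] tags) {
        if (tokens.length != tags.length) {
            throw new IllegalArgumentException("tokens and tags must align");
        }
        int n = tags.length;
        if (n == 0) {
            return;
        }
        // Start and end events are skipped when the category is null.
        if (tags[0] != null) {
            start.merge(tags[0], 1, Integer::sum);
        }
        if (tags[n - 1] != null) {
            end.merge(tags[n - 1], 1, Integer::sum);
        }
        for (int i = 0; i < n; i++) {
            // An emission is trained only if both category and token are present.
            if (tags[i] != null && tokens[i] != null) {
                emit.merge(tags[i] + "\t" + tokens[i], 1, Integer::sum);
            }
            // A transition is trained only if both of its endpoints are present.
            if (i > 0 && tags[i - 1] != null && tags[i] != null) {
                transit.merge(tags[i - 1] + "\t" + tags[i], 1, Integer::sum);
            }
        }
    }
}
```

For example, training on the tokens "the", "dog", null, "barks" with the categories DET, null, ADV, VERB records one start (DET), one end (VERB), one transition (ADV to VERB), and two emissions in this sketch.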
| Constructor | Description |
|---|---|
| HmmCharLmEstimator() | Construct an HMM estimator with default parameter settings. |
| HmmCharLmEstimator(int charLmMaxNGram, int maxCharacters, double charLmInterpolation) | Construct an HMM estimator with the specified maximum character n-gram size, maximum number of characters in the data, and character n-gram interpolation parameter, with no state smoothing. |
| HmmCharLmEstimator(int charLmMaxNGram, int maxCharacters, double charLmInterpolation, boolean smootheStates) | Construct an HMM estimator with the specified maximum character n-gram size, maximum number of characters in the data, character n-gram interpolation parameter, and state smoothing. |
| Modifier and Type | Method | Description |
|---|---|---|
| void | compileTo(ObjectOutput objOut) | Compiles a copy of this estimated HMM to the specified object output. |
| NGramBoundaryLM | emissionLm(String state) | Returns the language model used for emission probabilities for the specified state. |
| double | emitLog2Prob(String state, CharSequence emission) | Returns the log (base 2) of the emission estimate. |
| double | emitProb(String state, CharSequence emission) | Returns the estimate of the probability of the specified string being emitted from the specified state. |
| double | endProb(String state) | Returns the end probability for the specified state. |
| double | startProb(String state) | Returns the start probability for the specified state. |
| void | trainEmit(String state, CharSequence emission) | Train the emission estimator with the specified training instance consisting of a state and emission. |
| void | trainEnd(String state) | Train the end state estimator with the specified end state. |
| void | trainStart(String state) | Train the start state estimator with the specified start state. |
| void | trainTransit(String sourceState, String targetState) | Trains the transition estimator from the specified transition from the specified source state to the specified target state. |
| double | transitProb(String source, String target) | Returns the transition estimate from the specified source state to the specified target state. |
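The summary above lists both emitProb and emitLog2Prob. The relationship between the two can be sketched in plain Java as a change of logarithm base. Log2Sketch is a hypothetical stand-in written for this example; it uses java.lang.Math rather than any library utility.

```java
/** Hypothetical helper (not a library class): base-2 log probabilities. */
class Log2Sketch {

    /** Base-2 logarithm via the change-of-base identity. */
    static double log2(double x) {
        return Math.log(x) / Math.log(2.0);
    }

    /**
     * Sums base-2 log probabilities, which avoids the numeric underflow
     * that multiplying many small probabilities can cause.
     */
    static double sequenceLog2Prob(double[] probs) {
        double sum = 0.0;
        for (double p : probs) {
            sum += log2(p);
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(log2(0.25));
        System.out.println(sequenceLog2Prob(new double[] {0.5, 0.25}));
    }
}
```

A probability of zero maps to negative infinity under this definition, so an impossible emission makes the whole sequence score negative infinity.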
Methods inherited from class AbstractHmmEstimator: handle, numTrainingCases, numTrainingTokens
Methods inherited from class AbstractHmm: addState, emitLog2Prob, emitProb, endLog2Prob, endLog2Prob, endProb, startLog2Prob, startLog2Prob, startProb, stateSymbolTable, transitLog2Prob, transitLog2Prob, transitProb

public HmmCharLmEstimator()
Construct an HMM estimator with default parameter settings. The defaults are 6 for the maximum character n-gram, Character.MAX_VALUE-1 for the maximum number of characters, 6.0 for the character n-gram interpolation factor, and no state likelihood smoothing.

public HmmCharLmEstimator(int charLmMaxNGram, int maxCharacters, double charLmInterpolation)
Construct an HMM estimator with the specified maximum character n-gram size, maximum number of characters in the data, and character n-gram interpolation parameter, with no state smoothing. See NGramBoundaryLM.NGramBoundaryLM(int,int,double,char).
charLmMaxNGram - Maximum n-gram for emission character language models.
maxCharacters - Maximum number of unique characters in the training and test data.
charLmInterpolation - Interpolation parameter for character language models.
IllegalArgumentException - If the max n-gram is less than one, the max characters is less than 1 or greater than Character.MAX_VALUE-1, or if the interpolation parameter is negative or greater than 1.0.

public HmmCharLmEstimator(int charLmMaxNGram, int maxCharacters, double charLmInterpolation, boolean smootheStates)
Construct an HMM estimator with the specified maximum character n-gram size, maximum number of characters in the data, character n-gram interpolation parameter, and state smoothing. See NGramBoundaryLM.NGramBoundaryLM(int,int,double,char). For information on state smoothing, see the class documentation above.
charLmMaxNGram - Maximum n-gram for emission character language models.
maxCharacters - Maximum number of unique characters in the training and test data.
charLmInterpolation - Interpolation parameter for character language models.
smootheStates - Flag indicating if add-one smoothing is carried out for HMM states.
IllegalArgumentException - If the max n-gram is less than one, the max characters is less than 1 or greater than Character.MAX_VALUE-1, or if the interpolation parameter is negative or greater than 1.0.

public void trainStart(String state)
Train the start state estimator with the specified start state.
trainStart in class AbstractHmmEstimator
state - State being trained.

public void trainEnd(String state)
Train the end state estimator with the specified end state.
trainEnd in class AbstractHmmEstimator
state - State being trained.

public void trainEmit(String state, CharSequence emission)
Train the emission estimator with the specified training instance consisting of a state and emission.
trainEmit in class AbstractHmmEstimator
state - State being trained.
emission - Emission from state being trained.

public void trainTransit(String sourceState, String targetState)
Trains the transition estimator from the specified transition from the specified source state to the specified target state.
trainTransit in class AbstractHmmEstimator
sourceState - State from which the transition is made.
targetState - State to which the transition is made.

public double startProb(String state)
Returns the start probability for the specified state.
startProb in interface HiddenMarkovModel
startProb in class AbstractHmm
state - HMM state.

public double endProb(String state)
Returns the end probability for the specified state.
endProb in interface HiddenMarkovModel
endProb in class AbstractHmm
state - HMM state.

public double transitProb(String source, String target)
Returns the transition estimate from the specified source state to the specified target state.
trainTransit(String,String), in order to produce add-one smoothing. Typically, maximum likelihood estimates of state transitions are fine for HMMs trained with large sets of supervised data.
transitProb in interface HiddenMarkovModel
transitProb in class AbstractHmm
source - Originating state for the transition.
target - Resulting state after the transition.

public double emitProb(String state, CharSequence emission)
Returns the estimate of the probability of the specified string being emitted from the specified state.
emitProb in interface HiddenMarkovModel
emitProb in class AbstractHmm
state - State of HMM.
emission - String emitted by state.

public double emitLog2Prob(String state, CharSequence emission)
Returns the log (base 2) of the emission estimate. See AbstractHmm.emitProb(String,CharSequence) for more information.
This method is implemented in terms of Math.log2(double) and AbstractHmm.emitProb(String,CharSequence).
emitLog2Prob in interface HiddenMarkovModel
emitLog2Prob in class AbstractHmm
state - Label of state.
emission - Character sequence emitted.

public NGramBoundaryLM emissionLm(String state)
Returns the language model used for emission probabilities for the specified state.
state - State of the HMM.

public void compileTo(ObjectOutput objOut) throws IOException
Compiles a copy of this estimated HMM to the specified object output. The object read back in will be a HiddenMarkovModel, but will most likely not be an instance of the same class as the object being compiled.
compileTo in interface Compilable
compileTo in class AbstractHmmEstimator
objOut - Object output to which this estimator is compiled.
IOException - If there is an I/O exception compiling this object.

Copyright © 2019 Alias-i, Inc. All rights reserved.