public class UniformProcessLM extends Object implements LanguageModel.Dynamic, LanguageModel.Process
UniformLM.Sequence implements a uniform sequence
language model with a specified number of outcomes and the same
probability assigned to the end-of-stream marker. The formula
for computing sequence likelihood estimates is:
log2Estimate(cSeq)
= (cSeq.length()+1) * log2 ( 1 / (numOutcomes+1) )
Adding one to the number of outcomes makes the end-of-sequence
just as likely as any other character. Adding one to the
sequence length adds the log likelihood of the end-of-sequence
marker itself.

Nested classes/interfaces inherited from interface LanguageModel:
LanguageModel.Conditional, LanguageModel.Dynamic, LanguageModel.Process, LanguageModel.Sequence, LanguageModel.Tokenized

| Constructor and Description |
|---|
| UniformProcessLM(): Construct a uniform process language model with a number of outcomes equal to the total number of characters. |
| UniformProcessLM(double crossEntropyRate): Construct a uniform process language model with the specified character cross-entropy rate. |
| UniformProcessLM(int numOutcomes): Construct a uniform process language model with the specified number of outcomes. |

| Modifier and Type | Method and Description |
|---|---|
| void | compileTo(ObjectOutput objOut): Writes a compiled version of this model to the specified object output. |
| void | handle(CharSequence cs): This method for training a character sequence is supplied for compatibility with the dynamic language model interface, but is implemented to do nothing. |
| double | log2Estimate(char[] cs, int start, int end): Returns an estimate of the log (base 2) probability of the specified character slice. |
| double | log2Estimate(CharSequence cSeq): Returns an estimate of the log (base 2) probability of the specified character sequence. |
| int | numOutcomes(): Returns the number of outcomes for this uniform model. |
| void | train(char[] cs, int start, int end): Ignores the training data. |
| void | train(char[] cs, int start, int end, int count): Ignores the training data. |
| void | train(CharSequence cSeq): Ignores the training data. |
| void | train(CharSequence cSeq, int count): Ignores the training data. |

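
The method summaries above can be illustrated with a small stand-alone sketch. It does not use the library classes: UniformProcessSketch is a hypothetical stand-in, written under the assumption (taken from the constructor documentation) that each character is assigned probability 1/numOutcomes, so the estimate depends only on the length of the input, and that training is a no-op.

```java
// Hypothetical stand-in for illustration only; not the library implementation.
final class UniformProcessSketch {
    private final int numOutcomes;

    UniformProcessSketch(int numOutcomes) {
        if (numOutcomes < 1) {
            throw new IllegalArgumentException("numOutcomes must be positive");
        }
        this.numOutcomes = numOutcomes;
    }

    int numOutcomes() {
        return numOutcomes;
    }

    // Training is a no-op: a uniform model has nothing to learn.
    void train(CharSequence cSeq) {
    }

    // Assumed: every character contributes log2(1/numOutcomes).
    double log2Estimate(CharSequence cSeq) {
        return -cSeq.length() * log2(numOutcomes);
    }

    // Same estimate for the slice cs[start..end), end exclusive.
    double log2Estimate(char[] cs, int start, int end) {
        if (start < 0 || end > cs.length || start > end) {
            throw new IndexOutOfBoundsException("bad slice bounds");
        }
        return -(end - start) * log2(numOutcomes);
    }

    private static double log2(double x) {
        return Math.log(x) / Math.log(2.0);
    }

    public static void main(String[] args) {
        UniformProcessSketch lm = new UniformProcessSketch(256);
        double before = lm.log2Estimate("abc");
        lm.train("abc abc abc");
        // Training leaves the estimate unchanged.
        System.out.println(before == lm.log2Estimate("abc")); // prints "true"
        // A slice and the equivalent sequence get the same estimate.
        char[] cs = "xxabcxx".toCharArray();
        System.out.println(lm.log2Estimate(cs, 2, 5) == before); // prints "true"
    }
}
```

With 256 outcomes each character costs about 8 bits, so the three-character input scores roughly -24 in this sketch.
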
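public UniformProcessLM()
Construct a uniform process language model with a number of outcomes equal to the total number of characters.

public UniformProcessLM(int numOutcomes)
Construct a uniform process language model with the specified number of outcomes, so that each character is assigned probability 1/numOutcomes.
Parameters:
numOutcomes - The number of outcomes for this language model.

public UniformProcessLM(double crossEntropyRate)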
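Construct a uniform process language model with the specified character cross-entropy rate. The log (base 2) estimate for a character sequence cs is:
log2 P(cs)
= - crossEntropyRate * cs.length()
The number of outcomes is set by raising two to the power of the cross-entropy rate and rounding down:
numOutcomes = (int) 2.0^crossEntropyRate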
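
The two formulas above can be checked numerically with a small stand-alone sketch. The class CrossEntropySketch and its static methods are hypothetical and are not part of the library; Math.pow followed by an int cast is assumed here as one way to express the rounding down described above.

```java
// Hypothetical helper for illustration only; not the library implementation.
final class CrossEntropySketch {

    // numOutcomes = (int) 2.0^crossEntropyRate; the cast truncates toward zero.
    static int numOutcomes(double crossEntropyRate) {
        return (int) Math.pow(2.0, crossEntropyRate);
    }

    // log2 P(cs) = - crossEntropyRate * cs.length()
    static double log2Estimate(double crossEntropyRate, CharSequence cs) {
        return -crossEntropyRate * cs.length();
    }

    public static void main(String[] args) {
        System.out.println(numOutcomes(8.0));          // prints "256"
        System.out.println(numOutcomes(2.5));          // prints "5" (2^2.5 is about 5.66)
        System.out.println(log2Estimate(2.5, "abcd")); // prints "-10.0"
    }
}
```

A fractional rate such as 2.5 shows why the rounding matters: the estimate uses the rate exactly, while the reported number of outcomes is truncated to an integer.
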
Parameters:
crossEntropyRate - Character cross-entropy rate of the uniform model.

public int numOutcomes()
Returns the number of outcomes for this uniform model.

public void compileTo(ObjectOutput objOut) throws IOException
Writes a compiled version of this model to the specified object output.
Specified by:
compileTo in interface Compilable
Parameters:
objOut - Object output to which this model is written.
Throws:
IOException - If there is an I/O error during the write.

public void handle(CharSequence cs)
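This method for training a character sequence is supplied for compatibility with the dynamic language model interface, but is implemented to do nothing.
Specified by:
handle in interface ObjectHandler<CharSequence>
Parameters:
cs - Ignored.

public void train(char[] cs, int start, int end)
Ignores the training data.
Specified by:
train in interface LanguageModel.Dynamic
Parameters:
cs - Ignored.
start - Ignored.
end - Ignored.

public void train(char[] cs, int start, int end, int count)
Ignores the training data.
Specified by:
train in interface LanguageModel.Dynamic
Parameters:
cs - Ignored.
start - Ignored.
end - Ignored.
count - Ignored.

public void train(CharSequence cSeq)
Ignores the training data.
Specified by:
train in interface LanguageModel.Dynamic
Parameters:
cSeq - Ignored.

public void train(CharSequence cSeq, int count)
Ignores the training data.
Specified by:
train in interface LanguageModel.Dynamic
Parameters:
cSeq - Ignored.
count - Ignored.

public double log2Estimate(char[] cs, int start, int end)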
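Description copied from interface: LanguageModel
Returns an estimate of the log (base 2) probability of the specified character slice.
Specified by:
log2Estimate in interface LanguageModel
Parameters:
cs - Underlying array of characters.
start - Index of first character in slice.
end - One plus index of last character in slice.

public double log2Estimate(CharSequence cSeq)
Description copied from interface: LanguageModel
Returns an estimate of the log (base 2) probability of the specified character sequence.
Specified by:
log2Estimate in interface LanguageModel
Parameters:
cSeq - Character sequence to estimate.

Copyright © 2019 Alias-i, Inc. All rights reserved.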