public interface LanguageModel
LanguageModel provides an estimate of the probability of a
sequence of characters. Sequences of characters may be specified
via an array slice or with a Java CharSequence, which is an
interface implemented by String, StringBuilder and
the new I/O buffer class CharBuffer.
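Because String, StringBuilder and CharBuffer all implement CharSequence, one overload accepts all three. The sketch below illustrates this with a hypothetical stand-in interface, CharModel, that copies the two documented log2Estimate signatures. It is not the library's LanguageModel type. The model behind it is a toy UniformCharModel, also hypothetical, that scores every character as log2(1/k) for an alphabet of size k.

```java
import java.nio.CharBuffer;

// Hypothetical stand-in mirroring the two documented signatures;
// this is NOT the library's LanguageModel type.
interface CharModel {
    double log2Estimate(char[] cs, int start, int end);
    double log2Estimate(CharSequence cs);
}

// Toy model (assumption): each character is drawn uniformly and
// independently from an alphabet of the given size.
class UniformCharModel implements CharModel {
    private final int alphabetSize;

    UniformCharModel(int alphabetSize) {
        this.alphabetSize = alphabetSize;
    }

    @Override
    public double log2Estimate(char[] cs, int start, int end) {
        if (start < 0 || start > end || end > cs.length)
            throw new IndexOutOfBoundsException("start=" + start + ", end=" + end);
        // log2((1/k)^n) = -n * log2(k), where n = end - start
        return -(end - start) * (Math.log(alphabetSize) / Math.log(2));
    }

    @Override
    public double log2Estimate(CharSequence cs) {
        // Copy any CharSequence into an array and delegate to the slice overload.
        char[] chars = cs.toString().toCharArray();
        return log2Estimate(chars, 0, chars.length);
    }
}

class CharSequenceDemo {
    public static void main(String[] args) {
        CharModel lm = new UniformCharModel(26);
        char[] buf = "xxabcxx".toCharArray();
        // All four calls score the same three characters "abc".
        System.out.println(lm.log2Estimate(buf, 2, 5));                 // array slice
        System.out.println(lm.log2Estimate("abc"));                     // String
        System.out.println(lm.log2Estimate(new StringBuilder("abc")));  // StringBuilder
        System.out.println(lm.log2Estimate(CharBuffer.wrap("abc")));    // CharBuffer
    }
}
```

The CharSequence overload delegates to the slice overload, so only one scoring routine exists. The trade-off is that toString() copies the characters on each call.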
There are several subinterfaces of LanguageModel. The primary
distinction is between LanguageModel.Sequence
and LanguageModel.Process, which place different normalization
requirements on their estimates. Sequence models require the
estimates to sum to 1.0 over all character sequences, whereas
process models require that, for each length, the estimates sum to
1.0 over all sequences of that length. Every language model should
be marked by one of these two subinterfaces.
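The two normalization requirements can be checked numerically on toy models. The sketch below assumes a two-letter alphabet {a, b}. Its process model assigns P(s) = 2^-n to each string of length n, so the strings of any one length share exactly 1.0. Its sequence model adds an assumed stop probability q = 0.25 and sets P(s) = ((1 - q)/2)^n q. Summing over all lengths gives the geometric series q(1 + (1 - q) + (1 - q)^2 + ...) = 1. Both models are illustrative assumptions, not library classes.

```java
import java.util.ArrayList;
import java.util.List;

// Toy models over a hypothetical two-letter alphabet; illustrative only.
class NormalizationDemo {
    static final char[] ALPHABET = {'a', 'b'};
    static final double STOP = 0.25; // assumed end-of-sequence probability q

    static double log2(double x) {
        return Math.log(x) / Math.log(2.0);
    }

    // Process model: P(s) = k^-n, normalized separately for each length n.
    static double processLog2(CharSequence s) {
        return -s.length() * log2(ALPHABET.length);
    }

    // Sequence model: P(s) = ((1 - q) / k)^n * q, normalized over all lengths.
    static double sequenceLog2(CharSequence s) {
        return s.length() * log2((1.0 - STOP) / ALPHABET.length) + log2(STOP);
    }

    // Enumerate every string over ALPHABET of exactly length n.
    static List<String> allOfLength(int n) {
        List<String> out = new ArrayList<>();
        out.add("");
        for (int i = 0; i < n; i++) {
            List<String> next = new ArrayList<>();
            for (String prefix : out)
                for (char c : ALPHABET)
                    next.add(prefix + c);
            out = next;
        }
        return out;
    }

    // Total process-model probability of all strings of length n.
    static double processMassAtLength(int n) {
        double sum = 0.0;
        for (String s : allOfLength(n)) sum += Math.pow(2.0, processLog2(s));
        return sum;
    }

    // Total sequence-model probability of all strings of length 0..maxLen.
    static double sequenceMassUpTo(int maxLen) {
        double sum = 0.0;
        for (int n = 0; n <= maxLen; n++)
            for (String s : allOfLength(n)) sum += Math.pow(2.0, sequenceLog2(s));
        return sum;
    }

    public static void main(String[] args) {
        for (int n = 0; n <= 4; n++)
            System.out.printf("process, length %d: %.6f%n", n, processMassAtLength(n));
        // Partial sum equals 1 - (1 - q)^13 and approaches 1 as maxLen grows.
        System.out.printf("sequence, lengths 0..12: %.6f%n", sequenceMassUpTo(12));
    }
}
```

Each process-model line prints 1.0 because the mass at every length is complete. The sequence-model total stays below 1.0 for any finite cutoff, since some probability is always reserved for longer strings.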
The LanguageModel.Conditional interface provides additional methods
for conditional estimates. The LanguageModel.Dynamic interface provides
a method for training the model with sample character sequence
data. Finally, several of the language model implementations are
serializable to an object output stream.
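The sketch below shows how training (the Dynamic role) and conditional estimates (the Conditional role) fit together in a character bigram model. The class BigramSketch and its methods train and log2ConditionalEstimate are hypothetical names chosen for illustration, not the library's API. It assumes a fixed alphabet and add-one smoothing. Each conditional distribution therefore sums to 1.0, which makes the model a process model in the sense described above. The full-sequence estimate uses the chain rule: log2 P(c1..cn) is the sum of log2 P(ci | ci-1).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: train() and log2ConditionalEstimate() are
// illustrative names, not the library's API.
class BigramSketch {
    private static final char BOUNDARY = (char) 0; // assumed start-of-text context
    private final String alphabet;
    private final Map<String, Integer> pairCounts = new HashMap<>();
    private final Map<Character, Integer> contextCounts = new HashMap<>();

    BigramSketch(String alphabet) {
        this.alphabet = alphabet;
    }

    // Dynamic role: accumulate bigram counts from one training sequence.
    void train(CharSequence cs) {
        char prev = BOUNDARY;
        for (int i = 0; i < cs.length(); i++) {
            char c = cs.charAt(i);
            checkInAlphabet(c);
            pairCounts.merge("" + prev + c, 1, Integer::sum);
            contextCounts.merge(prev, 1, Integer::sum);
            prev = c;
        }
    }

    // Conditional role: log2 P(c | prev) with add-one smoothing over the alphabet.
    double log2ConditionalEstimate(char prev, char c) {
        checkInAlphabet(c);
        int pair = pairCounts.getOrDefault("" + prev + c, 0);
        int ctx = contextCounts.getOrDefault(prev, 0);
        return log2((pair + 1.0) / (ctx + alphabet.length()));
    }

    // Chain rule: sum the conditional log estimates left to right.
    double log2Estimate(CharSequence cs) {
        double sum = 0.0;
        char prev = BOUNDARY;
        for (int i = 0; i < cs.length(); i++) {
            sum += log2ConditionalEstimate(prev, cs.charAt(i));
            prev = cs.charAt(i);
        }
        return sum;
    }

    private void checkInAlphabet(char c) {
        if (alphabet.indexOf(c) < 0)
            throw new IllegalArgumentException("character not in alphabet: " + c);
    }

    private static double log2(double x) {
        return Math.log(x) / Math.log(2.0);
    }
}

class BigramSketchDemo {
    public static void main(String[] args) {
        BigramSketch lm = new BigramSketch("ab");
        lm.train("abab");
        lm.train("abba");
        // P(a | start) * P(b | a) = 0.75 * 0.8 for "ab"; 0.75 * 0.2 for "aa".
        System.out.println(Math.pow(2.0, lm.log2Estimate("ab")));
        System.out.println(Math.pow(2.0, lm.log2Estimate("aa")));
    }
}
```

After training on "abab" and "abba", the four strings of length 2 receive probabilities 0.15, 0.6, 0.15 and 0.1. These sum to 1.0, as the per-length normalization of a process model requires.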
| Modifier and Type | Interface and Description |
|---|---|
| static interface | LanguageModel.Conditional: A LanguageModel.Conditional is a language model that implements conditional estimates of characters given previous characters. |
| static interface | LanguageModel.Dynamic: A LanguageModel.Dynamic accepts training events in the form of character slices or sequences. |
| static interface | LanguageModel.Process: A LanguageModel.Process is normalized by length. |
| static interface | LanguageModel.Sequence: A LanguageModel.Sequence is normalized over all character sequences. |
| static interface | LanguageModel.Tokenized: A LanguageModel.Tokenized provides a means of estimating the probability of a sequence of tokens. |
| Modifier and Type | Method and Description |
|---|---|
| double | log2Estimate(char[] cs, int start, int end): Returns an estimate of the log (base 2) probability of the specified character slice. |
| double | log2Estimate(CharSequence cs): Returns an estimate of the log (base 2) probability of the specified character sequence. |
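The sketch below illustrates two points about these methods. First, the slice [start, end) covers the same characters as the matching CharSequence. Second, estimates are returned as base-2 logs because raw probabilities of long texts underflow a double. It uses a hypothetical toy Log2Demo model that scores each character as uniform over 26 letters. It follows the documented IndexOutOfBoundsException contract, and also accepts an empty slice (start == end), which is an assumption.

```java
// Hypothetical illustration, not library code: slice semantics and
// why estimates are reported as base-2 logs.
class Log2Demo {
    // Toy per-character log2 probability: uniform over 26 letters (assumption).
    static final double LOG2_PER_CHAR = -Math.log(26.0) / Math.log(2.0);

    static double log2Estimate(char[] cs, int start, int end) {
        // Documented contract: start and end - 1 must index into cs.
        // Accepting an empty slice (start == end) is an assumption here.
        if (start < 0 || start > end || end > cs.length)
            throw new IndexOutOfBoundsException(
                "start=" + start + ", end=" + end + ", length=" + cs.length);
        return (end - start) * LOG2_PER_CHAR;
    }

    static double log2Estimate(CharSequence cs) {
        char[] chars = cs.toString().toCharArray();
        return log2Estimate(chars, 0, chars.length);
    }

    public static void main(String[] args) {
        char[] text = "the quick brown fox".toCharArray();
        // Slice [4, 9) covers "quick"; end is one past the last character.
        System.out.println(log2Estimate(text, 4, 9) == log2Estimate("quick")); // true

        // For 300 characters the probability 26^-300 underflows to 0.0,
        // but its base-2 log is an ordinary finite number.
        char[] longText = new char[300];
        java.util.Arrays.fill(longText, 'a');
        double log2 = log2Estimate(longText, 0, longText.length);
        System.out.println(log2);                // about -1410.13
        System.out.println(Math.pow(2.0, log2)); // 0.0 (underflow)
    }
}
```

Working in log space also turns products of per-character probabilities into sums. The smallest positive double is about 2^-1074, so any estimate below that can only be represented as a log.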
double log2Estimate(char[] cs, int start, int end)
Parameters:
cs - Underlying array of characters.
start - Index of first character in slice.
end - One plus index of last character in slice.
Throws:
IndexOutOfBoundsException - If start or end - 1 falls outside the bounds of the character array.

double log2Estimate(CharSequence cs)
Parameters:
cs - Character sequence to estimate.

Copyright © 2019 Alias-i, Inc. All rights reserved.