| Package | Description |
|---|---|
| com.aliasi.chunk | Classes for extracting meaningful chunks (spans) of text. |
| com.aliasi.lm | Classes for character- and token-based language models. |
| Modifier and Type | Class and Description |
|---|---|
| class | AbstractCharLmRescoringChunker<B extends NBestChunker,O extends LanguageModel.Process,C extends LanguageModel.Sequence>: An AbstractCharLmRescoringChunker provides the basic character language-model rescoring model used by the trainable CharLmRescoringChunker and its compiled version. |
| Modifier and Type | Class and Description |
|---|---|
| class | CompiledNGramBoundaryLM: A CompiledNGramBoundaryLM is constructed by reading the serialized form of an instance of NGramBoundaryLM. |
| class | CompiledTokenizedLM: A CompiledTokenizedLM implements a tokenized bounded sequence language model. |
| class | NGramBoundaryLM: An NGramBoundaryLM provides a dynamic sequence language model for which training, estimation and pruning may be interleaved. |
| class | TokenizedLM: A TokenizedLM provides a dynamic sequence language model which models token sequences with an n-gram model, and whitespace and unknown tokens with their own sequence language models. |
| class | UniformBoundaryLM: A UniformBoundaryLM implements a uniform sequence language model with a specified number of outcomes and the same probability assigned to the end-of-stream marker. |
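
The uniform boundary model described in the last row can be illustrated with a small standalone sketch. This is not LingPipe's implementation: the class `UniformBoundarySketch` and its method are hypothetical names chosen for illustration, and the sketch assumes that each of the `numOutcomes` characters and the end-of-stream marker receive the same probability, 1/(numOutcomes + 1), so a sequence of n characters is scored over n + 1 events.

```java
// Standalone illustration only; this is NOT the com.aliasi.lm.UniformBoundaryLM source.
// Assumption: every character and the end-of-stream marker share the
// probability 1 / (numOutcomes + 1).
final class UniformBoundarySketch {

    // Log (base 2) probability of a single event: a character or the end marker.
    private final double log2PerEvent;

    UniformBoundarySketch(int numOutcomes) {
        if (numOutcomes < 1) {
            throw new IllegalArgumentException("numOutcomes must be positive");
        }
        // log2(1 / (numOutcomes + 1)) computed via natural logs.
        this.log2PerEvent = -Math.log(numOutcomes + 1.0) / Math.log(2.0);
    }

    // Log (base 2) estimate of a whole sequence: one event per character
    // plus one event for the end-of-stream marker.
    double log2Estimate(CharSequence cs) {
        return (cs.length() + 1) * log2PerEvent;
    }

    public static void main(String[] args) {
        UniformBoundarySketch lm = new UniformBoundarySketch(3);
        // Two characters plus the end marker: three events in total.
        System.out.println(lm.log2Estimate("ab"));
    }
}
```

Under this assumption the estimate depends only on the length of the sequence, which is why a uniform model is a natural default for the unknown-token or whitespace model of a TokenizedLM.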
| Modifier and Type | Method and Description |
|---|---|
| LanguageModel.Sequence | TokenizedLM.unknownTokenLM(): Returns the unknown token sequence language model for this tokenized language model. |
| LanguageModel.Sequence | TokenizedLM.whitespaceLM(): Returns the whitespace language model for this tokenized language model. |
| Constructor and Description |
|---|
| TokenizedLM(TokenizerFactory tokenizerFactory, int nGramOrder, LanguageModel.Sequence unknownTokenModel, LanguageModel.Sequence whitespaceModel, double lambdaFactor): Construct a tokenized language model with the specified tokenization factory and n-gram order, sequence models for unknown tokens and whitespace, and an interpolation hyperparameter. |
| TokenizedLM(TokenizerFactory tokenizerFactory, int nGramOrder, LanguageModel.Sequence unknownTokenModel, LanguageModel.Sequence whitespaceModel, double lambdaFactor, boolean initialIncrementBoundary): Construct a tokenized language model with the specified tokenization factory and n-gram order, sequence models for unknown tokens and whitespace, and an interpolation hyperparameter, as well as a flag indicating whether to automatically increment a null input to avoid numerical problems with zero counts. |
Copyright © 2019 Alias-i, Inc. All rights reserved.