| Package | Description |
|---|---|
| com.aliasi.chunk | Classes for extracting meaningful chunks (spans) of text. |
| com.aliasi.classify | Classes for classifying data and evaluation. |
| com.aliasi.cluster | Classes for clustering data and evaluation. |
| com.aliasi.corpus | Classes for parsing and handling various corpora. |
| com.aliasi.crf | Classes and interfaces for conditional random fields. |
| com.aliasi.features | Classes for extracting feature vectors from objects and parsing objects for feature handlers. |
| com.aliasi.hmm | Classes for estimating and decoding hidden Markov models. |
| com.aliasi.lm | Classes for character- and token-based language models. |
| com.aliasi.sentences | Classes for sentence-boundary detection. |
| com.aliasi.spell | Classes for spelling correction and edit distance. |
| com.aliasi.stats | Classes for handling basic statistical distributions and estimators. |
| com.aliasi.tag | Classes and interfaces for sequence tagging, including evaluators. |
| Modifier and Type | Class and Description |
|---|---|
class |
CharLmHmmChunker
A
CharLmHmmChunker employs a hidden Markov model
estimator and tokenizer factory to learn a chunker. |
class |
CharLmRescoringChunker
A
CharLmRescoringChunker provides a long-distance
character language model-based chunker that operates by rescoring
the output of a contained character language model HMM chunker. |
class |
ChunkerEvaluator
The
ChunkerEvaluator class provides an evaluation
framework for chunkers. |
class |
TrainTokenShapeChunker
A
TrainTokenShapeChunker is used to train a token and
shape-based chunker. |
| Modifier and Type | Method and Description |
|---|---|
static ObjectHandler<StringTagging> |
TagChunkCodecAdapters.chunkingToStringTagging(TagChunkCodec codec,
ObjectHandler<Chunking> handler)
Return the string tagging handler that converts string taggings
to chunkings.
|
static ObjectHandler<Tagging<String>> |
TagChunkCodecAdapters.chunkingToTagging(TagChunkCodec codec,
ObjectHandler<Chunking> handler)
Returns the tagging handler that converts taggings to chunkings
using the specified codec.
|
static ObjectHandler<Chunking> |
TagChunkCodecAdapters.stringTaggingToChunking(TagChunkCodec codec,
ObjectHandler<StringTagging> handler)
Return the chunking handler that converts chunkings to taggings
using the specified codec.
|
static ObjectHandler<Chunking> |
TagChunkCodecAdapters.taggingToChunking(TagChunkCodec codec,
ObjectHandler<Tagging<String>> handler)
Return the chunking handler that converts chunkings to simple
taggings using the specified codec.
|
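
The four adapter methods above appear to share one pattern: a handler for one type is wrapped so that it accepts another type, with each item converted before it is passed on. The following is a minimal sketch of that pattern only. The `Handler<E>` interface and the `adapt` helper are hypothetical stand-ins, not LingPipe classes, and an arbitrary `Function` takes the place of the `TagChunkCodec` conversion.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

public class HandlerAdapterSketch {

    /** Hypothetical one-method handler interface (not a LingPipe type). */
    interface Handler<E> {
        void handle(E e);
    }

    /**
     * Wraps a handler of B as a handler of A: each incoming A is
     * converted to a B and forwarded to the wrapped handler.
     */
    static <A, B> Handler<A> adapt(Function<A, B> converter, Handler<B> delegate) {
        return a -> delegate.handle(converter.apply(a));
    }

    public static void main(String[] args) {
        List<Integer> lengths = new ArrayList<>();
        Handler<Integer> lengthCollector = lengths::add;

        // The adapted handler accepts strings but delivers their lengths.
        Handler<String> stringHandler = adapt(String::length, lengthCollector);
        stringHandler.handle("chunk");
        stringHandler.handle("tagging");

        System.out.println(lengths); // prints [5, 7]
    }
}
```

The caller only ever sees the outer handler type, so a component that emits one representation can feed a consumer written for the other without either side changing.
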
| Modifier and Type | Class and Description |
|---|---|
class |
BaseClassifierEvaluator<E>
A
BaseClassifierEvaluator provides an evaluation harness
for first-best classifiers. |
class |
BernoulliClassifier<E>
A
BernoulliClassifier provides a feature-based
classifier where feature values are reduced to booleans based on a
specified threshold. |
class |
BinaryLMClassifier
A
BinaryLMClassifier is a boolean dynamic language
model classifier for use when there are two categories, but
training data is only available for one of the categories. |
class |
ConditionalClassifierEvaluator<E>
A
ConditionalClassifierEvaluator provides an evaluation
harness for conditional probability-based n-best classifiers. |
class |
DynamicLMClassifier<L extends LanguageModel.Dynamic>
A
DynamicLMClassifier is a language model classifier
that accepts training events of categorized character sequences. |
class |
JointClassifierEvaluator<E>
A
JointClassifierEvaluator provides an evaluation harness
for joint probability-based n-best classifiers. |
class |
KnnClassifier<E>
A
KnnClassifier implements k-nearest-neighbor
classification based on feature extraction and a vector proximity
or distance. |
class |
NaiveBayesClassifier
A
NaiveBayesClassifier provides a trainable naive Bayes
text classifier, with tokens as features. |
class |
RankedClassifierEvaluator<E>
A
RankedClassifierEvaluator provides an evaluation harness for
ranked classifiers. |
class |
ScoredClassifierEvaluator<E>
A
ScoredClassifierEvaluator provides an evaluation harness for
score-based classifiers. |
class |
TfIdfClassifierTrainer<E>
A
TfIdfClassifierTrainer provides a framework for
training discriminative classifiers based on term-frequency (TF)
and inverse document frequency (IDF) weighting of features. |
class |
TradNaiveBayesClassifier
A
TradNaiveBayesClassifier implements a traditional
token-based approach to naive Bayes text classification. |
| Modifier and Type | Method and Description |
|---|---|
static <F> LogisticRegressionClassifier<F> |
LogisticRegressionClassifier.train(Corpus<ObjectHandler<Classified<F>>> corpus,
FeatureExtractor<? super F> featureExtractor,
int minFeatureCount,
boolean addInterceptFeature,
RegressionPrior prior,
int blockSize,
LogisticRegressionClassifier<F> hotStart,
AnnealingSchedule annealingSchedule,
double minImprovement,
int rollingAverageSize,
int minEpochs,
int maxEpochs,
ObjectHandler<LogisticRegressionClassifier<F>> classifierHandler,
Reporter reporter)
Returns a trained logistic regression classifier given the specified
feature extractor, training corpus, model priors and search parameters.
|
| Modifier and Type | Method and Description |
|---|---|
static Iterator<TradNaiveBayesClassifier> |
TradNaiveBayesClassifier.emIterator(TradNaiveBayesClassifier initialClassifier,
Factory<TradNaiveBayesClassifier> classifierFactory,
Corpus<ObjectHandler<Classified<CharSequence>>> labeledData,
Corpus<ObjectHandler<CharSequence>> unlabeledData,
double minTokenCount)
Apply the expectation maximization (EM) algorithm to train a traditional
naive Bayes classifier using the specified labeled and unlabeled data,
initial classifier and factory for creating subsequent classifiers.
|
static TradNaiveBayesClassifier |
TradNaiveBayesClassifier.emTrain(TradNaiveBayesClassifier initialClassifier,
Factory<TradNaiveBayesClassifier> classifierFactory,
Corpus<ObjectHandler<Classified<CharSequence>>> labeledData,
Corpus<ObjectHandler<CharSequence>> unlabeledData,
double minTokenCount,
int maxEpochs,
double minImprovement,
Reporter reporter)
Apply the expectation maximization (EM) algorithm to train a traditional
naive Bayes classifier using the specified labeled and unlabeled data,
initial classifier and factory for creating subsequent classifiers,
maximum number of epochs, minimum improvement per epoch, and reporter
to which progress reports are sent.
|
static <F> LogisticRegressionClassifier<F> |
LogisticRegressionClassifier.train(Corpus<ObjectHandler<Classified<F>>> corpus,
FeatureExtractor<? super F> featureExtractor,
int minFeatureCount,
boolean addInterceptFeature,
RegressionPrior prior,
AnnealingSchedule annealingSchedule,
double minImprovement,
int minEpochs,
int maxEpochs,
Reporter reporter)
Returns a trained logistic regression classifier given the specified
feature extractor, training corpus, model priors and search parameters.
|
static <F> LogisticRegressionClassifier<F> |
LogisticRegressionClassifier.train(Corpus<ObjectHandler<Classified<F>>> corpus,
FeatureExtractor<? super F> featureExtractor,
int minFeatureCount,
boolean addInterceptFeature,
RegressionPrior prior,
int blockSize,
LogisticRegressionClassifier<F> hotStart,
AnnealingSchedule annealingSchedule,
double minImprovement,
int rollingAverageSize,
int minEpochs,
int maxEpochs,
ObjectHandler<LogisticRegressionClassifier<F>> classifierHandler,
Reporter reporter)
Returns a trained logistic regression classifier given the specified
feature extractor, training corpus, model priors and search parameters.
|
| Constructor and Description |
|---|
PerceptronClassifier(Corpus<ObjectHandler<Classified<E>>> corpus,
FeatureExtractor<? super E> featureExtractor,
KernelFunction kernelFunction,
String corpusAcceptCategory,
int numIterations,
String outputAcceptCategory,
String outputRejectCategory)
Construct a perceptron classifier from the specified feature extractor,
corpus with designated accept category, kernel function and
number of training iterations, and output accept and reject categories.
|
| Modifier and Type | Method and Description |
|---|---|
static LatentDirichletAllocation.GibbsSample |
LatentDirichletAllocation.gibbsSampler(int[][] docWords,
short numTopics,
double docTopicPrior,
double topicWordPrior,
int burninEpochs,
int sampleLag,
int numSamples,
Random random,
ObjectHandler<LatentDirichletAllocation.GibbsSample> handler)
Run Gibbs sampling for the specified multinomial data, number
of topics, priors, search parameters, randomization and
callback sample handler.
|
| Modifier and Type | Class and Description |
|---|---|
class |
XValidatingObjectCorpus<E>
An
XValidatingObjectCorpus holds a list of items
which it uses to provide training and testing items using
cross-validation. |
| Modifier and Type | Method and Description |
|---|---|
void |
XValidatingObjectCorpus.visitCorpus(ObjectHandler<E> handler) |
void |
XValidatingObjectCorpus.visitCorpus(ObjectHandler<E> trainHandler,
ObjectHandler<E> testHandler) |
void |
XValidatingObjectCorpus.visitTest(ObjectHandler<E> handler)
Send all of the test items to the specified
handler.
|
void |
ListCorpus.visitTest(ObjectHandler<E> handler) |
void |
XValidatingObjectCorpus.visitTest(ObjectHandler<E> handler,
int fold)
Visit the test portion of the specified fold with the
specified handler.
|
void |
XValidatingObjectCorpus.visitTrain(ObjectHandler<E> handler)
Send all of the training items to the specified
handler.
|
void |
ListCorpus.visitTrain(ObjectHandler<E> handler) |
void |
XValidatingObjectCorpus.visitTrain(ObjectHandler<E> handler,
int fold)
Visit the training portion of the specified fold with the
specified handler.
|
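
The fold-based `visitTrain` and `visitTest` variants above split one list of items into a test slice and a training remainder. As a rough illustration of that idea, the sketch below partitions a list by fold index. The class name, the method names, and the boundary formula are assumptions made for this example; they are not taken from the LingPipe implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class FoldSketch {

    /** Index of the first item in the test slice of the given fold (assumed formula). */
    static int foldStart(int fold, int numFolds, int size) {
        return (int) (((long) fold * size) / numFolds);
    }

    /** Items in the test slice of the given fold. */
    static <E> List<E> testItems(List<E> items, int fold, int numFolds) {
        int start = foldStart(fold, numFolds, items.size());
        int end = foldStart(fold + 1, numFolds, items.size());
        return new ArrayList<>(items.subList(start, end));
    }

    /** All items outside the test slice of the given fold, in original order. */
    static <E> List<E> trainItems(List<E> items, int fold, int numFolds) {
        int start = foldStart(fold, numFolds, items.size());
        int end = foldStart(fold + 1, numFolds, items.size());
        List<E> train = new ArrayList<>(items.subList(0, start));
        train.addAll(items.subList(end, items.size()));
        return train;
    }

    public static void main(String[] args) {
        List<String> items = Arrays.asList("a", "b", "c", "d", "e", "f", "g");
        System.out.println(testItems(items, 1, 3));  // prints [c, d]
        System.out.println(trainItems(items, 1, 3)); // prints [a, b, e, f, g]
    }
}
```

Every item lands in exactly one test slice across the folds, which is the property cross-validation relies on.
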
| Modifier and Type | Method and Description |
|---|---|
static ChainCrfChunker |
ChainCrfChunker.estimate(Corpus<ObjectHandler<Chunking>> chunkingCorpus,
TagChunkCodec codec,
TokenizerFactory tokenizerFactory,
ChainCrfFeatureExtractor<String> featureExtractor,
boolean addInterceptFeature,
int minFeatureCount,
boolean cacheFeatureVectors,
RegressionPrior prior,
int priorBlockSize,
AnnealingSchedule annealingSchedule,
double minImprovement,
int minEpochs,
int maxEpochs,
Reporter reporter)
Return the chain CRF-based chunker estimated from the specified
corpus, which is converted to a tagging corpus using the
specified coder/decoder and tokenizer factory, then passed to
the chain CRF estimate method along with the rest of the
arguments.
|
static <F> ChainCrf<F> |
ChainCrf.estimate(Corpus<ObjectHandler<Tagging<F>>> corpus,
ChainCrfFeatureExtractor<F> featureExtractor,
boolean addInterceptFeature,
int minFeatureCount,
boolean cacheFeatureVectors,
boolean allowUnseenTransitions,
RegressionPrior prior,
int priorBlockSize,
AnnealingSchedule annealingSchedule,
double minImprovement,
int minEpochs,
int maxEpochs,
Reporter reporter)
Return the CRF estimated using stochastic gradient descent with
the specified prior from the specified corpus of taggings of
type
F pruned to the specified minimum feature count,
using the specified feature extractor, automatically adding an
intercept feature if the flag is true, allowing unseen tag
transitions as specified, using the specified training
parameters for annealing, measuring convergence, and reporting
the incremental results to the specified reporter. |
| Constructor and Description |
|---|
ZScoreFeatureExtractor(Corpus<ObjectHandler<Classified<E>>> corpus,
FeatureExtractor<? super E> extractor)
Construct a z-score feature extractor from the specified base
feature extractor and the training section of the supplied
corpus.
|
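
A z-score rescales a value by subtracting the mean and dividing by the standard deviation of the values observed in training. The sketch below shows that arithmetic for the values of a single feature. It is an illustration under stated assumptions, not the LingPipe implementation: the class and method names are hypothetical, the population (not sample) standard deviation is used, and a zero deviation is mapped to a z-score of 0.

```java
import java.util.Arrays;

public class ZScoreSketch {

    /** Returns (x - mean) / stddev for each x, using the population deviation. */
    static double[] zScores(double[] xs) {
        int n = xs.length;
        double mean = 0.0;
        for (double x : xs) mean += x;
        mean /= n;

        double variance = 0.0;
        for (double x : xs) variance += (x - mean) * (x - mean);
        variance /= n; // assumption: population variance
        double sd = Math.sqrt(variance);

        double[] zs = new double[n];
        for (int i = 0; i < n; ++i) {
            // assumption: a constant feature gets z-score 0
            zs[i] = (sd == 0.0) ? 0.0 : (xs[i] - mean) / sd;
        }
        return zs;
    }

    public static void main(String[] args) {
        double[] xs = {2, 4, 4, 4, 5, 5, 7, 9}; // mean 5, deviation 2
        System.out.println(Arrays.toString(zScores(xs)));
        // prints [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
    }
}
```
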
| Modifier and Type | Class and Description |
|---|---|
class |
AbstractHmmEstimator
An
AbstractHmmEstimator may be used to train a hidden Markov
model (HMM). |
class |
HmmCharLmEstimator
An
HmmCharLmEstimator employs a maximum a posteriori
transition estimator and a bounded character language model
emission estimator. |
| Modifier and Type | Interface and Description |
|---|---|
static interface |
LanguageModel.Dynamic
A
LanguageModel.Dynamic accepts training events in
the form of character slices or sequences. |
| Modifier and Type | Class and Description |
|---|---|
class |
NGramBoundaryLM
An
NGramBoundaryLM provides a dynamic sequence
language model for which training, estimation and pruning may be
interleaved. |
class |
NGramProcessLM
An
NGramProcessLM provides a dynamic conditional
process language model for which training, estimation, and
pruning may be interleaved. |
class |
TokenizedLM
A
TokenizedLM provides a dynamic sequence language
model which models token sequences with an n-gram model, and
whitespace and unknown tokens with their own sequence language
models. |
class |
UniformBoundaryLM
A
UniformBoundaryLM implements a uniform sequence
language model with a specified number of outcomes and the same
probability assigned to the end-of-stream marker. |
class |
UniformProcessLM
A
UniformProcessLM implements a uniform sequence
language model with a specified number of outcomes and the same
probability assigned to the end-of-stream marker. |
| Modifier and Type | Method and Description |
|---|---|
void |
TrieIntSeqCounter.handleNGrams(int nGram,
int minCount,
ObjectHandler<int[]> handler)
Supplies each n-gram of the specified length and with greater
than or equal to the specified minimum count to the specified
handler.
|
void |
TokenizedLM.handleNGrams(int nGramLength,
int minCount,
ObjectHandler<String[]> handler)
Visits the n-grams of the specified length with at least the specified
minimum count stored in the underlying counter of this
tokenized language model and passes them to the specified handler.
|
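
Both `handleNGrams` methods above describe the same callback contract: n-grams of a given length whose count meets a minimum are passed to a handler one at a time. The sketch below imitates that contract over a plain token array. It is not how LingPipe stores counts (the names suggest a trie there); the `Handler<E>` interface, the class name, and the map-based counting are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class NGramHandlerSketch {

    /** Hypothetical one-method handler interface (not a LingPipe type). */
    interface Handler<E> {
        void handle(E e);
    }

    /** Passes each n-gram of length n with count >= minCount to the handler. */
    static void handleNGrams(String[] tokens, int n, int minCount,
                             Handler<String[]> handler) {
        if (n < 1) throw new IllegalArgumentException("n must be positive");

        // Count every window of n consecutive tokens, in first-seen order.
        Map<List<String>, Integer> counts = new LinkedHashMap<>();
        for (int i = 0; i + n <= tokens.length; ++i) {
            List<String> gram = Arrays.asList(Arrays.copyOfRange(tokens, i, i + n));
            counts.merge(gram, 1, Integer::sum);
        }

        // Deliver only the n-grams that reach the minimum count.
        for (Map.Entry<List<String>, Integer> entry : counts.entrySet()) {
            if (entry.getValue() >= minCount) {
                handler.handle(entry.getKey().toArray(new String[0]));
            }
        }
    }

    public static void main(String[] args) {
        String[] tokens = {"a", "b", "a", "b", "c"};
        List<String> seen = new ArrayList<>();
        handleNGrams(tokens, 2, 2, gram -> seen.add(String.join(" ", gram)));
        System.out.println(seen); // prints [a b]
    }
}
```
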
| Modifier and Type | Class and Description |
|---|---|
class |
SentenceEvaluator
A
SentenceEvaluator handles reference chunkings by
constructing a response chunking and adding them to a sentence
evaluation. |
| Modifier and Type | Class and Description |
|---|---|
class |
TfIdfDistance
The
TfIdfDistance class provides a string distance
based on term frequency (TF) and inverse document frequency (IDF). |
class |
TrainSpellChecker
A
TrainSpellChecker instance provides a mechanism for
collecting training data for a compiled spell checker. |
| Modifier and Type | Method and Description |
|---|---|
static LogisticRegression |
LogisticRegression.estimate(Vector[] xs,
int[] cs,
RegressionPrior prior,
int blockSize,
LogisticRegression hotStart,
AnnealingSchedule annealingSchedule,
double minImprovement,
int rollingAverageSize,
int minEpochs,
int maxEpochs,
ObjectHandler<LogisticRegression> handler,
Reporter reporter)
Estimate a logistic regression model from the specified input
data using the specified Gaussian prior, initial learning rate
and annealing rate, the minimum improvement per epoch, the
minimum and maximum number of estimation epochs, and a
reporter.
|
static LogisticRegression |
LogisticRegression.estimate(Vector[] xs,
Vector[] cs,
RegressionPrior prior,
int blockSize,
LogisticRegression hotStart,
AnnealingSchedule annealingSchedule,
double minImprovement,
int rollingAverageSize,
int minEpochs,
int maxEpochs,
ObjectHandler<LogisticRegression> handler,
Reporter reporter)
Estimate a logistic regression model from the specified input
data using the specified Gaussian prior, initial learning rate
and annealing rate, the minimum improvement per epoch, the
minimum and maximum number of estimation epochs, and a
reporter.
|
| Modifier and Type | Class and Description |
|---|---|
class |
MarginalTaggerEvaluator<E>
A
MarginalTaggerEvaluator evaluates marginal taggers either
directly or by adding their outputs. |
class |
NBestTaggerEvaluator<E>
An
NBestTaggerEvaluator provides an evaluation
framework for n-best taggers. |
class |
TaggerEvaluator<E>
A
TaggerEvaluator provides evaluation for
first-best taggers implementing the Tagger interface. |
| Modifier and Type | Method and Description |
|---|---|
static <F> Corpus<ObjectHandler<Classified<ClassifierTagger.State<F>>>> |
ClassifierTagger.toClassifiedCorpus(Corpus<ObjectHandler<Tagging<F>>> taggingCorpus)
Return a corpus consisting of classified tagger states derived
from the specified corpus of taggings.
|
Copyright © 2019 Alias-i, Inc. All rights reserved.