public class BinaryLMClassifier extends DynamicLMClassifier<LanguageModel.Dynamic>
BinaryLMClassifier is a boolean dynamic language
model classifier for use when there are two categories, but
training data is only available for one of the categories.
A binary LM classifier is based on a single language model and cross-entropy threshold. It defines two categories, accept and reject, with acceptance determined by measuring sample cross-entropy rate in a language model against a threshold. As a language model classifier, the multivariate category estimator is uniform, the accepting language model is dynamic, and the rejecting language model is constant.
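The acceptance rule described above can be written out as a formula. This is a sketch under assumptions that this documentation does not state: logarithms are taken base 2, samples that fall exactly on the threshold are treated as accepted, and the symbols are introduced here only for illustration ($x = x_1 \cdots x_n$ is the sample of $n$ characters, $p$ is the accepting language model's estimate, and $\theta$ is the cross-entropy threshold).

```latex
H(x) = -\frac{1}{n} \sum_{i=1}^{n} \log_2 p(x_i \mid x_1 \cdots x_{i-1}),
\qquad
\text{accept } x \iff H(x) \le \theta
```

Because the threshold applies to a per-character rate, it is comparable across samples of different lengths.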
As an instance of language model classifier, this class provides
scores that are adjusted per-character average log probabilities,
which are roughly negative sample cross-entropy rates (see LMClassifier). The accepting language model behaves in the usual
way. The rejecting language model provides a constant
per-character log estimate. The uniform rejecting model is defined
to be a boundary uniform language model if the specified model is a
sequence language model and a process uniform language model
otherwise.
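The comparison between the accepting score and the constant rejecting score can be illustrated with a small stand-alone sketch. It does not use LingPipe: the class `TwoScoreSketch`, its toy add-one-smoothed unigram character model, the `alphabetSize` parameter, the base-2 logarithm, and the choice to accept on ties are all assumptions made for illustration, and the sketch ignores whatever adjustment the real classifier applies to its scores.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only; every name here is hypothetical and not part of LingPipe.
class TwoScoreSketch {
    private final Map<Character, Integer> counts = new HashMap<>();
    private int total = 0;
    private final int alphabetSize;   // assumed size of the character alphabet
    private final double threshold;   // cross-entropy threshold, in bits per character

    TwoScoreSketch(int alphabetSize, double crossEntropyThreshold) {
        this.alphabetSize = alphabetSize;
        this.threshold = crossEntropyThreshold;
    }

    // Train the toy accepting model by counting characters.
    void train(CharSequence text) {
        for (int i = 0; i < text.length(); i++) {
            counts.merge(text.charAt(i), 1, Integer::sum);
            total++;
        }
    }

    // Average log2 probability per character (the negative cross-entropy rate).
    // Assumes a non-empty sample.
    double acceptScore(CharSequence sample) {
        double sum = 0.0;
        for (int i = 0; i < sample.length(); i++) {
            int count = counts.getOrDefault(sample.charAt(i), 0);
            double p = (count + 1.0) / (total + alphabetSize); // add-one smoothing
            sum += Math.log(p) / Math.log(2.0);
        }
        return sum / sample.length();
    }

    // The rejecting "model" contributes the same per-character score for every input.
    double rejectScore() {
        return -threshold;
    }

    // With a uniform category distribution, the higher per-character score wins.
    String classify(CharSequence sample) {
        return acceptScore(sample) >= rejectScore() ? "true" : "false";
    }

    public static void main(String[] args) {
        TwoScoreSketch sketch = new TwoScoreSketch(4, 1.0);
        sketch.train("aaaaaaaaaa");
        System.out.println(sketch.classify("aaa")); // prints "true"
        System.out.println(sketch.classify("bbb")); // prints "false"
    }
}
```

Raising the threshold lowers the constant rejecting score, so more samples are accepted; lowering it makes the classifier stricter.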
Training events may be supplied in the same way as for the
superclass DynamicLMClassifier, with two caveats. First,
the multivariate category model remains uniform and thus does not
contribute to classification. Second, training events for the
rejection category are ignored. Thus only the language model for
the accepting category is trained. The broader interface is
implemented without exceptions in order to allow binary classifiers
to be plugged in for ones with explicit rejection models.
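The training behavior described above can be sketched as a three-way dispatch on the best category. This is a stand-alone illustration rather than LingPipe source: `BinaryTrainingSketch` and its list of training samples are hypothetical stand-ins for the classifier and its accepting language model.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; the class and its fields are hypothetical, not LingPipe code.
class BinaryTrainingSketch {
    private final String acceptCategory;
    private final String rejectCategory;
    // Stand-in for the accepting language model: it just records its training text.
    private final List<String> acceptTrainingData = new ArrayList<>();

    BinaryTrainingSketch(String acceptCategory, String rejectCategory) {
        this.acceptCategory = acceptCategory;
        this.rejectCategory = rejectCategory;
    }

    void handle(CharSequence text, String bestCategory) {
        if (acceptCategory.equals(bestCategory)) {
            // Only events for the accept category train the model.
            acceptTrainingData.add(text.toString());
        } else if (rejectCategory.equals(bestCategory)) {
            // Events for the reject category are silently ignored.
        } else {
            throw new IllegalArgumentException("Unknown category: " + bestCategory);
        }
    }

    int trainedCount() {
        return acceptTrainingData.size();
    }
}
```

Accepting reject-category events without complaint is what lets a binary classifier be dropped in where a classifier with an explicit rejection model is expected.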
Instances of this class are compilable as instances of their
superclass. The resulting object read back in will be an instance
of LMClassifier, not of this class, but its classification
behavior will be identical.
Resetting category language models is not allowed for binary language model classifiers, because they only contain one model and all else is constant.
Binary language model classifiers are concurrent-read and single-write thread safe. The only write operation is training the accepting category. Classification and compilation are reads. If the language model underlying this classifier is not thread safe, then reads must not be called concurrently.
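One common way to enforce a concurrent-read, single-write discipline is a read/write lock from `java.util.concurrent.locks`. The helper below is a minimal sketch, not something LingPipe provides: the class name `ReadWriteGuard` and its two methods are hypothetical.

```java
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Supplier;

// Illustrative sketch only; ReadWriteGuard is a hypothetical helper, not part of LingPipe.
class ReadWriteGuard {
    private final ReadWriteLock lock = new ReentrantReadWriteLock();

    // Run a write operation (such as training) with exclusive access.
    void write(Runnable operation) {
        lock.writeLock().lock();
        try {
            operation.run();
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Run a read operation (such as classification); reads may overlap each other.
    <T> T read(Supplier<T> operation) {
        lock.readLock().lock();
        try {
            return operation.get();
        } finally {
            lock.readLock().unlock();
        }
    }
}
```

Calls to handle would go through `write` and calls to classify through `read`. If the underlying language model is not itself thread safe, the read lock is not sufficient, and reads would need exclusive access as well.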
| Modifier and Type | Field and Description |
|---|---|
| static String | DEFAULT_ACCEPT_CATEGORY - The default value of the category for accepting input, "true". |
| static String | DEFAULT_REJECT_CATEGORY - The default value of the category for rejecting input, "false". |
| Constructor and Description |
|---|
| BinaryLMClassifier(LanguageModel.Dynamic acceptingLM, double crossEntropyThreshold) - Construct a binary character sequence classifier that accepts or rejects inputs based on their cross-entropy being above or below a fixed cross-entropy threshold. |
| BinaryLMClassifier(LanguageModel.Dynamic acceptingLM, double crossEntropyThreshold, String acceptCategory, String rejectCategory) - Construct a binary character sequence classifier that accepts or rejects inputs based on their cross-entropy being above or below a fixed cross-entropy threshold. |
| Modifier and Type | Method and Description |
|---|---|
| String | acceptCategory() - Returns the category assigned to matching/accepted cases. |
| void | handle(Classified<CharSequence> classified) - Train this classifier using the character sequence from the specified classified object if the best category of the classification is the accept category for this binary classifier. |
| String | rejectCategory() - Returns the category assigned to non-matching/rejected cases. |
| void | resetCategory(String category, LanguageModel.Dynamic lm, int newCount) - Throws an UnsupportedOperationException. |
Methods inherited from class DynamicLMClassifier: compileTo, createNGramBoundary, createNGramProcess, createTokenized, train

Methods inherited from class LMClassifier: categories, categoryDistribution, classify, classifyJoint, languageModel

public static final String DEFAULT_ACCEPT_CATEGORY

public static final String DEFAULT_REJECT_CATEGORY
public BinaryLMClassifier(LanguageModel.Dynamic acceptingLM, double crossEntropyThreshold)
Construct a binary character sequence classifier that accepts or rejects inputs based on whether their cross-entropy falls below or above a fixed cross-entropy threshold. The category assigned to an accepted input will be DEFAULT_ACCEPT_CATEGORY, otherwise it will be DEFAULT_REJECT_CATEGORY. The labels of the categories can be reversed in order to build a rejector or changed altogether with the four-argument constructor. See the class documentation for more information on training, classification and compilation.
Parameters:
acceptingLM - The language model that determines whether an input is accepted or rejected.
crossEntropyThreshold - Maximum cross-entropy against a model to accept the input.

public BinaryLMClassifier(LanguageModel.Dynamic acceptingLM, double crossEntropyThreshold, String acceptCategory, String rejectCategory)
Parameters:
acceptingLM - The language model that determines whether an input is accepted or rejected.
crossEntropyThreshold - Maximum cross-entropy against a model to accept the input.
acceptCategory - Category label for matching input.
rejectCategory - Category label for rejecting input.

public String acceptCategory()

public String rejectCategory()
public void handle(Classified<CharSequence> classified)
Specified by: handle in interface ObjectHandler<Classified<CharSequence>>
Overrides: handle in class DynamicLMClassifier<LanguageModel.Dynamic>
Parameters:
classified - Classified character sequence.
Throws:
IllegalArgumentException - If the best category in the classification of the classified object is neither the accept nor the reject category for this binary classifier.

public void resetCategory(String category, LanguageModel.Dynamic lm, int newCount)
Throws an UnsupportedOperationException.
Overrides: resetCategory in class DynamicLMClassifier<LanguageModel.Dynamic>
Parameters:
category - Ignored.
lm - Ignored.
newCount - Ignored.
Throws:
UnsupportedOperationException - Always.

Copyright © 2019 Alias-i, Inc. All rights reserved.