E - the type of object being classified
public class LogisticRegressionClassifier<E> extends Object implements ConditionalClassifier<E>, Compilable, Serializable
LogisticRegressionClassifier provides conditional
probability classifications of input objects using an underlying
logistic regression model and feature extractor. Logistic
regression is a discriminative classifier which operates over
arbitrary feature vectors extracted from items. See LogisticRegression for a full definition of logistic regression
and its implementation.
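To make "conditional probability classification" concrete, the following sketch computes category probabilities from a feature vector using a softmax over per-category weight vectors. It is a generic multinomial logistic formulation written for illustration only: the class `SoftmaxSketch`, its method, and the weights are hypothetical and are not LingPipe's implementation or parameterization, for which LogisticRegression remains the reference.

```java
import java.util.Arrays;

/** Hypothetical illustration; not part of LingPipe. */
class SoftmaxSketch {

    /** Returns p(category k | x) for each row k of the weight matrix. */
    static double[] conditionalProbs(double[][] weights, double[] x) {
        double[] scores = new double[weights.length];
        double max = Double.NEGATIVE_INFINITY;
        for (int k = 0; k < weights.length; ++k) {
            for (int i = 0; i < x.length; ++i)
                scores[k] += weights[k][i] * x[i]; // linear score for category k
            max = Math.max(max, scores[k]);
        }
        double sum = 0.0;
        for (int k = 0; k < scores.length; ++k) {
            scores[k] = Math.exp(scores[k] - max); // subtract max for numerical stability
            sum += scores[k];
        }
        for (int k = 0; k < scores.length; ++k)
            scores[k] /= sum; // normalize so the probabilities sum to 1
        return scores;
    }

    public static void main(String[] args) {
        double[][] weights = { { 0.5, 2.0 }, { 0.0, 0.0 } }; // made-up parameters
        double[] x = { 1.0, 1.0 }; // dimension 0 plays the role of an intercept
        System.out.println(Arrays.toString(conditionalProbs(weights, x)));
    }
}
```

The output is one probability per category, which is the kind of information a conditional classification carries.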
Logistic regression classifiers may be trained from a data
corpus using the method train(Corpus,FeatureExtractor,int,boolean,RegressionPrior,AnnealingSchedule,double,int,int,Reporter),
the last six arguments of which are shared with the logistic
regression training method LogisticRegression.estimate(Vector[],int[],RegressionPrior,AnnealingSchedule,Reporter,double,int,int).
The first four arguments are required to adapt logistic regression
to general classification, and consist of a corpus to train over, a
feature extractor, a minimum feature count, and a boolean flag
indicating whether or not to add an intercept feature to every input vector.
This class merely acts as an adapter to implement the ConditionalClassifier interface based on the LogisticRegression class
in the statistics package. The basis of the adaptation is a
general feature extractor, which is an instance of FeatureExtractor. A feature extractor converts an arbitrary input
object (whose type is specified generically in this class) to a
mapping from features (represented as strings) to values
(represented as instances of Number). The class then uses
a symbol table for features to convert the maps from feature names
to numbers into sparse vectors, where the dimensions are the
identifiers for the features in the symbol table. By convention,
if the intercept feature flag is set, the class sets dimension 0 of
every input vector to 1.0.
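The conversion described above can be sketched in plain Java. This is a minimal illustration, not LingPipe's code: `FeatureVectorSketch` and its methods are hypothetical, a `HashMap` stands in for the feature symbol table, a `TreeMap` stands in for a sparse vector, and skipping features missing from the table is an assumption made here.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical illustration; not part of LingPipe. */
class FeatureVectorSketch {

    // Stand-in for a feature symbol table: feature name -> dimension identifier.
    private final Map<String, Integer> symbolToId = new HashMap<>();
    private final boolean addIntercept;

    FeatureVectorSketch(boolean addIntercept) {
        this.addIntercept = addIntercept;
        if (addIntercept)
            symbolToId.put("*&^INTERCEPT%$^&**", 0); // reserve dimension 0
    }

    /** Assigns the next free dimension to a feature seen during training. */
    int addSymbol(String feature) {
        Integer id = symbolToId.get(feature);
        if (id == null) {
            id = symbolToId.size();
            symbolToId.put(feature, id);
        }
        return id;
    }

    /** Converts a feature map to a sparse vector (dimension -> value). */
    Map<Integer, Double> toSparseVector(Map<String, ? extends Number> featureMap) {
        Map<Integer, Double> vector = new TreeMap<>();
        if (addIntercept)
            vector.put(0, 1.0); // intercept convention: dimension 0 is 1.0
        for (Map.Entry<String, ? extends Number> e : featureMap.entrySet()) {
            Integer id = symbolToId.get(e.getKey());
            if (id != null) // assumption: features unknown to the table are skipped
                vector.put(id, e.getValue().doubleValue());
        }
        return vector;
    }

    public static void main(String[] args) {
        FeatureVectorSketch sketch = new FeatureVectorSketch(true);
        sketch.addSymbol("word=good");
        sketch.addSymbol("word=bad");
        Map<String, Integer> features = new LinkedHashMap<>();
        features.put("word=bad", 2);
        features.put("word=unseen", 1);
        System.out.println(sketch.toSparseVector(features)); // prints {0=1.0, 2=2.0}
    }
}
```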
This class implements both Serializable and Compilable; serialization and compilation do the same thing, simply writing the
content of the model to the object output. The model read back in
will be an instance of LogisticRegressionClassifier with
the same components as the model that was serialized or compiled.
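The round trip described above follows the standard Java object-stream pattern. The sketch below uses only `java.io`; `RoundTripSketch` and `TinyModel` are hypothetical stand-ins for a trained classifier, chosen so the example runs without LingPipe on the classpath.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

/** Hypothetical illustration; TinyModel stands in for a trained classifier. */
class RoundTripSketch {

    static class TinyModel implements Serializable {
        private static final long serialVersionUID = 1L;
        final double[] weights;
        TinyModel(double[] weights) { this.weights = weights; }
    }

    /** Writes the object to an in-memory buffer and reads it back. */
    static Object roundTrip(Serializable in) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(in);
        }
        try (ObjectInputStream objIn =
                 new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return objIn.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        TinyModel restored = (TinyModel) roundTrip(new TinyModel(new double[] { 0.5, -1.25 }));
        System.out.println(restored.weights[1]); // prints -1.25
    }
}
```

With a real classifier, the object read back would be cast to LogisticRegressionClassifier rather than to the stand-in type.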
| Modifier and Type | Field and Description |
|---|---|
| `static String` | `INTERCEPT_FEATURE_NAME`: The name of the feature used for intercepts, `*&^INTERCEPT%$^&**`. |
| Modifier and Type | Method and Description |
|---|---|
| `boolean` | `addInterceptFeature()`: Returns `true` if this classifier automatically adds an intercept feature to each feature vector. |
| `List<String>` | `categorySymbols()`: Returns a copy of the category symbols used by this classifier in the same order as used by the underlying logistic regression model. |
| `ConditionalClassification` | `classify(E in)`: Return the conditional classification of the specified object using logistic regression classification. |
| `ConditionalClassification` | `classifyFeatures(Map<String,? extends Number> featureMap)`: Return the conditional classification of a feature map using this classifier. |
| `ConditionalClassification` | `classifyVector(Vector v)`: Returns the classification of the specified vector using the logistic regression model underlying this classifier. |
| `void` | `compileTo(ObjectOutput objOut)`: Compile this classifier to the specified object output. |
| `FeatureExtractor<E>` | `featureExtractor()`: Returns an immutable view of the feature extractor for this classifier. |
| `SymbolTable` | `featureSymbolTable()`: Returns an unmodifiable view of the symbol table used for features in this classifier. |
| `ObjectToDoubleMap<String>` | `featureValues(String category)`: Returns a mapping from features to their parameter values for the specified category. |
| `LogisticRegression` | `model()`: Returns the logistic regression model underlying this classifier. |
| `String` | `toString()`: Returns a string-based representation of this classifier, listing the parameter vectors for each category. |
| `static <F> LogisticRegressionClassifier<F>` | `train(Corpus<ObjectHandler<Classified<F>>> corpus, FeatureExtractor<? super F> featureExtractor, int minFeatureCount, boolean addInterceptFeature, RegressionPrior prior, AnnealingSchedule annealingSchedule, double minImprovement, int minEpochs, int maxEpochs, Reporter reporter)`: Returns a trained logistic regression classifier given the specified feature extractor, training corpus, model priors and search parameters. |
| `static <F> LogisticRegressionClassifier<F>` | `train(Corpus<ObjectHandler<Classified<F>>> corpus, FeatureExtractor<? super F> featureExtractor, int minFeatureCount, boolean addInterceptFeature, RegressionPrior prior, int blockSize, LogisticRegressionClassifier<F> hotStart, AnnealingSchedule annealingSchedule, double minImprovement, int rollingAverageSize, int minEpochs, int maxEpochs, ObjectHandler<LogisticRegressionClassifier<F>> classifierHandler, Reporter reporter)`: Returns a trained logistic regression classifier given the specified feature extractor, training corpus, model priors and search parameters. |
public static final String INTERCEPT_FEATURE_NAME
The name of the feature used for intercepts, *&^INTERCEPT%$^&**.
public SymbolTable featureSymbolTable()
public List<String> categorySymbols()
public LogisticRegression model()
public boolean addInterceptFeature()
Returns true if this classifier automatically adds an intercept feature to each feature vector.
public FeatureExtractor<E> featureExtractor()
Warning: If the feature extractor has side-effects (as, for example, the caching feature extractor does), these will be preserved by the returned result, which merely wraps the contained feature extractor in an anonymous inner feature extractor.
public ConditionalClassification classifyVector(Vector v)
v - Vector to classify.
public ConditionalClassification classifyFeatures(Map<String,? extends Number> featureMap)
classify(Object) using the feature symbol table featureSymbolTable() and the flag addInterceptFeature().
featureMap - the feature vector to classify.
public ConditionalClassification classify(E in)
classify in interface BaseClassifier<E>
classify in interface ConditionalClassifier<E>
classify in interface RankedClassifier<E>
classify in interface ScoredClassifier<E>
in - Input object to classify.
public void compileTo(ObjectOutput objOut) throws IOException
Object.equals() sense).
Serializing this class produces exactly the same output.
compileTo in interface Compilable
objOut - Object output to which this classifier is written.
IOException - If there is an underlying I/O error writing the model to the stream.
public ObjectToDoubleMap<String> featureValues(String category)
category - Classification category.
IllegalArgumentException - If the category is unknown.
public String toString()
public static <F> LogisticRegressionClassifier<F> train(Corpus<ObjectHandler<Classified<F>>> corpus, FeatureExtractor<? super F> featureExtractor, int minFeatureCount, boolean addInterceptFeature, RegressionPrior prior, AnnealingSchedule annealingSchedule, double minImprovement, int minEpochs, int maxEpochs, Reporter reporter) throws IOException
Only the training section of the specified corpus is used for training.
See the class documentation above and the class
documentation for LogisticRegression for more
information on the parameters.
The block size defaults to the corpus training size divided by 50.
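As a worked example of the default block size, the sketch below computes the block size for a given number of training items and the resulting number of gradient updates per epoch. The class `BlockSizeSketch` is hypothetical, and two details are assumptions rather than documented behavior: the division is taken to be integer division, and the result is floored at 1 so that very small corpora still form a block.

```java
/** Hypothetical illustration; not LingPipe's code. */
class BlockSizeSketch {

    /** Assumption: integer division by 50, floored at 1. */
    static int defaultBlockSize(int numTrainingItems) {
        return Math.max(1, numTrainingItems / 50);
    }

    /** Number of blocks, and hence gradient updates, needed to cover all items once. */
    static int updatesPerEpoch(int numTrainingItems, int blockSize) {
        return (numTrainingItems + blockSize - 1) / blockSize; // ceiling division
    }

    public static void main(String[] args) {
        int n = 10_000;
        int blockSize = defaultBlockSize(n);
        System.out.println(blockSize + " items per block, "
            + updatesPerEpoch(n, blockSize) + " updates per epoch");
        // prints "200 items per block, 50 updates per epoch"
    }
}
```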
F - the type of object to be classified
corpus - Corpus of training data.
featureExtractor - Converter from objects to feature maps.
minFeatureCount - Minimum count for features in corpus to keep feature as part of model.
addInterceptFeature - A flag set to true if an intercept feature should be added to each input vector.
prior - The prior for regularization of the regression.
annealingSchedule - Class to compute learning rate for each epoch.
minImprovement - Minimum relative improvement in error during an epoch to stop search.
minEpochs - Minimum number of search epochs.
maxEpochs - Maximum number of epochs.
reporter - Reporter to which progress reports are written, or null for no reporting.
IOException - If there is an underlying I/O exception reading the data from the corpus.
public static <F> LogisticRegressionClassifier<F> train(Corpus<ObjectHandler<Classified<F>>> corpus, FeatureExtractor<? super F> featureExtractor, int minFeatureCount, boolean addInterceptFeature, RegressionPrior prior, int blockSize, LogisticRegressionClassifier<F> hotStart, AnnealingSchedule annealingSchedule, double minImprovement, int rollingAverageSize, int minEpochs, int maxEpochs, ObjectHandler<LogisticRegressionClassifier<F>> classifierHandler, Reporter reporter) throws IOException
Only the training section of the specified corpus is used for training.
See the class documentation above and the class
documentation for LogisticRegression for more
information on the parameters.
F - the type of object to be classified
corpus - Corpus of training data.
featureExtractor - Converter from objects to feature maps.
minFeatureCount - Minimum count for features in corpus to keep feature as part of model.
addInterceptFeature - A flag set to true if an intercept feature should be added to each input vector.
prior - The prior for regularization of the regression.
blockSize - Number of examples whose probabilities are computed before applying a gradient update.
hotStart - Logistic regression classifier to use as initial coefficient values for training.
annealingSchedule - Class to compute learning rate for each epoch.
minImprovement - Minimum relative improvement in error during an epoch to stop search.
rollingAverageSize - Number of epochs over which to average objective improvement for monitoring convergence.
minEpochs - Minimum number of search epochs.
maxEpochs - Maximum number of epochs.
classifierHandler - Handler for classifiers produced at each epoch.
reporter - Reporter to which progress reports are written, or null for no reporting.
IOException - If there is an underlying I/O exception reading the data from the corpus.
Copyright © 2016 Alias-i, Inc. All rights reserved.