Type Parameters: E - The type of object whose features are extracted.
public class ZScoreFeatureExtractor<E> extends FeatureExtractorFilter<E> implements Serializable
A ZScoreFeatureExtractor converts features to their
z-scores, where means and deviations are determined by a
corpus supplied at construction time.
Means and standard deviations are computed for each feature in the training section of the corpus supplied to the constructor.
At run time, feature values are converted to z-scores by:

z(feat,val) = (val - mean(feat))/stdDev(feat)

where feat is the feature, val is the value
to be converted to a z-score, mean(feat) is the mean
(average) of the feature in the training corpus, and
stdDev(feat) is the standard deviation of the feature
in the training corpus.
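The conversion can be illustrated with a minimal, library-independent sketch. The class ZScoreSketch and its methods are hypothetical names introduced here for illustration and are not part of the documented API. The sketch assumes the population standard deviation (dividing by n); the documentation does not state which estimator the class uses.

```java
import java.util.Arrays;

// Hypothetical helper for illustration only; not part of the documented API.
final class ZScoreSketch {

    // Arithmetic mean of the values observed for one feature in training.
    static double mean(double[] values) {
        return Arrays.stream(values).average().orElse(Double.NaN);
    }

    // Population standard deviation (divides by n). This is an assumption:
    // the documentation does not say which estimator is used.
    static double stdDev(double[] values) {
        double m = mean(values);
        double sumSq = 0.0;
        for (double v : values) {
            sumSq += (v - m) * (v - m);
        }
        return Math.sqrt(sumSq / values.length);
    }

    // z(feat,val) = (val - mean(feat)) / stdDev(feat)
    static double zScore(double val, double mean, double stdDev) {
        return (val - mean) / stdDev;
    }

    public static void main(String[] args) {
        double[] training = {2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0};
        double m = mean(training);    // 5.0
        double sd = stdDev(training); // 2.0
        System.out.println(zScore(9.0, m, sd)); // prints 2.0
    }
}
```

A value of 9.0 lies two standard deviations above the training mean of 5.0, so its z-score is 2.0.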
Z-score normalization ensures that the collection of each feature's values has zero mean and unit standard deviation over the training section of the training corpus. This does not guarantee zero means and unit standard deviation over the test section of the corpus.
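The difference between training and test data can be checked with a small sketch. NormalizationCheck is a hypothetical class written for this illustration, the data are made up, and the population standard deviation is again an assumption. Statistics are taken from the training values only and then applied to both sets.

```java
import java.util.Arrays;

// Hypothetical illustration; not part of the documented API.
final class NormalizationCheck {

    static double mean(double[] xs) {
        return Arrays.stream(xs).average().orElse(Double.NaN);
    }

    // Population standard deviation (assumed estimator).
    static double stdDev(double[] xs) {
        double m = mean(xs);
        double sumSq = Arrays.stream(xs).map(x -> (x - m) * (x - m)).sum();
        return Math.sqrt(sumSq / xs.length);
    }

    // Converts values to z-scores using statistics taken from training data.
    static double[] normalize(double[] xs, double trainMean, double trainStdDev) {
        return Arrays.stream(xs).map(x -> (x - trainMean) / trainStdDev).toArray();
    }

    public static void main(String[] args) {
        double[] train = {1.0, 2.0, 3.0, 4.0, 5.0};
        double[] test = {4.0, 5.0, 6.0};
        double m = mean(train);
        double sd = stdDev(train);

        double[] trainZ = normalize(train, m, sd);
        double[] testZ = normalize(test, m, sd);

        // Training z-scores: mean approximately 0, deviation approximately 1.
        System.out.println(mean(trainZ) + " " + stdDev(trainZ));
        // Test z-scores: mean and deviation are generally neither 0 nor 1.
        System.out.println(mean(testZ) + " " + stdDev(testZ));
    }
}
```

Here the test values sit above the training mean, so their z-scores have a positive mean and a deviation below one.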
If a feature is unseen or has zero standard deviation in the training corpus, it is removed from all output. A feature only has zero standard deviation if it has the same value every time it occurs. For instance, all features seen only once will have zero variance. Effectively, features which always have the same value in the training set will be eliminated from future consideration.
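The filtering behavior described above can be sketched without the library. ZScoreFilterSketch is a hypothetical stand-in, not the actual implementation. It assumes that statistics are computed only over the occurrences of each feature (consistent with "every time it occurs"), that the population standard deviation is used, and that feature maps are plain Map<String, Double> values.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for illustration; not the documented implementation.
final class ZScoreFilterSketch {
    private final Map<String, Double> means = new HashMap<>();
    private final Map<String, Double> devs = new HashMap<>();

    ZScoreFilterSketch(List<Map<String, Double>> trainingMaps) {
        // Collect every observed value per feature (occurrences only).
        Map<String, List<Double>> observed = new HashMap<>();
        for (Map<String, Double> featureMap : trainingMaps) {
            for (Map.Entry<String, Double> e : featureMap.entrySet()) {
                observed.computeIfAbsent(e.getKey(), k -> new ArrayList<>())
                        .add(e.getValue());
            }
        }
        for (Map.Entry<String, List<Double>> e : observed.entrySet()) {
            List<Double> vals = e.getValue();
            // A feature with a single distinct value has zero deviation; skip it.
            if (vals.stream().distinct().count() <= 1) {
                continue;
            }
            double mean = vals.stream().mapToDouble(Double::doubleValue)
                    .average().orElse(Double.NaN);
            double sumSq = 0.0;
            for (double v : vals) {
                sumSq += (v - mean) * (v - mean);
            }
            means.put(e.getKey(), mean);
            devs.put(e.getKey(), Math.sqrt(sumSq / vals.size())); // assumed estimator
        }
    }

    // Converts known features to z-scores and drops all others.
    Map<String, Double> features(Map<String, Double> raw) {
        Map<String, Double> out = new HashMap<>();
        for (Map.Entry<String, Double> e : raw.entrySet()) {
            Double mean = means.get(e.getKey());
            if (mean == null) {
                continue; // unseen in training, or constant in training
            }
            out.put(e.getKey(), (e.getValue() - mean) / devs.get(e.getKey()));
        }
        return out;
    }

    public static void main(String[] args) {
        ZScoreFilterSketch sketch = new ZScoreFilterSketch(List.of(
                Map.of("len", 2.0, "bias", 1.0),
                Map.of("len", 4.0, "bias", 1.0),
                Map.of("len", 6.0, "rare", 3.0)));
        // "bias" is constant, "rare" was seen once, "new" is unseen: all dropped.
        System.out.println(sketch.features(Map.of("len", 4.0, "bias", 1.0, "new", 7.0)));
        // prints {len=0.0}
    }
}
```

Only "len" varies in the training maps, so it is the only feature that survives; its input value equals the training mean, giving a z-score of 0.0.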
A z-score feature extractor is serializable if its base feature extractor is serializable.
| Constructor and Description |
|---|
| ZScoreFeatureExtractor(Corpus<ObjectHandler<Classified<E>>> corpus, FeatureExtractor<? super E> extractor): Construct a z-score feature extractor from the specified base feature extractor and the training section of the supplied corpus. |
| Modifier and Type | Method and Description |
|---|---|
| Map<String,? extends Number> | features(E in): Return the feature map resulting from converting the feature map produced by the underlying feature extractor to z-scores. |
| Set<String> | knownFeatures(): Returns an unmodifiable view of the known features for this z-score feature extractor. |
| double | mean(String feature): Returns the mean for the specified feature, or Double.NaN if the feature is not known. |
| double | standardDeviation(String feature): Returns the standard deviation for the specified feature, or Double.NaN if the feature is not known. |
| String | toString(): Returns a string representation of this z-score feature extractor, listing the mean and deviation for each feature. |
| double | zScore(String feature, double value): Return the z-score for the specified feature and value. |
Methods inherited from class FeatureExtractorFilter: baseExtractor

public ZScoreFeatureExtractor(Corpus<ObjectHandler<Classified<E>>> corpus, FeatureExtractor<? super E> extractor) throws IOException
Parameters:
extractor - Base feature extractor.
corpus - The corpus whose training section will be visited.
Throws:
IOException - If there is an I/O error visiting the corpus.

public Map<String,? extends Number> features(E in)
Specified by: features in interface FeatureExtractor<E>
Overrides: features in class FeatureExtractorFilter<E>
Parameters:
in - Input object.

public double zScore(String feature, double value)
Parameters:
feature - Feature name.
value - Value of feature.

public double mean(String feature)
Returns the mean for the specified feature, or Double.NaN if the feature is not known.
Parameters:
feature - Feature whose mean is returned.

public double standardDeviation(String feature)
Returns the standard deviation for the specified feature, or Double.NaN if the feature is not known.
Parameters:
feature - Feature whose standard deviation is returned.

public Set<String> knownFeatures()
Copyright © 2016 Alias-i, Inc. All rights reserved.