public class BigVectorClassifier extends Object implements ScoredClassifier<Vector>, Serializable
BigVectorClassifier provides an efficient linear
classifier implementation for large numbers of categories.
Inputs are vector implementations and outputs are scored
classifications pruned to the top N.
This class replaces the typical category (row) dominant representation with a feature (column) dominant one, which allows scaling to a large number of categories when the columns are sparse.
The standard approach in linear classifiers is to multiply a (possibly sparse) input vector by each category's vector representation. The vector representing a category maps features to values, and may be sparse.
This class reverses the representation. Rather than a map from categories to features to values, it uses a map from features to categories to values. For a sparse input, it iterates over the categories for each non-zero input feature and adds each contribution to that category's running score. If the maps from categories to values for features are very sparse, this saves significant time over multiplying the input by each category's vector representation.
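The feature-dominant scoring described above can be sketched roughly as follows. This is a minimal illustration, not this class's implementation: the class name `FeatureDominantScorer`, the method `score`, and the nested-map layout are all hypothetical, and plain `java.util` maps stand in for the vector types.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; names and data layout are illustrative only.
class FeatureDominantScorer {

    // featureToCategoryWeights.get(f) maps category id -> weight for feature f.
    // sparseInput maps feature id -> input value for the non-zero features.
    static Map<Integer, Double> score(
            Map<Integer, Map<Integer, Double>> featureToCategoryWeights,
            Map<Integer, Double> sparseInput) {
        Map<Integer, Double> scores = new HashMap<>();
        for (Map.Entry<Integer, Double> in : sparseInput.entrySet()) {
            Map<Integer, Double> column = featureToCategoryWeights.get(in.getKey());
            if (column == null) {
                continue; // no category has a weight for this feature
            }
            // Only categories with a non-zero weight for this feature are visited.
            for (Map.Entry<Integer, Double> cell : column.entrySet()) {
                double contribution = in.getValue() * cell.getValue();
                scores.merge(cell.getKey(), contribution, Double::sum);
            }
        }
        return scores;
    }

    public static void main(String[] args) {
        Map<Integer, Map<Integer, Double>> weights = new HashMap<>();
        Map<Integer, Double> feature0 = new HashMap<>();
        feature0.put(0, 1.0);
        feature0.put(2, 0.5);
        weights.put(0, feature0);
        Map<Integer, Double> feature2 = new HashMap<>();
        feature2.put(1, 2.0);
        weights.put(2, feature2);

        Map<Integer, Double> input = new HashMap<>();
        input.put(0, 2.0);
        input.put(2, 3.0);

        System.out.println(score(weights, input));
    }
}
```

The work done is proportional to the number of non-zero cells in the columns touched by the input, rather than to the total number of categories.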
This class uses a custom heap to efficiently merge the features for each category, and a bounded priority queue for collecting n-best results.
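As a rough illustration of collecting n-best results with a bounded priority queue, the sketch below keeps at most n candidates in a min-heap and evicts the weakest when a better one arrives. It uses `java.util.PriorityQueue` as a stand-in; the class's own heap and queue are custom, so the names `NBest` and `topN` and the array-based input are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch; not the queue used by BigVectorClassifier.
class NBest {

    // Returns the indices of the n highest scores, best first.
    static List<Integer> topN(double[] scores, int n) {
        List<Integer> result = new ArrayList<>();
        if (n <= 0) {
            return result;
        }
        // Min-heap on score: the head is the weakest of the current best n.
        PriorityQueue<Integer> heap = new PriorityQueue<>(
            (a, b) -> Double.compare(scores[a], scores[b]));
        for (int i = 0; i < scores.length; ++i) {
            if (heap.size() < n) {
                heap.add(i);
            } else if (scores[i] > scores[heap.peek()]) {
                heap.poll(); // evict the weakest kept candidate
                heap.add(i);
            }
        }
        while (!heap.isEmpty()) {
            result.add(heap.poll()); // drained weakest to best
        }
        Collections.reverse(result);
        return result;
    }

    public static void main(String[] args) {
        double[] scores = {0.1, 0.9, 0.5, 0.7};
        System.out.println(topN(scores, 2));
    }
}
```

Because the heap never holds more than n entries, each candidate costs at most O(log n) regardless of how many categories are scored.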
There are no training methods provided as part of this class. It is meant as a general utility for importing large category linear classifiers.
| Constructor and Description |
|---|
| BigVectorClassifier(Vector[] termVectors, int maxResults): Construct a big vector classifier with the specified term vectors, maximum number of results, and categories equal to the string representations of the category identifiers. |
| BigVectorClassifier(Vector[] termVectors, String[] categories, int maxResults): Construct a big vector classifier with the specified term vectors, categories, and maximum number of results. |

| Modifier and Type | Method and Description |
|---|---|
| ScoredClassification | classify(Vector x): Return a scored classification consisting of the top results for the specified vector input. |
| int | maxResults(): Return the maximum number of top results returned by this classifier. |
| void | setMaxResults(int maxResults): Sets the maximum number of results returned by this classifier. |
public BigVectorClassifier(Vector[] termVectors, int maxResults)
See BigVectorClassifier(Vector[],String[],int) for
more information.
termVectors - Term vectors for classifier.
maxResults - Maximum number of top results returned.
public BigVectorClassifier(Vector[] termVectors, String[] categories, int maxResults)
termVectors - Term vectors for classifier.
categories - Category names indexed by number.
maxResults - Maximum number of top results returned.
public int maxResults()
public void setMaxResults(int maxResults)
This method writes state that classify(Vector) reads, so calls to it
should be read-write synchronized with calls to classify(Vector).
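One way to arrange the read-write synchronization mentioned above is a wrapper around a `java.util.concurrent.locks.ReentrantReadWriteLock`. The following is a sketch under assumptions: `GuardedClassifier` is a hypothetical name, and the classifier's two methods are passed in as functions so the example stays self-contained.

```java
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Function;
import java.util.function.IntConsumer;

// Hypothetical wrapper; not part of the documented API.
class GuardedClassifier<I, O> {
    private final ReadWriteLock lock = new ReentrantReadWriteLock();
    private final Function<I, O> classifyFn;
    private final IntConsumer setMaxResultsFn;

    GuardedClassifier(Function<I, O> classifyFn, IntConsumer setMaxResultsFn) {
        this.classifyFn = classifyFn;
        this.setMaxResultsFn = setMaxResultsFn;
    }

    // Many threads may classify concurrently under the read lock.
    O classify(I input) {
        lock.readLock().lock();
        try {
            return classifyFn.apply(input);
        } finally {
            lock.readLock().unlock();
        }
    }

    // The write lock excludes all readers while the limit changes.
    void setMaxResults(int maxResults) {
        lock.writeLock().lock();
        try {
            setMaxResultsFn.accept(maxResults);
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```

With a classifier instance named `classifier` (an assumed variable), the wrapper could presumably be built as `new GuardedClassifier<Vector, ScoredClassification>(classifier::classify, classifier::setMaxResults)`.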
maxResults - Maximum number of top results returned
by this classifier.
public ScoredClassification classify(Vector x)
The maximum size of the returned scored classification is
given by maxResults() and set with setMaxResults(int).
classify in interface BaseClassifier<Vector>
classify in interface RankedClassifier<Vector>
classify in interface ScoredClassifier<Vector>
x - Vector to classify.
Copyright © 2019 Alias-i, Inc. All rights reserved.