public class DocFrequencySelector extends AbstractFeatureSelector implements Serializable
An unsupervised feature selector that excludes features whose document frequency is less than a given threshold.
Please refer to the paper below for details of the algorithm.
Yang, Y. and Pedersen, J.O., "A comparative study on feature selection in text categorization,"
In Proceedings of International Conference on Machine Learning, 1997, pp. 412-420.
Copyright: Copyright (c) 2005
Company: IST, Drexel University
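
The thresholding rule described above can be illustrated with a minimal, self-contained sketch. It is not the library's implementation: the class name `DocFrequencySketch`, the method `selectByDocFrequency`, and the dense `int[][]` document-term matrix are hypothetical stand-ins for `SparseMatrix` and `DocClassSet`, chosen so that the example runs without the toolkit. It assumes a feature is kept when its document frequency is greater than or equal to the threshold, which is how "less than a given threshold" is read here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of document-frequency thresholding.
// None of these names come from the library documented on this page.
public class DocFrequencySketch {

    /**
     * Returns the indices of terms that occur in at least minDocFrequency documents.
     *
     * @param docTerm         dense document-term counts; docTerm[d][t] is the count
     *                        of term t in document d (assumed: every row has termNum entries)
     * @param termNum         number of distinct terms (columns)
     * @param minDocFrequency terms with a document frequency below this value are dropped
     */
    public static int[] selectByDocFrequency(int[][] docTerm, int termNum, int minDocFrequency) {
        // Count, for each term, the number of documents that contain it at least once.
        int[] docFrequency = new int[termNum];
        for (int[] doc : docTerm) {
            for (int t = 0; t < termNum; t++) {
                if (doc[t] > 0) {
                    docFrequency[t]++;
                }
            }
        }

        // Keep only the terms that reach the threshold, in ascending index order.
        List<Integer> selected = new ArrayList<>();
        for (int t = 0; t < termNum; t++) {
            if (docFrequency[t] >= minDocFrequency) {
                selected.add(t);
            }
        }

        int[] result = new int[selected.size()];
        for (int i = 0; i < result.length; i++) {
            result[i] = selected.get(i);
        }
        return result;
    }
}
```

For three documents with counts `{1,0,2}`, `{0,0,1}`, and `{3,0,1}`, the document frequencies of the three terms are 2, 0, and 3, so a threshold of 2 keeps terms 0 and 2. Class labels play no role in the selection, which is why the selector is described as unsupervised.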
Fields inherited from class AbstractFeatureSelector: featureMap, selectedFeatureNum

| Constructor and Description |
|---|
| DocFrequencySelector(int minDocFrequency) |

| Modifier and Type | Method and Description |
|---|---|
| protected int[] | getSelectedFeatures(IndexReader indexReader, DocClassSet trainingSet) |
| protected int[] | getSelectedFeatures(SparseMatrix doctermMatrix, DocClassSet trainingSet) |

Methods inherited from class AbstractFeatureSelector: getClassPrior, getSelectedFeatureNum, getTermDistribution, getTermDistribution, getTermDocFrequency, isSelected, map, setSelectedFeatures, train, train

protected int[] getSelectedFeatures(SparseMatrix doctermMatrix, DocClassSet trainingSet)
getSelectedFeatures in class AbstractFeatureSelector

protected int[] getSelectedFeatures(IndexReader indexReader, DocClassSet trainingSet)
getSelectedFeatures in class AbstractFeatureSelector

Copyright © 2018 JULIE Lab, Germany. All rights reserved.