Package de.jungblut.ner
Class SparseFeatureExtractorHelper<K>
- java.lang.Object
  - de.jungblut.ner.SparseFeatureExtractorHelper<K>
public final class SparseFeatureExtractorHelper<K> extends java.lang.Object
Convenience helper for creating vectors out of text features for sequence learning. Inspired by Coursera's NLP Class PA4.
- Author:
- thomas.jungblut
-
-
Constructor Summary
Constructors (Constructor, Description):
- SparseFeatureExtractorHelper(java.util.List<K> words, java.util.List<java.lang.Integer> labels, SequenceFeatureExtractor<K> extractor)
  Constructs this feature factory.
- SparseFeatureExtractorHelper(java.util.List<K> words, java.util.List<java.lang.Integer> labels, SequenceFeatureExtractor<K> extractor, java.lang.String[] dictionary)
  Constructs this feature factory via a given dictionary.
-
Method Summary
Methods (Modifier and Type, Method, Description):
- java.lang.String[] getDictionary()
- de.jungblut.math.tuple.Tuple<de.jungblut.math.DoubleVector[],de.jungblut.math.DoubleVector[]> vectorize()
  Vectorizes the given data from the constructor.
- de.jungblut.math.tuple.Tuple<de.jungblut.math.DoubleVector[],de.jungblut.math.DoubleVector[]> vectorize(java.util.List<K> words, java.util.List<java.lang.Integer> labels)
  Vectorizes the given data.
- de.jungblut.math.DoubleVector vectorize(K word)
  Vectorizes the given word.
- de.jungblut.math.DoubleVector vectorize(K word, java.lang.Integer lastLabel)
  Vectorizes the given word with the previous outcome.
- de.jungblut.math.tuple.Tuple<de.jungblut.math.DoubleVector[],de.jungblut.math.DoubleVector[]> vectorizeAdditionals(java.util.List<K> words, java.util.List<java.lang.Integer> labels)
  Vectorizes the given data.
- de.jungblut.math.DoubleVector[] vectorizeEachLabel(java.util.List<K> words)
  Vectorizes the given data for each label.
-
-
-
Constructor Detail
-
SparseFeatureExtractorHelper
public SparseFeatureExtractorHelper(java.util.List<K> words, java.util.List<java.lang.Integer> labels, SequenceFeatureExtractor<K> extractor)
Constructs this feature factory.
- Parameters:
- words - a list of words, in sequence, to learn on.
- labels - the labels corresponding to the words, in the same order.
- extractor - the core implementation of the feature extractor.
-
SparseFeatureExtractorHelper
public SparseFeatureExtractorHelper(java.util.List<K> words, java.util.List<java.lang.Integer> labels, SequenceFeatureExtractor<K> extractor, java.lang.String[] dictionary)
Constructs this feature factory via a given dictionary.
- Parameters:
- words - a list of words, in sequence, to learn on.
- labels - the labels corresponding to the words, in the same order.
- extractor - the core implementation of the feature extractor.
- dictionary - an existing dictionary to use.
-
-
Method Detail
-
vectorize
public de.jungblut.math.tuple.Tuple<de.jungblut.math.DoubleVector[],de.jungblut.math.DoubleVector[]> vectorize()
Vectorizes the data passed to the constructor. Internally builds a dictionary that can be saved and later used to vectorize additional data with vectorizeAdditionals(List, List).
- Returns:
- a Tuple with the features in the first dimension and the outcomes in the second.
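The dictionary idea described above can be illustrated with a minimal, self-contained sketch. It does not use this library's classes: DictionarySketch, extractFeatures, buildDictionary, and the map-based sparse vector are hypothetical stand-ins, and the sorted-array dictionary layout is an assumption, not documented behavior of this class.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Illustrative only: not the implementation of SparseFeatureExtractorHelper.
class DictionarySketch {

    // Hypothetical feature extractor: emits string features for one word.
    public static List<String> extractFeatures(List<String> words, int position) {
        List<String> features = new ArrayList<>();
        String word = words.get(position);
        features.add("word=" + word.toLowerCase());
        if (!word.isEmpty() && Character.isUpperCase(word.charAt(0))) {
            features.add("capitalized");
        }
        return features;
    }

    // Collects every distinct feature string into a sorted dictionary.
    public static String[] buildDictionary(List<String> words) {
        TreeSet<String> distinct = new TreeSet<>();
        for (int i = 0; i < words.size(); i++) {
            distinct.addAll(extractFeatures(words, i));
        }
        return distinct.toArray(new String[0]);
    }

    // Maps the features of one word to a sparse vector (dictionary index -> 1.0).
    // Features that are missing from the dictionary are skipped (an assumption).
    public static Map<Integer, Double> vectorize(List<String> words, int position,
            String[] dictionary) {
        Map<Integer, Double> sparse = new TreeMap<>();
        for (String feature : extractFeatures(words, position)) {
            int index = Arrays.binarySearch(dictionary, feature);
            if (index >= 0) {
                sparse.put(index, 1.0);
            }
        }
        return sparse;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("Thomas", "lives", "in", "Berlin");
        String[] dictionary = buildDictionary(words);
        // prints [capitalized, word=berlin, word=in, word=lives, word=thomas]
        System.out.println(Arrays.toString(dictionary));
        for (int i = 0; i < words.size(); i++) {
            System.out.println(words.get(i) + " -> " + vectorize(words, i, dictionary));
        }
    }
}
```

Each vector has as many dimensions as the dictionary has entries, which is why the dictionary has to be kept if further data should be mapped into the same feature space.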
-
vectorize
public de.jungblut.math.DoubleVector vectorize(K word)
Vectorizes the given word.
- Returns:
- the feature vector for the given word.
-
vectorize
public de.jungblut.math.DoubleVector vectorize(K word, java.lang.Integer lastLabel)
Vectorizes the given word together with the previous outcome (lastLabel).
- Returns:
- the feature vector for the given word.
-
vectorize
public de.jungblut.math.tuple.Tuple<de.jungblut.math.DoubleVector[],de.jungblut.math.DoubleVector[]> vectorize(java.util.List<K> words, java.util.List<java.lang.Integer> labels)
Vectorizes the given data. Internally uses a dictionary that was created by vectorize() or, if none exists, creates one from this data.
- Returns:
- a Tuple with the features in the first dimension and the outcomes in the second.
-
vectorizeAdditionals
public de.jungblut.math.tuple.Tuple<de.jungblut.math.DoubleVector[],de.jungblut.math.DoubleVector[]> vectorizeAdditionals(java.util.List<K> words, java.util.List<java.lang.Integer> labels)
Vectorizes the given data. Internally uses a dictionary that was created by vectorize() or, if none exists, creates one from this data.
- Returns:
- a Tuple with the features in the first dimension and the outcomes in the second.
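Reusing a dictionary on additional data can be sketched as follows. This is a standalone illustration under stated assumptions, not this class's code: DictionaryReuseSketch, extractFeatures, and vectorizeWithDictionary are hypothetical names, the dictionary is assumed to be sorted, and dropping features that are absent from the dictionary is an assumed policy.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative only: shows how a fixed dictionary pins down the feature space.
class DictionaryReuseSketch {

    // Hypothetical feature extractor for a single word.
    public static List<String> extractFeatures(String word) {
        List<String> features = new ArrayList<>();
        features.add("word=" + word.toLowerCase());
        if (!word.isEmpty() && Character.isUpperCase(word.charAt(0))) {
            features.add("capitalized");
        }
        return features;
    }

    // Vectorizes a word against an existing, sorted dictionary.
    // The result always has dictionary.length dimensions; features that the
    // dictionary does not contain are ignored (assumed policy).
    public static double[] vectorizeWithDictionary(String word, String[] sortedDictionary) {
        double[] vector = new double[sortedDictionary.length];
        for (String feature : extractFeatures(word)) {
            int index = Arrays.binarySearch(sortedDictionary, feature);
            if (index >= 0) {
                vector[index] = 1.0;
            }
        }
        return vector;
    }

    public static void main(String[] args) {
        // A dictionary assumed to have been saved from an earlier training run.
        String[] dictionary = {"capitalized", "word=berlin", "word=thomas"};
        // prints [1.0, 1.0, 0.0]
        System.out.println(Arrays.toString(vectorizeWithDictionary("Berlin", dictionary)));
        // "word=paris" is unknown, so only "capitalized" fires: prints [1.0, 0.0, 0.0]
        System.out.println(Arrays.toString(vectorizeWithDictionary("Paris", dictionary)));
    }
}
```

Because the dimensionality is fixed by the dictionary, vectors produced from new data stay compatible with a model trained on the original data.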
-
vectorizeEachLabel
public de.jungblut.math.DoubleVector[] vectorizeEachLabel(java.util.List<K> words)
Vectorizes the given data for each label. Internally uses a dictionary that was created by vectorize() or, if none exists, creates one from this data.
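One plausible reading of "for each label", which this page does not confirm, is that every word is featurized once per candidate previous label, as a sequence decoder would need. The sketch below illustrates only that reading; EachLabelSketch, extractFeatures, featuresForEachLabel, and the numLabels parameter are hypothetical and not part of this class.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative only: an assumed interpretation of per-label featurization.
class EachLabelSketch {

    // Hypothetical extractor that also encodes the previous label as a feature.
    public static List<String> extractFeatures(String word, int previousLabel) {
        List<String> features = new ArrayList<>();
        features.add("word=" + word.toLowerCase());
        features.add("prevLabel=" + previousLabel);
        return features;
    }

    // Produces one feature list per (word, candidate previous label) pair,
    // ordered word by word and, within a word, by label.
    public static List<List<String>> featuresForEachLabel(List<String> words, int numLabels) {
        List<List<String>> result = new ArrayList<>();
        for (String word : words) {
            for (int label = 0; label < numLabels; label++) {
                result.add(extractFeatures(word, label));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("in", "Berlin");
        // Two words and two labels yield four feature lists.
        for (List<String> features : featuresForEachLabel(words, 2)) {
            System.out.println(features);
        }
    }
}
```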
-
getDictionary
public java.lang.String[] getDictionary()
- Returns:
- the built dictionary.
-
-