Class SparseFeatureExtractorHelper<K>


  • public final class SparseFeatureExtractorHelper<K>
    extends java.lang.Object
    Convenient helper for creating vectors out of text features for sequence learning. Inspired by Coursera's NLP Class PA4.
    Author:
    thomas.jungblut
    • Method Summary

      Modifier and Type Method Description
      java.lang.String[] getDictionary()  
      de.jungblut.math.tuple.Tuple<de.jungblut.math.DoubleVector[], de.jungblut.math.DoubleVector[]> vectorize()
      Vectorizes the given data from the constructor.
      de.jungblut.math.tuple.Tuple<de.jungblut.math.DoubleVector[], de.jungblut.math.DoubleVector[]> vectorize(java.util.List<K> words, java.util.List<java.lang.Integer> labels)
      Vectorizes the given data.
      de.jungblut.math.DoubleVector vectorize(K word)
      Vectorizes the given word.
      de.jungblut.math.DoubleVector vectorize(K word, java.lang.Integer lastLabel)
      Vectorizes the given word with the previous outcome.
      de.jungblut.math.tuple.Tuple<de.jungblut.math.DoubleVector[], de.jungblut.math.DoubleVector[]> vectorizeAdditionals(java.util.List<K> words, java.util.List<java.lang.Integer> labels)
      Vectorizes the given data.
      de.jungblut.math.DoubleVector[] vectorizeEachLabel(java.util.List<K> words)
      Vectorizes the given data for each label.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Constructor Detail

      • SparseFeatureExtractorHelper

        public SparseFeatureExtractorHelper(java.util.List<K> words,
                                            java.util.List<java.lang.Integer> labels,
                                            SequenceFeatureExtractor<K> extractor)
        Constructs this feature factory.
        Parameters:
        words - a list of words in sequence to learn on.
        labels - the labels corresponding to the words, in the same order (one label per word).
        extractor - the core implementation of the feature extractor.
      • SparseFeatureExtractorHelper

        public SparseFeatureExtractorHelper(java.util.List<K> words,
                                            java.util.List<java.lang.Integer> labels,
                                            SequenceFeatureExtractor<K> extractor,
                                            java.lang.String[] dictionary)
        Constructs this feature factory via a given dictionary.
        Parameters:
        words - a list of words in sequence to learn on.
        labels - the labels corresponding to the words, in the same order (one label per word).
        extractor - the core implementation of the feature extractor.
        dictionary - a previously built dictionary to reuse.
    • Method Detail

      • vectorize

        public de.jungblut.math.tuple.Tuple<de.jungblut.math.DoubleVector[], de.jungblut.math.DoubleVector[]> vectorize()
        Vectorizes the given data from the constructor. Internally builds a dictionary that can be saved and reused to vectorize additional data with vectorizeAdditionals(List, List).
        Returns:
        a Tuple holding the feature vectors as its first element and the outcome vectors as its second.
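        The dictionary-based vectorization described above can be illustrated with a minimal, self-contained sketch. It does not use the library's classes: DictionaryVectorizerSketch, buildDictionary, and toSparseVector are hypothetical names, a Map<Integer, Double> stands in for a sparse DoubleVector, and the sorted dictionary layout, the binary feature values, and the skipping of unknown features are assumptions, not documented behavior of this class.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical stand-in, not part of the library: shows one way a feature
// dictionary can map string features to sparse vector indices.
final class DictionaryVectorizerSketch {

  // Collects every distinct feature string in sorted order, so that each
  // feature gets a stable index (assumed layout).
  static String[] buildDictionary(List<List<String>> featuresPerWord) {
    TreeSet<String> distinct = new TreeSet<>();
    for (List<String> features : featuresPerWord) {
      distinct.addAll(features);
    }
    return distinct.toArray(new String[0]);
  }

  // Maps the features of one word to (index -> 1.0) entries. Features that
  // are missing from the dictionary are skipped (an assumption).
  static Map<Integer, Double> toSparseVector(String[] dictionary, List<String> features) {
    Map<Integer, Double> vector = new TreeMap<>();
    for (String feature : features) {
      int index = Arrays.binarySearch(dictionary, feature);
      if (index >= 0) {
        vector.put(index, 1.0);
      }
    }
    return vector;
  }

  public static void main(String[] args) {
    List<List<String>> features = List.of(
        List.of("word=Thomas", "capitalized"),
        List.of("word=codes"));
    String[] dictionary = buildDictionary(features);
    // prints [capitalized, word=Thomas, word=codes]
    System.out.println(Arrays.toString(dictionary));
    // prints {0=1.0, 1=1.0}
    System.out.println(toSparseVector(dictionary, features.get(0)));
  }
}
```

        Applying the same dictionary to later data keeps feature indices consistent between the vectors built first and the vectors built afterwards, which is the reason for saving it.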
      • vectorize

        public de.jungblut.math.DoubleVector vectorize(K word)
        Vectorizes the given word.
        Returns:
        the feature vector for the given word.
      • vectorize

        public de.jungblut.math.DoubleVector vectorize(K word,
                                                       java.lang.Integer lastLabel)
        Vectorizes the given word with the previous outcome.
        Returns:
        the feature vector for the given word.
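        A common way to make the previous outcome available to a sequence model is to add a feature string that encodes the last label alongside the word's own features. The sketch below illustrates that idea only. PreviousLabelFeatureSketch, the feature naming scheme, and the handling of a null label are all assumptions; how this class or a given SequenceFeatureExtractor actually encodes the previous outcome is not stated here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration, not part of the library.
final class PreviousLabelFeatureSketch {

  // Builds string features for a word, optionally adding the previous label.
  static List<String> features(String word, Integer lastLabel) {
    List<String> features = new ArrayList<>();
    features.add("word=" + word);
    // A null label is treated as "no previous outcome", e.g. at the start
    // of a sequence (assumed convention).
    if (lastLabel != null) {
      features.add("prevLabel=" + lastLabel);
    }
    return features;
  }

  public static void main(String[] args) {
    System.out.println(features("Paris", 1));
    System.out.println(features("Paris", null));
  }
}
```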
      • vectorize

        public de.jungblut.math.tuple.Tuple<de.jungblut.math.DoubleVector[], de.jungblut.math.DoubleVector[]> vectorize(
                java.util.List<K> words,
                java.util.List<java.lang.Integer> labels)
        Vectorizes the given data. Internally uses the dictionary created by vectorize(), or creates one from this data.
        Returns:
        a Tuple holding the feature vectors as its first element and the outcome vectors as its second.
      • vectorizeAdditionals

        public de.jungblut.math.tuple.Tuple<de.jungblut.math.DoubleVector[], de.jungblut.math.DoubleVector[]> vectorizeAdditionals(
                java.util.List<K> words,
                java.util.List<java.lang.Integer> labels)
        Vectorizes the given data. Internally uses the dictionary created by vectorize(), or creates one from this data.
        Returns:
        a Tuple holding the feature vectors as its first element and the outcome vectors as its second.
      • vectorizeEachLabel

        public de.jungblut.math.DoubleVector[] vectorizeEachLabel(java.util.List<K> words)
        Vectorizes the given data for each label. Internally uses the dictionary created by vectorize(), or creates one from this data.
      • getDictionary

        public java.lang.String[] getDictionary()
        Returns:
        the built dictionary.
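        Because the dictionary is a plain String[], it can be written to disk and later handed to the constructor that accepts a dictionary. A minimal sketch, assuming one feature per line in a UTF-8 text file and no line breaks inside feature strings. DictionaryFileSketch, save, and load are hypothetical helper names, not library methods, and the array in main stands in for the result of getDictionary().

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of the library.
final class DictionaryFileSketch {

  // Writes one dictionary entry per line.
  static void save(String[] dictionary, Path file) throws IOException {
    Files.write(file, Arrays.asList(dictionary), StandardCharsets.UTF_8);
  }

  // Reads the entries back in the same order they were written.
  static String[] load(Path file) throws IOException {
    List<String> lines = Files.readAllLines(file, StandardCharsets.UTF_8);
    return lines.toArray(new String[0]);
  }

  public static void main(String[] args) throws IOException {
    // Stand-in for the array returned by getDictionary().
    String[] dictionary = {"capitalized", "word=Thomas", "word=codes"};
    Path file = Files.createTempFile("dictionary", ".txt");
    save(dictionary, file);
    String[] restored = load(file);
    System.out.println(Arrays.equals(dictionary, restored));
    Files.delete(file);
  }
}
```

        Preserving the entry order matters in this sketch, since a feature's position in the array is what a vectorizer would use as its vector index.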