Class MathUtils


  • public final class MathUtils
    extends java.lang.Object
    Math utilities covering normalization, scaling, polynomial feature expansion, guarded logarithms, numerical gradients, and AUC computation.
    Author:
    thomas.jungblut
    • Field Summary

      Fields 
      Modifier and Type Field Description
      static double EPS  
    • Method Summary

      All Methods Static Methods Concrete Methods 
      Modifier and Type Method Description
      static double computeAUC​(java.util.List<MathUtils.PredictionOutcomePair> outcomePredictedPairs)
      Computes the AUC. Adapted from Kaggle's C# implementation: www.kaggle.com/c/SemiSupervisedFeatureLearning/forums/t/919/auc-implementation/6136#post6136.
      static de.jungblut.math.dense.DenseDoubleMatrix createPolynomials​(de.jungblut.math.dense.DenseDoubleMatrix seed, int num)
      Creates a new matrix consisting of polynomial powers of the input matrix's columns.
      For example, a degree-2 expansion of 3 columns yields these columns in the returned matrix:
      (SEED: x^1 | y^1 | z^1) | x^2 | y^2 | z^2.
      static double guardedLogarithm​(double input)  
      static de.jungblut.math.DoubleMatrix logMatrix​(de.jungblut.math.DoubleMatrix input)  
      static de.jungblut.math.DoubleVector logVector​(de.jungblut.math.DoubleVector input)  
      static de.jungblut.math.tuple.Tuple3<de.jungblut.math.DoubleMatrix,​de.jungblut.math.DoubleVector,​de.jungblut.math.DoubleVector> meanNormalizeColumns​(de.jungblut.math.DoubleMatrix x)  
      static de.jungblut.math.tuple.Tuple<de.jungblut.math.DoubleVector,​de.jungblut.math.DoubleVector> meanNormalizeColumns​(Dataset dataset)
      Normalizes the given dataset in place by subtracting the mean and dividing by the stddev.
      static de.jungblut.math.tuple.Tuple<de.jungblut.math.DoubleVector,​de.jungblut.math.DoubleVector> meanNormalizeColumns​(Dataset dataset, java.util.function.Predicate<FeatureOutcomePair> filterPredicate)
      Normalizes the given dataset in place by subtracting the mean and dividing by the stddev.
      static de.jungblut.math.tuple.Tuple<de.jungblut.math.DoubleMatrix,​de.jungblut.math.DoubleVector> meanNormalizeRows​(de.jungblut.math.DoubleMatrix pMatrix)  
      static double minMaxScale​(double x, double fromMin, double fromMax, double toMin, double toMax)
      Scales a single input into the interval given by min and max.
      static de.jungblut.math.DoubleMatrix minMaxScale​(de.jungblut.math.DoubleMatrix input, double fromMin, double fromMax, double toMin, double toMax)
      Scales a matrix into the interval given by min and max.
      static de.jungblut.math.DoubleVector minMaxScale​(de.jungblut.math.DoubleVector input, double fromMin, double fromMax, double toMin, double toMax)
      Scales a vector into the interval given by min and max.
      static de.jungblut.math.DoubleVector numericalGradient​(de.jungblut.math.DoubleVector vector, CostFunction f)
      Calculates the numerical gradient of a cost function using the central difference formula.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Field Detail

      • EPS

        public static final double EPS
    • Method Detail

      • meanNormalizeRows

        public static de.jungblut.math.tuple.Tuple<de.jungblut.math.DoubleMatrix,​de.jungblut.math.DoubleVector> meanNormalizeRows​(de.jungblut.math.DoubleMatrix pMatrix)
        Returns:
        the mean-normalized matrix (0 mean and stddev of 1) as well as the mean.
      • meanNormalizeColumns

        public static de.jungblut.math.tuple.Tuple3<de.jungblut.math.DoubleMatrix,​de.jungblut.math.DoubleVector,​de.jungblut.math.DoubleVector> meanNormalizeColumns​(de.jungblut.math.DoubleMatrix x)
        Returns:
        the normalized matrix (0 mean and stddev of 1) as well as the mean and the stddev.
      • meanNormalizeColumns

        public static de.jungblut.math.tuple.Tuple<de.jungblut.math.DoubleVector,​de.jungblut.math.DoubleVector> meanNormalizeColumns​(Dataset dataset)
        Normalizes the given dataset in place by subtracting the mean and dividing by the stddev. Afterwards the dataset has 0 mean and a stddev of 1.
        Returns:
        a tuple of the mean and the stddev.
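To make the arithmetic concrete, here is a minimal sketch of in-place column normalization on a plain double[][] rather than the library's Dataset type. The class name ColumnNormalizationSketch is hypothetical, and the sketch assumes the population standard deviation (dividing by the row count) and maps constant columns to 0; the library may handle both points differently.

```java
final class ColumnNormalizationSketch {

  /**
   * Normalizes every column of x in place to 0 mean and a stddev of 1.
   * Returns a two-row array: row 0 holds the column means, row 1 the stddevs.
   */
  static double[][] meanNormalizeColumns(double[][] x) {
    int rows = x.length;
    int cols = x[0].length;
    double[] mean = new double[cols];
    double[] stddev = new double[cols];
    for (int c = 0; c < cols; c++) {
      double sum = 0;
      for (int r = 0; r < rows; r++) {
        sum += x[r][c];
      }
      mean[c] = sum / rows;

      double squaredDeviations = 0;
      for (int r = 0; r < rows; r++) {
        double d = x[r][c] - mean[c];
        squaredDeviations += d * d;
      }
      // Assumption: population stddev (divide by rows, not rows - 1).
      stddev[c] = Math.sqrt(squaredDeviations / rows);

      for (int r = 0; r < rows; r++) {
        // Assumption: a constant column (stddev 0) is mapped to 0.
        x[r][c] = stddev[c] == 0 ? 0 : (x[r][c] - mean[c]) / stddev[c];
      }
    }
    return new double[][] {mean, stddev};
  }
}
```

Keeping the returned mean and stddev lets the same transformation be applied later to unseen data.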
      • meanNormalizeColumns

        public static de.jungblut.math.tuple.Tuple<de.jungblut.math.DoubleVector,​de.jungblut.math.DoubleVector> meanNormalizeColumns​(Dataset dataset,
                                                                                                                                           java.util.function.Predicate<FeatureOutcomePair> filterPredicate)
        Normalizes the given dataset in place by subtracting the mean and dividing by the stddev. Afterwards the dataset has 0 mean and a stddev of 1.

        Optionally, supply a predicate to restrict the normalization to a subset of the dataset.

        Returns:
        a tuple of the mean and the stddev.
      • createPolynomials

        public static de.jungblut.math.dense.DenseDoubleMatrix createPolynomials​(de.jungblut.math.dense.DenseDoubleMatrix seed,
                                                                                 int num)
        Creates a new matrix consisting of polynomial powers of the input matrix's columns.
        For example, a degree-2 expansion of 3 columns yields these columns in the returned matrix:
        (SEED: x^1 | y^1 | z^1) | x^2 | y^2 | z^2.
        Parameters:
        seed - the matrix whose columns are raised to polynomial powers.
        num - the highest polynomial degree: 2 for quadratic, 3 for cubic, and so forth.
        Returns:
        the new matrix.
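The column layout described above can be sketched with plain arrays. This is an illustration only: PolynomialFeaturesSketch is a hypothetical name, and it uses double[][] instead of DenseDoubleMatrix.

```java
final class PolynomialFeaturesSketch {

  /**
   * Returns a matrix with seed.columns * num columns: the seed columns
   * first, then every column squared, then cubed, up to power num.
   */
  static double[][] createPolynomials(double[][] seed, int num) {
    int rows = seed.length;
    int cols = seed[0].length;
    double[][] out = new double[rows][cols * num];
    for (int r = 0; r < rows; r++) {
      for (int power = 1; power <= num; power++) {
        for (int c = 0; c < cols; c++) {
          // Block (power - 1) holds the seed columns raised to that power.
          out[r][(power - 1) * cols + c] = Math.pow(seed[r][c], power);
        }
      }
    }
    return out;
  }
}
```

For a seed row (2, 3, 4) and num = 2, the sketch produces the row (2, 3, 4, 4, 9, 16).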
      • numericalGradient

        public static de.jungblut.math.DoubleVector numericalGradient​(de.jungblut.math.DoubleVector vector,
                                                                      CostFunction f)
        Calculates the numerical gradient of a cost function using the central difference formula: f'(x) ≈ (f(x + h) - f(x - h)) / (2h), where h is a small step size.
        Parameters:
        vector - the parameters at which to evaluate the gradient.
        f - the cost function that returns the cost for a given parameter set.
        Returns:
        a numerical gradient.
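A minimal sketch of the central difference quotient (f(x + h) - f(x - h)) / (2h), applied to one coordinate at a time. The class name is hypothetical, a ToDoubleFunction stands in for the library's CostFunction, and the step size h is passed explicitly because the way the library derives it (presumably from EPS) is not documented here.

```java
import java.util.function.ToDoubleFunction;

final class NumericalGradientSketch {

  /** Approximates the gradient of f at x by perturbing one coordinate at a time. */
  static double[] numericalGradient(double[] x, ToDoubleFunction<double[]> f, double h) {
    double[] gradient = new double[x.length];
    double[] tmp = x.clone();
    for (int i = 0; i < x.length; i++) {
      tmp[i] = x[i] + h;
      double plus = f.applyAsDouble(tmp);
      tmp[i] = x[i] - h;
      double minus = f.applyAsDouble(tmp);
      gradient[i] = (plus - minus) / (2 * h);
      tmp[i] = x[i]; // restore the coordinate before moving on
    }
    return gradient;
  }
}
```

A typical use is gradient checking: compare this approximation against an analytically derived gradient and flag large differences.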
      • logMatrix

        public static de.jungblut.math.DoubleMatrix logMatrix​(de.jungblut.math.DoubleMatrix input)
        Returns:
        a log'd matrix that was guarded against edge cases of the logarithm.
      • logVector

        public static de.jungblut.math.DoubleVector logVector​(de.jungblut.math.DoubleVector input)
        Returns:
        a log'd vector that was guarded against edge cases of the logarithm.
      • minMaxScale

        public static de.jungblut.math.DoubleMatrix minMaxScale​(de.jungblut.math.DoubleMatrix input,
                                                                double fromMin,
                                                                double fromMax,
                                                                double toMin,
                                                                double toMax)
        Scales a matrix into the interval given by min and max.
        Parameters:
        input - the input matrix.
        fromMin - the lower bound of the input interval.
        fromMax - the upper bound of the input interval.
        toMin - the lower bound of the target interval.
        toMax - the upper bound of the target interval.
        Returns:
        the new matrix with scaled values.
      • minMaxScale

        public static de.jungblut.math.DoubleVector minMaxScale​(de.jungblut.math.DoubleVector input,
                                                                double fromMin,
                                                                double fromMax,
                                                                double toMin,
                                                                double toMax)
        Scales a vector into the interval given by min and max.
        Parameters:
        input - the input vector.
        fromMin - the lower bound of the input interval.
        fromMax - the upper bound of the input interval.
        toMin - the lower bound of the target interval.
        toMax - the upper bound of the target interval.
        Returns:
        the new vector with scaled values.
      • minMaxScale

        public static double minMaxScale​(double x,
                                         double fromMin,
                                         double fromMax,
                                         double toMin,
                                         double toMax)
        Scales a single input into the interval given by min and max.
        Parameters:
        x - the input value.
        fromMin - the lower bound of the input interval.
        fromMax - the upper bound of the input interval.
        toMin - the lower bound of the target interval.
        toMax - the upper bound of the target interval.
        Returns:
        the bounded value.
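The scalar case is a linear map from [fromMin, fromMax] onto [toMin, toMax]. The sketch below shows the usual formula under a hypothetical class name. It does not clamp, so inputs outside the source interval land outside the target interval; whether the library clamps is not stated here.

```java
final class MinMaxScaleSketch {

  /** Linearly maps x from [fromMin, fromMax] onto [toMin, toMax]. */
  static double minMaxScale(double x, double fromMin, double fromMax,
      double toMin, double toMax) {
    // Assumes fromMax != fromMin; otherwise the division is undefined.
    return toMin + (x - fromMin) * (toMax - toMin) / (fromMax - fromMin);
  }
}
```

For example, scaling 5 from [0, 10] into [0, 1] gives 0.5, and scaling 0 from [0, 10] into [-1, 1] gives -1.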
      • guardedLogarithm

        public static double guardedLogarithm​(double input)
        Returns:
        a log'd value of the input that is guarded.
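This page does not say which edge cases are guarded or what is returned for them. The sketch below shows one plausible guard: the class name and both fallback constants are assumptions chosen for illustration, not the library's documented behavior.

```java
final class GuardedLogSketch {

  // Assumed fallbacks; the library's actual values are not documented here.
  static final double NON_FINITE_FALLBACK = 0.0;
  static final double NON_POSITIVE_FALLBACK = -10.0;

  /** Returns log(input), replacing inputs that would yield NaN or infinity. */
  static double guardedLogarithm(double input) {
    if (Double.isNaN(input) || Double.isInfinite(input)) {
      return NON_FINITE_FALLBACK;
    }
    if (input <= 0.0) {
      // log(0) is -Infinity and log of a negative number is NaN.
      return NON_POSITIVE_FALLBACK;
    }
    return Math.log(input);
  }
}
```

A guard of this kind keeps a single zero or NaN entry from turning an entire cost computation into NaN.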
      • computeAUC

        public static double computeAUC​(java.util.List<MathUtils.PredictionOutcomePair> outcomePredictedPairs)
        Computes the AUC. Adapted from Kaggle's C# implementation: www.kaggle.com/c/SemiSupervisedFeatureLearning/forums/t/919/auc-implementation/6136#post6136.
        Parameters:
        outcomePredictedPairs - the list of PredictionOutcomePair entries, each pairing a class (0 or 1) with its predicted value.
        Returns:
        the AUC value.
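For orientation, here is a sketch of AUC in the rank-sum (Mann-Whitney U) formulation, with tied predictions sharing their average rank. It is not a port of the referenced C# implementation, which may treat ties differently. The class name is hypothetical, and two parallel arrays stand in for the PredictionOutcomePair list.

```java
import java.util.Arrays;

final class AucSketch {

  /** labels[i] is the class (0 or 1), scores[i] the predicted value. */
  static double computeAUC(int[] labels, double[] scores) {
    int n = labels.length;
    Integer[] order = new Integer[n];
    for (int k = 0; k < n; k++) {
      order[k] = k;
    }
    // Sort indices by ascending predicted value.
    Arrays.sort(order, (a, b) -> Double.compare(scores[a], scores[b]));

    double positiveRankSum = 0;
    long positives = 0;
    int i = 0;
    while (i < n) {
      int j = i;
      while (j < n && scores[order[j]] == scores[order[i]]) {
        j++;
      }
      // Positions i..j-1 are tied and share the average of ranks i+1..j.
      double averageRank = (i + 1 + j) / 2.0;
      for (int k = i; k < j; k++) {
        if (labels[order[k]] == 1) {
          positiveRankSum += averageRank;
          positives++;
        }
      }
      i = j;
    }
    long negatives = n - positives;
    if (positives == 0 || negatives == 0) {
      return Double.NaN; // AUC is undefined with only one class present
    }
    return (positiveRankSum - positives * (positives + 1) / 2.0)
        / ((double) positives * negatives);
  }
}
```

With labels (0, 0, 1, 1) and predictions (0.1, 0.4, 0.35, 0.8), the sketch returns 0.75: three of the four positive/negative pairs are ordered correctly.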