Class QuantileEstimator

  • All Implemented Interfaces:
    Serializable, Cloneable, Iterable<QuantileEstimator.Bar>

    public class QuantileEstimator
    extends SummaryStat
    implements Iterable<QuantileEstimator.Bar>

    Implements the extended P2 algorithm to calculate histograms, median values, or arbitrary quantiles. In addition, this class collects all statistics gathered by SummaryStat.

    The method used is based on the following papers:

    • Raj Jain, Imrich Chlamtac: The P2 Algorithm for Dynamic Calculation of Quantiles and Histograms Without Storing Observations, Communications of the ACM 28, 10 (1985)
    • Kimmo Raatikainen: Simultaneous estimation of several percentiles, Simulation Councils (1987)
    Author:
    Robin Kreis, 2012-09-07
    See Also:
    Serialized Form
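As an illustration of the underlying technique, the following is a minimal sketch of the original five-marker P2 algorithm by Jain and Chlamtac, which estimates a single quantile p; QuantileEstimator extends this idea to 2n+3 markers. The class P2Sketch and its methods add and estimate are hypothetical names chosen for this sketch; the code is not taken from this class.

```java
import java.util.Arrays;

/**
 * Sketch of the original five-marker P2 algorithm (Jain & Chlamtac, 1985)
 * for a single quantile p. Hypothetical class, not QuantileEstimator's code.
 */
class P2Sketch {
    private final double[] q = new double[5];     // marker heights
    private final double[] n = new double[5];     // actual marker positions (1-based)
    private final double[] np = new double[5];    // desired marker positions
    private final double[] dn = {0, 0, 0, 0, 1};  // increments of the desired positions
    private int count = 0;

    P2Sketch(double p) {
        dn[1] = p / 2;
        dn[2] = p;
        dn[3] = (1 + p) / 2;
        for (int i = 0; i < 5; i++) {
            n[i] = i + 1;
            np[i] = 1 + 4 * dn[i];
        }
    }

    void add(double x) {
        if (count < 5) {                 // the first five observations become the markers
            q[count++] = x;
            if (count == 5) Arrays.sort(q);
            return;
        }
        count++;
        int k;                           // cell index with q[k] <= x < q[k + 1]
        if (x < q[0]) {
            q[0] = x;                    // new minimum
            k = 0;
        } else if (x >= q[4]) {
            q[4] = x;                    // new (or equal) maximum
            k = 3;
        } else {
            k = 0;
            while (x >= q[k + 1]) k++;
        }
        for (int i = k + 1; i < 5; i++) n[i]++;      // markers above x move one position right
        for (int i = 0; i < 5; i++) np[i] += dn[i];  // desired positions advance
        for (int i = 1; i <= 3; i++) {               // adjust the three middle markers
            double d = np[i] - n[i];
            if ((d >= 1 && n[i + 1] - n[i] > 1) || (d <= -1 && n[i - 1] - n[i] < -1)) {
                int s = d >= 1 ? 1 : -1;
                double candidate = parabolic(i, s);
                // use the parabolic value only if it stays strictly between the neighbours
                q[i] = (q[i - 1] < candidate && candidate < q[i + 1]) ? candidate : linear(i, s);
                n[i] += s;
            }
        }
    }

    private double parabolic(int i, int d) {  // piecewise-parabolic (P2) height prediction
        return q[i] + d / (n[i + 1] - n[i - 1])
                * ((n[i] - n[i - 1] + d) * (q[i + 1] - q[i]) / (n[i + 1] - n[i])
                + (n[i + 1] - n[i] - d) * (q[i] - q[i - 1]) / (n[i] - n[i - 1]));
    }

    private double linear(int i, int d) {     // linear fallback prediction
        return q[i] + d * (q[i + d] - q[i]) / (n[i + d] - n[i]);
    }

    /** Current estimate of the p-quantile, or NaN before five observations. */
    double estimate() {
        return count < 5 ? Double.NaN : q[2];
    }
}
```

Each observation shifts the actual marker positions, advances the desired positions by fixed increments, and moves any middle marker that has drifted at least one position from its desired place, so memory use stays constant no matter how many observations are added.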
    • Field Detail

      • p2_q

        protected double[] p2_q
        Stores the heights of the markers.
      • p2_n

        protected int[] p2_n
      • p2_n_increment

        protected double[] p2_n_increment
    • Constructor Detail

      • QuantileEstimator

        public QuantileEstimator()
        Creates a QuantileEstimator and places the markers so that the quantiles 0.1, 0.5 (the median), and 0.9 are estimated well.
      • QuantileEstimator

        public QuantileEstimator​(double... quantiles)
        Creates a QuantileEstimator and places the markers so that the given quantiles are estimated well.
        Parameters:
        quantiles - a list of quantiles to be estimated
        See Also:
        setQuantileList(double...)
    • Method Detail

      • setQuantileList

        public void setQuantileList​(double... quantiles)
        Sets a list of quantiles to be estimated. For n quantiles, 2n+3 markers will be created.
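The count 2n+3 is consistent with Raatikainen's layout: one marker at each requested quantile, one halfway between each pair of neighbouring quantiles (counting 0 and 1 as neighbours), and one at each extreme. Assuming this class follows that layout (the page states only the count), the marker probabilities can be computed as below; MarkerLayout.markerProbabilities is a hypothetical helper, not part of this API.

```java
import java.util.Arrays;

class MarkerLayout {
    /** Hypothetical helper: the 2n+3 marker probabilities for n requested quantiles. */
    static double[] markerProbabilities(double... quantiles) {
        double[] p = quantiles.clone();
        Arrays.sort(p);
        double[] m = new double[2 * p.length + 3];
        m[0] = 0.0;                   // marker for the minimum
        double prev = 0.0;
        int j = 1;
        for (double pi : p) {
            m[j++] = (prev + pi) / 2; // midpoint marker below the quantile
            m[j++] = pi;              // marker for the requested quantile itself
            prev = pi;
        }
        m[j++] = (prev + 1.0) / 2;    // midpoint between the last quantile and 1
        m[j] = 1.0;                   // marker for the maximum
        return m;
    }
}
```

For the default quantiles 0.1, 0.5 and 0.9, this yields the nine probabilities 0, 0.05, 0.1, 0.3, 0.5, 0.7, 0.9, 0.95 and 1.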
      • getQuantileList

        public double[] getQuantileList()
        Returns the list of quantiles that are estimated well. The returned array equals the one passed to QuantileEstimator(double...) or setQuantileList(double...) if the markers haven't been modified since.
      • setCellCount

        public void setCellCount​(int cells)
        Sets the number of cells of the histogram. Each cell will usually be plotted as a bar.
      • getCellCount

        public int getCellCount()
        Returns the number of cells of the histogram. The returned value equals the one passed to setCellCount(int) if the markers haven't been modified since.
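In Jain and Chlamtac's histogram variant, b equiprobable cells are tracked with b + 1 markers placed at the probabilities 0, 1/b, 2/b, ..., 1. Assuming setCellCount(int) configures the markers this way (the page does not say so explicitly), the target probabilities look like this; HistogramLayout.cellBoundaries is a hypothetical helper.

```java
class HistogramLayout {
    /** Hypothetical helper: b + 1 equally spaced marker probabilities for b cells. */
    static double[] cellBoundaries(int cells) {
        if (cells < 1) throw new IllegalArgumentException("cells must be at least 1");
        double[] b = new double[cells + 1];
        for (int i = 0; i <= cells; i++) {
            b[i] = (double) i / cells; // cell i spans b[i]..b[i + 1], each holding 1/cells of the data
        }
        return b;
    }
}
```

With equiprobable cells the bar heights, not the cell widths, carry the shape of the distribution: narrow cells mark dense regions.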
      • combine

        public SummaryStat combine​(SummaryStat other)
        Description copied from class: SummaryStat
        Combines the data in other with this SummaryStat-Object. The combined object behaves as if it had also seen the data of "other".
        Overrides:
        combine in class SummaryStat
        Parameters:
        other - The SummaryStat to combine with this object.
        Returns:
        Returns this to allow easy chaining of calls.
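The page does not document how QuantileEstimator merges its markers. For the mean and variance that a SummaryStat-like object tracks, "behaves as if it had also seen the data of other" can be achieved with the pairwise update of Chan, Golub and LeVeque. The sketch below illustrates only that idea and the return-this chaining; RunningStats and its fields are hypothetical.

```java
/** Hypothetical accumulator illustrating a chainable combine for mean and variance. */
class RunningStats {
    long n;
    double mean;
    double m2; // sum of squared deviations from the mean

    RunningStats value(double x) {  // Welford's incremental update
        n++;
        double delta = x - mean;
        mean += delta / n;
        m2 += delta * (x - mean);
        return this;
    }

    /** Merges other into this, as if this had also seen other's data. */
    RunningStats combine(RunningStats other) {
        if (other.n == 0) return this;
        long total = n + other.n;
        double delta = other.mean - mean;
        mean += delta * other.n / total;
        m2 += other.m2 + delta * delta * ((double) n * other.n / total);
        n = total;
        return this;
    }

    double variance() {  // sample variance
        return n > 1 ? m2 / (n - 1) : Double.NaN;
    }
}
```

Combining the statistics of {1, 2, 3} and {4, 5} gives the same count, mean and variance as feeding 1 to 5 into a single object.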
      • initMarkers

        protected void initMarkers()
        Initializes all markers. This requires p2_n_increment to be set. After this method completes, p2_n and p2_q will have the right dimensions and p2_n will be initialized. This method should only be called when SummaryStat.numObs() would return 0. Otherwise, clear() should be called.
      • value

        public QuantileEstimator value​(double v,
                                       double weight)
        Description copied from class: SummaryStat
        Adds a value with a given weight.
        Overrides:
        value in class SummaryStat
        Parameters:
        v - The value to add.
        weight - The weight to give to this value. Has to be positive.
        Returns:
        this, to allow easy chaining of calls.
      • quadPred

        protected double quadPred​(int d,
                                  int i)
      • linPred

        protected double linPred​(int d,
                                 int i)
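quadPred and linPred carry no description. Their names and (int d, int i) signatures suggest the two height predictions of the P2 algorithm for moving marker i by d = ±1 positions. Assuming so, with q_i the height and n_i the position of marker i (the apparent roles of p2_q and p2_n), Jain and Chlamtac's formulas are:

```latex
\text{quadPred:}\quad q_i' = q_i + \frac{d}{n_{i+1}-n_{i-1}}\left[(n_i-n_{i-1}+d)\,\frac{q_{i+1}-q_i}{n_{i+1}-n_i} + (n_{i+1}-n_i-d)\,\frac{q_i-q_{i-1}}{n_i-n_{i-1}}\right]

\text{linPred:}\quad q_i' = q_i + d\,\frac{q_{i+d}-q_i}{n_{i+d}-n_i}
```

In the original algorithm, the parabolic value is used when it lies strictly between q_{i-1} and q_{i+1}; otherwise the linear one is used.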
      • getMarkers

        public String getMarkers()
        Returns all markers and their positions. Used for testing.
        Returns:
        the current markers formatted as a string
      • formatForGnuplot

        public void formatForGnuplot​(Formatter fmt)
        Formats a histogram so that it can be plotted as a bar graph. Each output line represents one bar and has three columns: the middle X position, the height, and the width of the bar. The area of each bar equals the fraction of all values that lie within the X range of the bar. Gnuplot can plot the bar graph directly using the command: plot 'filename' with boxes
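Because bars generally differ in width, each height must be the bar's probability mass divided by its width. The sketch below assumes the bars span consecutive marker heights whose cumulative probabilities are known (for example the equally spaced ones of a histogram); GnuplotBars.format and its parameters are hypothetical, while java.util.Formatter is the standard JDK class.

```java
import java.util.Formatter;
import java.util.Locale;

class GnuplotBars {
    /**
     * Hypothetical helper: writes one "middleX height width" line per bar.
     * heights are ascending marker heights, probs their cumulative probabilities.
     */
    static void format(Formatter fmt, double[] heights, double[] probs) {
        for (int i = 0; i + 1 < heights.length; i++) {
            double width = heights[i + 1] - heights[i];
            if (width <= 0) continue;           // degenerate cell, cannot be drawn as a box
            double area = probs[i + 1] - probs[i];
            double middle = (heights[i] + heights[i + 1]) / 2;
            // Locale.ROOT guarantees '.' as the decimal separator, which gnuplot expects
            fmt.format(Locale.ROOT, "%g %g %g%n", middle, area / width, width);
        }
    }
}
```

For boundaries 0, 1, 3 with cumulative probabilities 0, 0.5, 1, both bars have area 0.5; the second is twice as wide and therefore half as tall.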
        Parameters:
        fmt - the formatter to store the output
      • quantile

        public double quantile​(double p)
        Estimates a quantile. If there is no marker for the quantile p, linear interpolation between the two closest markers is performed. If p is NaN, NaN is returned. If there haven't been enough observations or the markers are not initialized, NaN is returned. If p <= 0.0 or p >= 1.0, the minimum or maximum, respectively, is returned.
        Parameters:
        p - any number
        Returns:
        a value estimated to be greater than 100p percent of all observed values, or Double.NaN if no data is available
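The rules above can be sketched as follows, assuming the probability and height of every marker are known (the page does not say how QuantileEstimator tracks marker probabilities). QuantileInterpolation.quantile is a hypothetical helper, not this class's method.

```java
class QuantileInterpolation {
    /**
     * Hypothetical helper mirroring the documented contract of quantile(p).
     * probs: ascending marker probabilities starting at 0 and ending at 1;
     * heights: the corresponding marker heights.
     */
    static double quantile(double p, double[] probs, double[] heights) {
        if (Double.isNaN(p) || heights.length == 0) return Double.NaN;
        if (p <= 0.0) return heights[0];                   // minimum
        if (p >= 1.0) return heights[heights.length - 1];  // maximum
        for (int i = 0; i + 1 < probs.length; i++) {
            if (p <= probs[i + 1]) {
                // linear interpolation between the two closest markers
                double t = (p - probs[i]) / (probs[i + 1] - probs[i]);
                return heights[i] + t * (heights[i + 1] - heights[i]);
            }
        }
        return heights[heights.length - 1];
    }
}
```

With markers at probabilities 0, 0.5 and 1 and heights 10, 20 and 40, the 0.25-quantile interpolates to 15 and the 0.75-quantile to 30.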