Class TDigestQuantiles
- java.lang.Object
  - org.apache.beam.sdk.extensions.sketching.TDigestQuantiles
public final class TDigestQuantiles extends java.lang.Object
PTransforms for getting information about quantiles in a stream.
This class uses the T-Digest structure introduced by Ted Dunning, and more precisely the MergingDigest implementation.
References
The paper and implementation are available on Ted Dunning's GitHub profile.
Parameters
Only one parameter can be tuned in order to control the tradeoff between the estimation accuracy and the memory use.
Stream elements are compressed into a linked list of centroids. The compression factor cf limits both the number of elements represented by each centroid and the total number of centroids.
The relative error is always a small fraction of 1% for values at extreme quantiles and always less than 3/cf at middle quantiles.
By default the compression factor is set to 100, which guarantees a relative error less than 3%.
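The 3/cf bound stated above can be tabulated for a few compression factors. The following is a minimal sketch in plain Java: the class and method names are hypothetical and not part of Beam or the t-digest library, and the formula is taken from the text above rather than from the library itself.

```java
import java.util.Locale;

// Hypothetical helper for illustration only; not part of Beam or t-digest.
final class CompressionErrorBound {
    private CompressionErrorBound() {}

    /** Bound on the relative error at middle quantiles, 3 / cf, as stated in this page. */
    static double middleQuantileBound(double cf) {
        if (cf <= 0) {
            throw new IllegalArgumentException("cf must be positive");
        }
        return 3.0 / cf;
    }

    public static void main(String[] args) {
        // Print the bound as a percentage for a few compression factors.
        for (double cf : new double[] {100, 250, 500}) {
            System.out.printf(Locale.ROOT, "cf=%.0f -> bound %.2f%%%n",
                cf, 100 * middleQuantileBound(cf));
        }
    }
}
```

A larger cf tightens the bound but keeps more centroids in memory, which is the tradeoff described above.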
Examples
There are 2 ways of using this class:
- Use the PTransforms that return a PCollection which contains a MergingDigest for querying the value at a given quantile or the approximate quantile position of an element.
- Use the TDigestQuantiles.TDigestQuantilesFn CombineFn that is exposed in order to make advanced processing involving the MergingDigest.
Example 1: Default use
The simplest use is to call the globally() or perKey() method in order to retrieve the digest, and then to query the structure.
    // MergingDigest is com.tdunning.math.stats.MergingDigest
    PCollection<Double> pc = ...;
    PCollection<MergingDigest> digest = pc.apply(TDigestQuantiles
        .globally()); // .perKey()
Example 2: Tune accuracy parameters
One can tune the compression factor cf in order to control accuracy and memory.
This tuning works exactly the same for globally() and perKey().
    double cf = 500;
    PCollection<Double> pc = ...;
    PCollection<MergingDigest> digest = pc.apply(TDigestQuantiles
        .globally() // .perKey()
        .withCompression(cf));
Example 3: Query the resulting structure
This example shows how to query the resulting structure, for example to build a PCollection of KVs with each pair corresponding to a couple (quantile, value).
    PCollection<MergingDigest> pc = ...;
    PCollection<KV<Double, Double>> quantiles = pc.apply(ParDo.of(
        new DoFn<MergingDigest, KV<Double, Double>>() {
          @ProcessElement
          public void processElement(ProcessContext c) {
            double[] probabilities = {0.01, 0.25, 0.5, 0.75, 0.99};
            for (double q : probabilities) {
              // Emit one (quantile, value) pair per requested quantile.
              c.output(KV.of(q, c.element().quantile(q)));
            }
          }
        }));
One can also retrieve the approximate quantile position of a given element in the stream using the cdf(double) method instead of quantile(double).
Example 4: Using the CombineFn
The CombineFn does the same thing as the PTransforms but it can be used for doing stateful processing or in CombineFns.ComposedCombineFn.
This example is deliberately minimal; it only shows how to properly create a TDigestQuantiles.TDigestQuantilesFn.
    double cf = 250;
    PCollection<Double> input = ...;
    PCollection<MergingDigest> output = input.apply(Combine
        .globally(TDigestQuantilesFn.create(cf)));
Nested Class Summary
Nested Classes
Modifier and Type | Class | Description
static class | TDigestQuantiles.GlobalDigest | Implementation of globally().
static class | TDigestQuantiles.PerKeyDigest<K> | Implementation of perKey().
static class | TDigestQuantiles.TDigestQuantilesFn | Implements the Combine.CombineFn of TDigestQuantiles transforms.
Constructor Summary
Constructors
Constructor | Description
TDigestQuantiles() |
Method Summary
Modifier and Type | Method | Description
static TDigestQuantiles.GlobalDigest | globally() | Processes the stream to build a T-Digest structure (MergingDigest) that tracks the stream distribution, and returns a PCollection<MergingDigest>.
static <K> TDigestQuantiles.PerKeyDigest<K> | perKey() | Like globally(), but builds a digest for each key in the stream.
Method Detail
globally
public static TDigestQuantiles.GlobalDigest globally()
Processes the stream to build a T-Digest structure (MergingDigest) that tracks the stream distribution, and returns a PCollection<MergingDigest>.
The resulting structure can be queried in order to retrieve the approximate value at a given quantile or the approximate quantile position of a given element.
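To make the two kinds of query concrete, here is a minimal sketch of what quantile(double) and cdf(double) approximate, computed exactly on a small in-memory array. The class ExactQuantiles is hypothetical and uses a simple nearest-rank definition; it is not the t-digest algorithm, which interpolates between centroids and may therefore return slightly different values.

```java
import java.util.Arrays;

// Hypothetical exact reference for illustration; not part of Beam or t-digest.
final class ExactQuantiles {
    private final double[] sorted;

    ExactQuantiles(double[] values) {
        if (values.length == 0) {
            throw new IllegalArgumentException("values must not be empty");
        }
        sorted = values.clone();
        Arrays.sort(sorted);
    }

    /** Nearest-rank quantile: the smallest value v such that at least q * n values are <= v. */
    double quantile(double q) {
        if (q < 0 || q > 1) {
            throw new IllegalArgumentException("q must be in [0, 1]");
        }
        int rank = (int) Math.ceil(q * sorted.length);
        return sorted[Math.max(rank, 1) - 1];
    }

    /** Quantile position of x: the fraction of values that are <= x. */
    double cdf(double x) {
        int count = 0;
        for (double v : sorted) {
            if (v <= x) {
                count++;
            }
        }
        return (double) count / sorted.length;
    }

    public static void main(String[] args) {
        ExactQuantiles eq = new ExactQuantiles(new double[] {10, 9, 8, 7, 6, 5, 4, 3, 2, 1});
        System.out.println("median = " + eq.quantile(0.5));
        System.out.println("position of 5.0 = " + eq.cdf(5.0));
    }
}
```

The two queries are inverses of each other in this exact setting; a digest answers the same questions approximately while storing only centroids instead of every value.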
perKey
public static <K> TDigestQuantiles.PerKeyDigest<K> perKey()
Like globally(), but builds a digest for each key in the stream.
Type Parameters:
K - the type of the keys