K - type of the keys mapping the elements
V - type of the values being combined per key

public abstract static class ApproximateDistinct.PerKeyDistinct<K,V>
extends org.apache.beam.sdk.transforms.PTransform<org.apache.beam.sdk.values.PCollection<org.apache.beam.sdk.values.KV<K,V>>,org.apache.beam.sdk.values.PCollection<org.apache.beam.sdk.values.KV<K,java.lang.Long>>>
Implementation of ApproximateDistinct.perKey().

| Constructor and Description |
|---|
| PerKeyDistinct() |

| Modifier and Type | Method and Description |
|---|---|
| org.apache.beam.sdk.values.PCollection<org.apache.beam.sdk.values.KV<K,java.lang.Long>> | expand(org.apache.beam.sdk.values.PCollection<org.apache.beam.sdk.values.KV<K,V>> input) |
| ApproximateDistinct.PerKeyDistinct<K,V> | withPrecision(int p) Sets the precision p. |
| ApproximateDistinct.PerKeyDistinct<K,V> | withSparsePrecision(int sp) Sets the sparse representation's precision sp. |

Methods inherited from class org.apache.beam.sdk.transforms.PTransform: addAnnotation, compose, compose, getAdditionalInputs, getAnnotations, getDefaultOutputCoder, getDefaultOutputCoder, getDefaultOutputCoder, getKindString, getName, getResourceHints, populateDisplayData, setDisplayData, setResourceHints, toString, validate, validate

public ApproximateDistinct.PerKeyDistinct<K,V> withPrecision(int p)
Sets the precision p.
Keep in mind that p cannot be lower than 4, because the estimation would be too
inaccurate.
See ApproximateDistinct.precisionForRelativeError(double) and ApproximateDistinct.relativeErrorForPrecision(int) for more information about the
relationship between precision and relative error.
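
To give a feel for how precision and relative error trade off, here is a minimal, dependency-free sketch. It assumes the textbook HyperLogLog standard error of roughly 1.04 / sqrt(2^p); the class `PrecisionSketch`, its method names, and the constant are illustrative assumptions and may not match what ApproximateDistinct.relativeErrorForPrecision(int) and ApproximateDistinct.precisionForRelativeError(double) actually compute.

```java
final class PrecisionSketch {
    // Assumption: textbook HyperLogLog constant, not necessarily the one Beam uses.
    private static final double HLL_CONSTANT = 1.04;
    // The documentation above states that p cannot be lower than 4.
    static final int MIN_PRECISION = 4;
    // Upper bound for the search below; chosen for illustration only.
    static final int MAX_PRECISION = 32;

    /** Approximate relative error for a precision p, using m = 2^p registers. */
    static double relativeError(int p) {
        if (p < MIN_PRECISION) {
            throw new IllegalArgumentException("p must be at least 4, got " + p);
        }
        double registers = Math.pow(2, p);
        return HLL_CONSTANT / Math.sqrt(registers);
    }

    /** Smallest precision whose approximate relative error is at most targetError. */
    static int precisionFor(double targetError) {
        for (int p = MIN_PRECISION; p <= MAX_PRECISION; p++) {
            if (relativeError(p) <= targetError) {
                return p;
            }
        }
        throw new IllegalArgumentException("target error too small: " + targetError);
    }

    public static void main(String[] args) {
        System.out.printf("p=12 -> relative error of about %.5f%n", relativeError(12));
        System.out.println("target error 0.01 -> p = " + precisionFor(0.01));
    }
}
```

Each extra bit of precision doubles the number of registers but only shrinks the error by a factor of about sqrt(2), which is why small gains in accuracy become expensive in memory.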
p - the precision value for the normal representation

public ApproximateDistinct.PerKeyDistinct<K,V> withSparsePrecision(int sp)
Sets the sparse representation's precision sp.
Values above 32 are not yet supported by the AddThis version of HyperLogLog+.
For more information about the sparse representation, read Google's paper on HyperLogLog+.
sp - the precision of HyperLogLog+'s sparse representation
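
The constraints mentioned on this page can be summarized in a small, dependency-free validation sketch. The class `HllPlusParamsSketch` and its methods are hypothetical and are not part of Beam. The first two checks come from the text above; the third (sp should not be lower than p) is an assumption based on the HyperLogLog+ design, where the sparse representation is meant to be at least as precise as the normal one.

```java
final class HllPlusParamsSketch {
    static final int MIN_PRECISION = 4;
    static final int MAX_SPARSE_PRECISION = 32;

    /** Throws IllegalArgumentException when (p, sp) breaks one of the rules below. */
    static void check(int p, int sp) {
        // Stated above: p cannot be lower than 4.
        if (p < MIN_PRECISION) {
            throw new IllegalArgumentException("p must be at least 4, got " + p);
        }
        // Stated above: values of sp above 32 are not yet supported.
        if (sp > MAX_SPARSE_PRECISION) {
            throw new IllegalArgumentException("sp above 32 is not supported, got " + sp);
        }
        // Assumption, not stated on this page: the sparse precision is not lower than p.
        if (sp < p) {
            throw new IllegalArgumentException("sp (" + sp + ") is lower than p (" + p + ")");
        }
    }

    /** Convenience wrapper that reports validity as a boolean. */
    static boolean isValid(int p, int sp) {
        try {
            check(p, sp);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("p=12, sp=25 valid? " + isValid(12, 25));
        System.out.println("p=3,  sp=25 valid? " + isValid(3, 25));
        System.out.println("p=12, sp=33 valid? " + isValid(12, 33));
    }
}
```

The real transform may validate its arguments differently, so treat this only as a reading aid for the two setters.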