K - type of input and output keys
InputT - type of input values
OutputT - type of output values

public static class Combine.GroupedValues<K,InputT,OutputT> extends PTransform<PCollection<? extends KV<K,? extends Iterable<InputT>>>,PCollection<KV<K,OutputT>>>

GroupedValues<K, InputT, OutputT> takes a
PCollection<KV<K, Iterable<InputT>>>, such as the result of
GroupByKey, applies a specified
KeyedCombineFn<K, InputT, AccumT, OutputT>
to each of the input KV<K, Iterable<InputT>> elements to
produce a combined output KV<K, OutputT> element, and returns a
PCollection<KV<K, OutputT>> containing all the combined output
elements. It is common for InputT == OutputT, but not required.
Common combining functions include sums, mins, maxes, and averages
of numbers, conjunctions and disjunctions of booleans, statistical
aggregations, etc.
Example of use:

    PCollection<KV<String, Integer>> pc = ...;
    PCollection<KV<String, Iterable<Integer>>> groupedByKey = pc.apply(
        GroupByKey.<String, Integer>create());
    PCollection<KV<String, Integer>> sumByKey = groupedByKey.apply(
        Combine.<String, Integer>groupedValues(
            new Sum.SumIntegerFn()));

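To make the per-element behaviour concrete, the following is a minimal plain-Java model of what the transform computes, with no Beam dependency. It is only a sketch: the class `GroupedValuesModel` and the helpers `combineGroupedValues` and `sum` are hypothetical names introduced for illustration, and a `Map` stands in for the `PCollection` of `KV<K, Iterable<InputT>>` elements.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

public class GroupedValuesModel {
    // Hypothetical helper (not a Beam API): applies `combiner` to the grouped
    // values of each key, mirroring KV<K, Iterable<InputT>> -> KV<K, OutputT>.
    public static <K, InputT, OutputT> Map<K, OutputT> combineGroupedValues(
            Map<K, ? extends Iterable<InputT>> grouped,
            Function<Iterable<InputT>, OutputT> combiner) {
        Map<K, OutputT> out = new LinkedHashMap<>();
        for (Map.Entry<K, ? extends Iterable<InputT>> e : grouped.entrySet()) {
            out.put(e.getKey(), combiner.apply(e.getValue()));
        }
        return out;
    }

    // A simple combining function: integer sum, so InputT == OutputT here.
    public static Integer sum(Iterable<Integer> values) {
        int total = 0;
        for (int v : values) {
            total += v;
        }
        return total;
    }

    public static void main(String[] args) {
        Map<String, List<Integer>> grouped = new LinkedHashMap<>();
        grouped.put("a", Arrays.asList(1, 2, 3));
        grouped.put("b", Arrays.asList(10, 20));

        Map<String, Integer> sums =
            combineGroupedValues(grouped, GroupedValuesModel::sum);
        System.out.println(sums); // prints {a=6, b=30}
    }
}
```

Each key is handled independently of the others, which is why a runner is free to process different keys in parallel.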
See also Combine.perKey(SerializableFunction<Iterable<V>, V>)/Combine.PerKey, which
captures the common pattern of "combining by key" in a
single easy-to-use PTransform.
Combining for different keys can happen in parallel. Moreover,
combining of the Iterable<InputT> values associated with a single
key can happen in parallel, with different subsets of the values
being combined separately, and their intermediate results combined
further, in an arbitrary tree reduction pattern, until a single
result value is produced for each key.
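The tree reduction described above can be sketched in plain Java, without Beam. This is an illustrative model under stated assumptions, not the runner's actual strategy: the class `TreeCombineSketch`, the `MeanAccum` type, and `combineInChunks` are hypothetical, and the four static methods only mirror the shape of a combine function's create, add, merge, and extract steps. The point it shows is that the result does not depend on how the values are partitioned, provided merging accumulators is associative and commutative.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class TreeCombineSketch {
    // Accumulator for a mean: a running sum and a count.
    public static final class MeanAccum {
        long sum;
        long count;
    }

    public static MeanAccum createAccumulator() {
        return new MeanAccum();
    }

    public static MeanAccum addInput(MeanAccum acc, int value) {
        acc.sum += value;
        acc.count++;
        return acc;
    }

    public static MeanAccum mergeAccumulators(Iterable<MeanAccum> accs) {
        MeanAccum merged = createAccumulator();
        for (MeanAccum a : accs) {
            merged.sum += a.sum;
            merged.count += a.count;
        }
        return merged;
    }

    public static double extractOutput(MeanAccum acc) {
        return acc.count == 0 ? Double.NaN : (double) acc.sum / acc.count;
    }

    // Combines one key's values by first accumulating subsets of `chunkSize`
    // values separately, then merging the partial accumulators pairwise
    // until a single accumulator remains.
    public static double combineInChunks(List<Integer> values, int chunkSize) {
        List<MeanAccum> level = new ArrayList<>();
        for (int i = 0; i < values.size(); i += chunkSize) {
            MeanAccum acc = createAccumulator();
            int end = Math.min(i + chunkSize, values.size());
            for (int v : values.subList(i, end)) {
                addInput(acc, v);
            }
            level.add(acc);
        }
        while (level.size() > 1) {
            List<MeanAccum> next = new ArrayList<>();
            for (int i = 0; i < level.size(); i += 2) {
                int end = Math.min(i + 2, level.size());
                next.add(mergeAccumulators(level.subList(i, end)));
            }
            level = next;
        }
        return extractOutput(level.isEmpty() ? createAccumulator() : level.get(0));
    }

    public static void main(String[] args) {
        List<Integer> values = Arrays.asList(1, 2, 3, 4, 5, 6, 7);
        // Different partitionings of the same values give the same mean.
        System.out.println(combineInChunks(values, 1)); // prints 4.0
        System.out.println(combineInChunks(values, 3)); // prints 4.0
        System.out.println(combineInChunks(values, 7)); // prints 4.0
    }
}
```

A mean is used here rather than a sum because it needs an accumulator type that differs from both the input and the output type, which is the case the intermediate-result wording above is describing.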
By default, the Coder of the keys of the output
PCollection<KV<K, OutputT>> is that of the keys of the input
PCollection<KV<K, Iterable<InputT>>>, and the Coder of the values
of the output PCollection<KV<K, OutputT>> is inferred from the
concrete type of the KeyedCombineFn<K, InputT, AccumT, OutputT>'s output
type OutputT.
Each output element has the same timestamp and is in the same window
as its corresponding input element, and the output
PCollection has the same
WindowFn
associated with it as the input.
See also Combine.globally(SerializableFunction<Iterable<V>, V>)/Combine.Globally, which
combines all the values in a PCollection into a
single value in a PCollection.
Fields inherited from class PTransform: name

| Modifier and Type | Method and Description |
|---|---|
| PCollection<KV<K,OutputT>> | apply(PCollection<? extends KV<K,? extends Iterable<InputT>>> input): Applies this PTransform on the given InputT, and returns its Output. |
| org.apache.beam.sdk.util.AppliedCombineFn<? super K,? super InputT,?,OutputT> | getAppliedFn(CoderRegistry registry, Coder<? extends KV<K,? extends Iterable<InputT>>> inputCoder, org.apache.beam.sdk.util.WindowingStrategy<?,?> windowingStrategy): Returns the Combine.CombineFn bound to its coders. |
| Coder<KV<K,OutputT>> | getDefaultOutputCoder(PCollection<? extends KV<K,? extends Iterable<InputT>>> input): Returns the default Coder to use for the output of this single-output PTransform when applied to the given input. |
| CombineFnBase.PerKeyCombineFn<? super K,? super InputT,?,OutputT> | getFn(): Returns the KeyedCombineFn used by this Combine operation. |
| List<PCollectionView<?>> | getSideInputs() |
| void | populateDisplayData(DisplayData.Builder builder): Register display data for the given transform or component. |
| Combine.GroupedValues<K,InputT,OutputT> | withSideInputs(Iterable<? extends PCollectionView<?>> sideInputs) |

Methods inherited from class PTransform: getDefaultOutputCoder, getDefaultOutputCoder, getKindString, getName, toString, validate

public Combine.GroupedValues<K,InputT,OutputT> withSideInputs(Iterable<? extends PCollectionView<?>> sideInputs)
public CombineFnBase.PerKeyCombineFn<? super K,? super InputT,?,OutputT> getFn()
public List<PCollectionView<?>> getSideInputs()
public PCollection<KV<K,OutputT>> apply(PCollection<? extends KV<K,? extends Iterable<InputT>>> input)
Description copied from class: PTransform
Applies this PTransform on the given InputT, and returns its Output.
Composite transforms, which are defined in terms of other transforms, should return the output of one of the composed transforms. Non-composite transforms, which do not apply any transforms internally, should return a new unbound output and register evaluators (via backend-specific registration methods).
apply in class PTransform<PCollection<? extends KV<K,? extends Iterable<InputT>>>,PCollection<KV<K,OutputT>>>

public org.apache.beam.sdk.util.AppliedCombineFn<? super K,? super InputT,?,OutputT> getAppliedFn(CoderRegistry registry, Coder<? extends KV<K,? extends Iterable<InputT>>> inputCoder, org.apache.beam.sdk.util.WindowingStrategy<?,?> windowingStrategy)
Returns the Combine.CombineFn bound to its coders.
For internal use.
public Coder<KV<K,OutputT>> getDefaultOutputCoder(PCollection<? extends KV<K,? extends Iterable<InputT>>> input) throws CannotProvideCoderException
Description copied from class: PTransform
Returns the default Coder to use for the output of this single-output PTransform when applied to the given input.
By default, always throws.
getDefaultOutputCoder in class PTransform<PCollection<? extends KV<K,? extends Iterable<InputT>>>,PCollection<KV<K,OutputT>>>
CannotProvideCoderException - if none can be inferred.
public void populateDisplayData(DisplayData.Builder builder)
Description copied from class: PTransform
populateDisplayData(DisplayData.Builder) is invoked by Pipeline runners to collect
display data via DisplayData.from(HasDisplayData). Implementations may call
super.populateDisplayData(builder) in order to register display data in the current
namespace, but should otherwise use subcomponent.populateDisplayData(builder) to use
the namespace of the subcomponent.
By default, does not register any display data. Implementors may override this method to provide their own display data.
populateDisplayData in interface HasDisplayData
populateDisplayData in class PTransform<PCollection<? extends KV<K,? extends Iterable<InputT>>>,PCollection<KV<K,OutputT>>>
builder - The builder to populate with display data.
See also: HasDisplayData