Type Parameters:
InputT - the type of the (main) input elements
OutputT - the type of the (main) output elements

@Experimental public abstract class DoFnWithContext<InputT,OutputT> extends Object implements Serializable, HasDisplayData
The argument to ParDo providing the code to use to process
elements of the input
PCollection.
See ParDo for more explanation, examples of use, and
discussion of constraints on DoFnWithContexts, including their
serializability, lack of access to global shared mutable state,
requirements for failure tolerance, and benefits of optimization.
DoFnWithContexts can be tested in a particular
Pipeline by running that Pipeline on sample input
and then checking its output. Unit testing of a DoFnWithContext,
separately from any ParDo transform or Pipeline,
can be done via the DoFnTester harness.
Implementations must define a method annotated with DoFnWithContext.ProcessElement
that satisfies the requirements described in that annotation's documentation.
This functionality is experimental and likely to change.
Example usage:
PCollection lines = ... ;
PCollection words =
lines.apply(ParDo.of(new DoFnWithContext() {

| Modifier and Type | Class and Description |
|---|---|
| class | DoFnWithContext.Context: Information accessible to all methods in this DoFnWithContext. |
| static interface | DoFnWithContext.ExtraContextFactory<InputT,OutputT>: Interface for runner implementors to provide implementations of extra context information. |
| static interface | DoFnWithContext.FinishBundle: Annotation for the method to use to finish processing a batch of elements. |
| class | DoFnWithContext.ProcessContext: Information accessible when running DoFn.processElement(org.apache.beam.sdk.transforms.DoFn<InputT, OutputT>.ProcessContext). |
| static interface | DoFnWithContext.ProcessElement: Annotation for the method to use for processing elements. |
| static interface | DoFnWithContext.StartBundle: Annotation for the method to use to prepare an instance for processing a batch of elements. |

| Constructor and Description |
|---|
| DoFnWithContext() |

| Modifier and Type | Method and Description |
|---|---|
| <AggInputT,AggOutputT> Aggregator<AggInputT,AggOutputT> | createAggregator(String name, Combine.CombineFn<? super AggInputT,?,AggOutputT> combiner): Returns an Aggregator with aggregation logic specified by the Combine.CombineFn argument. |
| <AggInputT> Aggregator<AggInputT,AggInputT> | createAggregator(String name, SerializableFunction<Iterable<AggInputT>,AggInputT> combiner): Returns an Aggregator with the aggregation logic specified by the SerializableFunction argument. |
| Duration | getAllowedTimestampSkew(): Returns the allowed timestamp skew duration, which is the maximum duration that timestamps can be shifted backward in DoFnWithContext.Context.outputWithTimestamp(OutputT, org.joda.time.Instant). |
| protected TypeDescriptor<InputT> | getInputTypeDescriptor(): Returns a TypeDescriptor capturing what is known statically about the input type of this DoFnWithContext instance's most-derived class. |
| protected TypeDescriptor<OutputT> | getOutputTypeDescriptor(): Returns a TypeDescriptor capturing what is known statically about the output type of this DoFnWithContext instance's most-derived class. |
| void | populateDisplayData(DisplayData.Builder builder): Register display data for the given transform or component. |

public Duration getAllowedTimestampSkew()
Returns the allowed timestamp skew duration, which is the maximum
duration that timestamps can be shifted backward in
DoFnWithContext.Context.outputWithTimestamp(OutputT, org.joda.time.Instant).
The default value is Duration.ZERO, in which case
timestamps can only be shifted forward, into the future. For infinite
skew, return Duration.millis(Long.MAX_VALUE).
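
The skew rule above amounts to a single comparison between the element's timestamp and the requested output timestamp. The following is a minimal sketch of that comparison, not the SDK's implementation: the class `TimestampSkewSketch` and the method `isAllowed` are hypothetical names, and the sketch assumes `java.time` types in place of the Joda-Time types in the signature above so that it runs without extra dependencies.

```java
import java.time.Duration;
import java.time.Instant;

/** Illustrative stand-in for the skew check; not part of the SDK. */
final class TimestampSkewSketch {

    /**
     * Hypothetical helper: returns true if outputTimestamp is acceptable for an
     * element stamped elementTimestamp, given the allowed backward skew.
     */
    static boolean isAllowed(Instant elementTimestamp, Instant outputTimestamp, Duration allowedSkew) {
        // Positive when the output timestamp lies before the element timestamp
        // (a backward shift); zero or negative for a forward shift.
        Duration backwardShift = Duration.between(outputTimestamp, elementTimestamp);
        // Forward shifts always pass; backward shifts must not exceed the skew.
        return backwardShift.compareTo(allowedSkew) <= 0;
    }

    public static void main(String[] args) {
        Instant element = Instant.parse("2016-01-01T00:00:10Z");

        // Default skew of zero: only forward shifts are accepted.
        System.out.println(isAllowed(element, element.plusSeconds(5), Duration.ZERO));  // true
        System.out.println(isAllowed(element, element.minusSeconds(1), Duration.ZERO)); // false

        // A five-second skew admits a three-second backward shift.
        System.out.println(isAllowed(element, element.minusSeconds(3), Duration.ofSeconds(5))); // true

        // Analogue of the "infinite skew" value described above.
        Duration infinite = Duration.ofMillis(Long.MAX_VALUE);
        System.out.println(isAllowed(element, Instant.EPOCH, infinite)); // true
    }
}
```

How a runner reports a rejected timestamp is not covered here; the sketch only shows which timestamps fall inside the allowed range.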
protected TypeDescriptor<InputT> getInputTypeDescriptor()
Returns a TypeDescriptor capturing what is known statically
about the input type of this DoFnWithContext instance's most-derived
class.
See getOutputTypeDescriptor() for more discussion.
protected TypeDescriptor<OutputT> getOutputTypeDescriptor()
Returns a TypeDescriptor capturing what is known statically
about the output type of this DoFnWithContext instance's
most-derived class.
In the normal case of a concrete DoFnWithContext subclass with
no generic type parameters of its own (including anonymous inner
classes), this will be a complete non-generic type, which is good
for choosing a default output Coder<OutputT> for the output
PCollection<OutputT>.
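
The distinction between a concrete subclass and a subclass with its own type parameters can be illustrated with plain Java reflection. This is a sketch under stated assumptions rather than the SDK's TypeDescriptor logic: `FnSketch`, `GenericFnSketch`, and `outputType()` are hypothetical names, and the lookup only inspects the direct superclass.

```java
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;

/** Hypothetical stand-in for a generic function base class; not the SDK's DoFnWithContext. */
abstract class FnSketch<InputT, OutputT> {

    /** Returns what is statically known about OutputT for the most-derived class. */
    Type outputType() {
        Type superType = getClass().getGenericSuperclass();
        if (superType instanceof ParameterizedType) {
            // Index 1 is OutputT in FnSketch<InputT, OutputT>.
            return ((ParameterizedType) superType).getActualTypeArguments()[1];
        }
        // Raw subclass: nothing more specific is known.
        return Object.class;
    }
}

/** A subclass that keeps a type parameter of its own, so OutputT stays unresolved. */
class GenericFnSketch<T> extends FnSketch<String, T> {}

final class TypeCaptureDemo {
    public static void main(String[] args) {
        // Anonymous concrete subclass: the output type resolves to Integer.
        FnSketch<String, Integer> concrete = new FnSketch<String, Integer>() {};
        System.out.println(concrete.outputType());

        // Generic subclass: only the type variable T is visible at runtime.
        System.out.println(new GenericFnSketch<Integer>().outputType());
    }
}
```

In the first case a default coder could be chosen from the resolved class; in the second, erasure leaves only a type variable, which is why the complete non-generic case is the convenient one.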
public final <AggInputT,AggOutputT> Aggregator<AggInputT,AggOutputT> createAggregator(String name, Combine.CombineFn<? super AggInputT,?,AggOutputT> combiner)
Returns an Aggregator with aggregation logic specified by the
Combine.CombineFn argument. The name provided must be unique across
Aggregators created within the DoFn. Aggregators can only be created
during pipeline construction.
Parameters:
name - the name of the aggregator
combiner - the Combine.CombineFn to use in the aggregator
Throws:
NullPointerException - if the name or combiner is null
IllegalArgumentException - if the given name collides with another aggregator in this scope
IllegalStateException - if called during pipeline execution.

public final <AggInputT> Aggregator<AggInputT,AggInputT> createAggregator(String name, SerializableFunction<Iterable<AggInputT>,AggInputT> combiner)
Returns an Aggregator with the aggregation logic specified by the
SerializableFunction argument. The name provided must be unique
across Aggregators created within the DoFn. Aggregators can only be
created during pipeline construction.
Parameters:
name - the name of the aggregator
combiner - the SerializableFunction to use in the aggregator
Throws:
NullPointerException - if the name or combiner is null
IllegalArgumentException - if the given name collides with another aggregator in this scope
IllegalStateException - if called during pipeline execution.

public void populateDisplayData(DisplayData.Builder builder)
populateDisplayData(DisplayData.Builder) is invoked by Pipeline runners to collect
display data via DisplayData.from(HasDisplayData). Implementations may call
super.populateDisplayData(builder) in order to register display data in the current
namespace, but should otherwise use subcomponent.populateDisplayData(builder) to use
the namespace of the subcomponent.
By default, does not register any display data. Implementors may override this method to provide their own display data.
Specified by:
populateDisplayData in interface HasDisplayData
Parameters:
builder - The builder to populate with display data.
See Also:
HasDisplayData