Type Parameters:
T - The type of objects to write
WriteT - The result of a per-bundle write

public abstract static class Sink.WriteOperation<T,WriteT> extends Object implements Serializable
Sink.WriteOperation defines the process of a parallel write of objects to a Sink.
The WriteOperation defines how to perform initialization and finalization of a
parallel write to a sink as well as how to create a Sink.Writer object that can write
a bundle to the sink.
Since operations in Dataflow may be run multiple times for redundancy or fault-tolerance, the initialization and finalization defined by a WriteOperation must be idempotent.
WriteOperations may be mutable; a WriteOperation is serialized after the
call to the initialize method and deserialized before calls to
createWriter and finalize. However, it is not
reserialized after createWriter, so createWriter should not mutate the
state of the WriteOperation.
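The serialization behavior described above can be illustrated with a minimal sketch in plain Java. `DemoOperation` and `copy` are hypothetical stand-ins, not SDK classes, and the round trip through Java serialization only approximates how a runner might ship the operation to workers. The sketch shows that state set in `initialize` reaches later copies, while a mutation made in `createWriter` on a worker copy does not.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class SerializationSketch {
    // Hypothetical stand-in for a WriteOperation; not the SDK class.
    static class DemoOperation implements Serializable {
        private static final long serialVersionUID = 1L;
        String tempDir;      // set in initialize, so it is part of the serialized state
        int writersCreated;  // mutated in createWriter, so the change stays on the worker copy

        void initialize() { tempDir = "/tmp/demo-output"; }
        void createWriter() { writersCreated++; }
    }

    // Round-trips the operation through Java serialization (an assumed
    // approximation of shipping it to a worker).
    static DemoOperation copy(DemoOperation op) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(op);
        }
        try (ObjectInputStream in =
                new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (DemoOperation) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        DemoOperation original = new DemoOperation();
        original.initialize();                     // state set here is serialized

        DemoOperation onWorker = copy(original);   // copy used for createWriter
        onWorker.createWriter();                   // mutation is never sent back

        DemoOperation atFinalize = copy(original); // copy used for finalize
        System.out.println(atFinalize.tempDir);
        System.out.println(atFinalize.writersCreated);
    }
}
```

In this sketch the copy used at finalize time still sees the temporary directory chosen in `initialize`, but its `writersCreated` counter is unchanged, which is why any bookkeeping done inside `createWriter` cannot be relied on later.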
See Sink for more detailed documentation about the process of writing to a Sink.
| Constructor and Description |
|---|
| WriteOperation() |
| Modifier and Type | Method and Description |
|---|---|
| abstract Sink.Writer<T,WriteT> | createWriter(PipelineOptions options): Creates a new Sink.Writer to write a bundle of the input to the sink. |
| abstract void | finalize(Iterable<WriteT> writerResults, PipelineOptions options): Given an Iterable of results from bundle writes, performs finalization after writing and closes the sink. |
| abstract Sink<T> | getSink(): Returns the Sink that this write operation writes to. |
| Coder<WriteT> | getWriterResultCoder(): Returns a coder for the writer result type. |
| abstract void | initialize(PipelineOptions options): Performs initialization before writing to the sink. |
public abstract void initialize(PipelineOptions options) throws Exception
Throws:
Exception

public abstract void finalize(Iterable<WriteT> writerResults, PipelineOptions options) throws Exception
The results that are passed to finalize are those returned by bundles that completed successfully. Although bundles may have been run multiple times (for fault-tolerance), only one writer result will be passed to finalize for each bundle. An implementation of finalize should clean up after any failed and successfully retried bundles. Note that failed bundles will not have their writer results passed to finalize, so finalize should be capable of locating any temporary or partial output that they wrote.
A best practice is to make finalize atomic. If this is impossible given the semantics of the sink, finalize should be idempotent, as it may be called multiple times in the case of failure/retry or for redundancy.
Note that the iteration order of the writer results is not guaranteed to be consistent if finalize is called multiple times.
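The finalize requirements above (idempotence, cleanup of output from failed attempts, no dependence on iteration order) can be sketched for a file-based sink. This is an illustration under stated assumptions, not the SDK's implementation: `finalizeOutput` is a hypothetical helper, each writer result is assumed to be the name of a temporary file written by a successful bundle, and all temporary output is assumed to live in a single directory so that leftovers from failed bundles can be located.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Arrays;
import java.util.List;

public class FinalizeSketch {
    // Hypothetical helper, not part of the SDK. Each writer result is assumed
    // to be the name of a temp file that a successful bundle wrote into tempDir.
    static void finalizeOutput(Iterable<String> writerResults, Path tempDir, Path outputDir)
            throws IOException {
        Files.createDirectories(outputDir);
        if (!Files.isDirectory(tempDir)) {
            return; // an earlier finalize call already ran to completion
        }
        // Each result is handled independently, so iteration order does not matter.
        for (String name : writerResults) {
            Path source = tempDir.resolve(name);
            // Skip files already moved by an earlier, interrupted finalize call.
            if (Files.exists(source)) {
                Files.move(source, outputDir.resolve(name), StandardCopyOption.REPLACE_EXISTING);
            }
        }
        // Anything still in tempDir came from attempts whose results were not
        // passed in (failed or duplicate bundles), so it is deleted.
        try (DirectoryStream<Path> leftovers = Files.newDirectoryStream(tempDir)) {
            for (Path leftover : leftovers) {
                Files.delete(leftover);
            }
        }
        Files.delete(tempDir);
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("finalize-sketch");
        Path tempDir = Files.createDirectory(root.resolve("temp"));
        Path outputDir = root.resolve("out");
        Files.write(tempDir.resolve("bundle-a"), "ok".getBytes());
        Files.write(tempDir.resolve("bundle-b-failed"), "partial".getBytes());

        List<String> results = Arrays.asList("bundle-a");
        finalizeOutput(results, tempDir, outputDir);
        finalizeOutput(results, tempDir, outputDir); // a repeated call changes nothing

        System.out.println(Files.exists(outputDir.resolve("bundle-a")));
        System.out.println(Files.exists(tempDir));
    }
}
```

The sketch is idempotent rather than atomic: a call interrupted partway through leaves some files moved and some not, and a later call completes the work. Whether true atomicity is achievable depends on the sink.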
Parameters:
writerResults - an Iterable of results from successful bundle writes.
Throws:
Exception

public abstract Sink.Writer<T,WriteT> createWriter(PipelineOptions options) throws Exception
Creates a new Sink.Writer to write a bundle of the input to the sink.
The bundle id that the writer will use to uniquely identify its output will be passed to
Sink.Writer.open(java.lang.String).
Must not mutate the state of the WriteOperation.
Throws:
Exception
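As a rough illustration of the createWriter contract, the sketch below uses hypothetical stand-in classes (`DemoOperation`, `DemoWriter`) rather than the SDK types. It assumes the writer derives a unique temporary file name from the bundle id it receives in `open`, and that `close` returns the writer result later handed to finalize. The operation's own fields are read but never modified.

```java
public class WriterSketch {
    // Hypothetical stand-in for a Sink.Writer; not the SDK class.
    static final class DemoWriter {
        private final String tempDir;
        private String tempFile;

        DemoWriter(String tempDir) { this.tempDir = tempDir; }

        // Plays the role of Sink.Writer.open(String): the bundle id makes
        // this writer's output location unique.
        void open(String bundleId) { tempFile = tempDir + "/" + bundleId + ".tmp"; }

        // Assumed to play the role of closing the writer and returning the
        // per-bundle writer result.
        String close() { return tempFile; }
    }

    // Hypothetical stand-in for a WriteOperation; not the SDK class.
    static final class DemoOperation {
        private final String tempDir; // final, so createWriter cannot change it

        DemoOperation(String tempDir) { this.tempDir = tempDir; }

        // Returns a fresh writer on every call and leaves the operation untouched.
        DemoWriter createWriter() { return new DemoWriter(tempDir); }
    }

    public static void main(String[] args) {
        DemoOperation operation = new DemoOperation("/tmp/demo-output");

        DemoWriter first = operation.createWriter();
        first.open("bundle-1");
        DemoWriter second = operation.createWriter();
        second.open("bundle-2");

        System.out.println(first.close());
        System.out.println(second.close());
    }
}
```

Keeping per-bundle state in the writer, and declaring the operation's fields final where possible, is one way to make accidental mutation in createWriter hard to introduce.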