@Experimental(value=SOURCE_SINK) public class Write extends Object
PTransform that writes to a Sink. A write begins with a sequential global
initialization of a sink, followed by a parallel write, and ends with a sequential finalization
of the write. The output of a write is PDone.
By default, every bundle in the input PCollection will be processed by a
Sink.WriteOperation, so the number of outputs will vary based on runner behavior, though at
least 1 output will always be produced. The exact parallelism of the write stage can be
controlled using Write.Bound.withNumShards(int), typically used to control how many files are
produced or to globally limit the number of workers connecting to an external service. However,
this option can often hurt performance: it adds an additional GroupByKey to the pipeline.
Write re-windows the data into the global window, so it is typically not well suited
to use in streaming pipelines.
Example usage with runner-controlled sharding:
p.apply(Write.to(new MySink(...)));
Example usage with a fixed number of shards:
p.apply(Write.to(new MySink(...)).withNumShards(3));| Modifier and Type | Class and Description |
|---|---|
static class |
Write.Bound<T>
A
PTransform that writes to a Sink. |
| Constructor and Description |
|---|
Write() |
public static <T> Write.Bound<T> to(Sink<T> sink)