public static class AvroIO.Write extends Object
A PTransform that writes a PCollection to an Avro file (or
multiple Avro files matching a sharding pattern).

| Modifier and Type | Class and Description |
|---|---|
| static class | AvroIO.Write.Bound<T>: A PTransform that writes a bounded PCollection to an Avro file (or multiple Avro files matching a sharding pattern). |

| Modifier and Type | Method and Description |
|---|---|
| static AvroIO.Write.Bound<GenericRecord> | to(String prefix): Returns a PTransform that writes to the file(s) with the given prefix. |
| static AvroIO.Write.Bound<GenericRecord> | withNumShards(int numShards): Returns a PTransform that uses the provided shard count. |
| static AvroIO.Write.Bound<GenericRecord> | withoutSharding(): Returns a PTransform that forces a single file as output. |
| static AvroIO.Write.Bound<GenericRecord> | withoutValidation(): Returns a PTransform that writes Avro file(s) with GCS path validation on pipeline creation disabled. |
| static <T> AvroIO.Write.Bound<T> | withSchema(Class<T> type): Returns a PTransform that writes Avro file(s) containing records whose type is the specified Avro-generated class. |
| static AvroIO.Write.Bound<GenericRecord> | withSchema(Schema schema): Returns a PTransform that writes Avro file(s) containing records of the specified schema. |
| static AvroIO.Write.Bound<GenericRecord> | withSchema(String schema): Returns a PTransform that writes Avro file(s) containing records of the specified schema in a JSON-encoded string form. |
| static AvroIO.Write.Bound<GenericRecord> | withShardNameTemplate(String shardTemplate): Returns a PTransform that uses the given shard name template. |
| static AvroIO.Write.Bound<GenericRecord> | withSuffix(String filenameSuffix): Returns a PTransform that writes to the file(s) with the given filename suffix. |
public static AvroIO.Write.Bound<GenericRecord> to(String prefix)
Returns a PTransform that writes to the file(s)
with the given prefix. This can be a local filename
(if running locally), or a Google Cloud Storage filename of
the form "gs://<bucket>/<filepath>"
(if running locally or via the Google Cloud Dataflow service).
The files written will begin with this prefix, followed by
a shard identifier (see AvroIO.Write.Bound.withNumShards(int)), and end
in a common extension, if given by AvroIO.Write.Bound.withSuffix(java.lang.String).
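The naming scheme described above (prefix, then shard identifier, then suffix) can be pictured with a small standalone sketch. It does not call the SDK: `ShardNames`, `expandTemplate`, and `shardFileName` are hypothetical helpers, the bucket path is made up, and the template `-SSSSS-of-NNNNN` is assumed here as a typical shard name template in which runs of `S` stand for the zero-padded shard index and runs of `N` for the zero-padded shard count.

```java
class ShardNames {
    // Hypothetical helper, not part of the SDK: replaces each run of 'S' with the
    // zero-padded shard index and each run of 'N' with the zero-padded shard count.
    public static String expandTemplate(String template, int shardIndex, int numShards) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < template.length()) {
            char c = template.charAt(i);
            if (c == 'S' || c == 'N') {
                // Measure the run length; it becomes the padding width.
                int j = i;
                while (j < template.length() && template.charAt(j) == c) {
                    j++;
                }
                int width = j - i;
                int value = (c == 'S') ? shardIndex : numShards;
                out.append(String.format("%0" + width + "d", value));
                i = j;
            } else {
                out.append(c);
                i++;
            }
        }
        return out.toString();
    }

    // Output name = prefix + expanded shard identifier + suffix.
    public static String shardFileName(
            String prefix, String template, String suffix, int shardIndex, int numShards) {
        return prefix + expandTemplate(template, shardIndex, numShards) + suffix;
    }

    public static void main(String[] args) {
        // "gs://my-bucket/output/records" is an illustrative prefix only.
        for (int shard = 0; shard < 3; shard++) {
            System.out.println(
                shardFileName("gs://my-bucket/output/records", "-SSSSS-of-NNNNN", ".avro", shard, 3));
        }
    }
}
```

Under these assumptions, three shards would produce names ending in `-00000-of-00003.avro`, `-00001-of-00003.avro`, and `-00002-of-00003.avro`.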
public static AvroIO.Write.Bound<GenericRecord> withSuffix(String filenameSuffix)
Returns a PTransform that writes to the file(s) with the
given filename suffix.
public static AvroIO.Write.Bound<GenericRecord> withNumShards(int numShards)
Returns a PTransform that uses the provided shard count.
Constraining the number of shards is likely to reduce the performance of a pipeline. Setting this value is not recommended unless you require a specific number of output files.
numShards - the number of shards to use, or 0 to let the system
decide.
public static AvroIO.Write.Bound<GenericRecord> withShardNameTemplate(String shardTemplate)
Returns a PTransform that uses the given shard name
template.
See ShardNameTemplate for a description of shard templates.
public static AvroIO.Write.Bound<GenericRecord> withoutSharding()
Returns a PTransform that forces a single file as
output.
Constraining the number of shards is likely to reduce the performance of a pipeline. Setting this value is not recommended unless you require a specific number of output files.
public static <T> AvroIO.Write.Bound<T> withSchema(Class<T> type)
Returns a PTransform that writes Avro file(s)
containing records whose type is the specified Avro-generated class.
T - the type of the elements of the input PCollection
public static AvroIO.Write.Bound<GenericRecord> withSchema(Schema schema)
Returns a PTransform that writes Avro file(s)
containing records of the specified schema.
public static AvroIO.Write.Bound<GenericRecord> withSchema(String schema)
Returns a PTransform that writes Avro file(s)
containing records of the specified schema in a JSON-encoded
string form.
public static AvroIO.Write.Bound<GenericRecord> withoutValidation()
Returns a PTransform that writes Avro file(s) with GCS path validation on
pipeline creation disabled.
This can be useful when the GCS output location does not exist at pipeline creation time but is expected to be available at execution time.
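As an illustration of the JSON-encoded string form accepted by withSchema(String schema), the sketch below builds an Avro record schema as a plain Java string. The record name `User` and its two fields are hypothetical, chosen only to show the shape of such a string; the class `UserSchemaJson` is likewise an assumption and not part of the SDK.

```java
class UserSchemaJson {
    // Hypothetical Avro record schema: a "User" record with a string field
    // and an int field. Only the JSON layout follows the Avro schema format.
    public static final String USER_SCHEMA =
        "{"
        + "\"type\": \"record\", "
        + "\"name\": \"User\", "
        + "\"fields\": ["
        + "{\"name\": \"name\", \"type\": \"string\"}, "
        + "{\"name\": \"age\", \"type\": \"int\"}"
        + "]"
        + "}";

    public static void main(String[] args) {
        // Print the schema text that would be passed as the schema argument.
        System.out.println(USER_SCHEMA);
    }
}
```

A string of this shape would be supplied as the `schema` argument; the elements written are then GenericRecord values conforming to it.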