public class TextIO extends Object
PTransforms for reading and writing text files.
To read a PCollection from one or more text files, use TextIO.Read.
You can instantiate a transform using TextIO.Read.from(String) to specify
the path of the file(s) to read from (e.g., a local filename or
filename pattern if running locally, or a Google Cloud Storage
filename or filename pattern of the form
"gs://<bucket>/<filepath>").
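As a purely local analogy for how a filename pattern such as numbers-*.txt selects files, here is a minimal sketch using the JDK's PathMatcher with glob syntax. It assumes that "*" matches any run of characters within a single file name. It is not how TextIO or Google Cloud Storage expand patterns, since those rules are defined by the SDK and may differ. The class and method names (GlobSketch, match) are hypothetical.

```java
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class GlobSketch {
    // Returns the candidates whose file name (last path component) matches the glob.
    // Local JDK glob semantics only; not the SDK's pattern expansion.
    public static List<String> match(String glob, List<String> candidates) {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + glob);
        List<String> matched = new ArrayList<>();
        for (String candidate : candidates) {
            Path fileName = Paths.get(candidate).getFileName();
            if (fileName != null && matcher.matches(fileName)) {
                matched.add(candidate);
            }
        }
        return matched;
    }

    public static void main(String[] args) {
        // Hypothetical file listing; "*" matches any run of characters in the name.
        List<String> files = List.of(
            "path/to/numbers-1.txt", "path/to/numbers-2.txt", "path/to/other.txt");
        System.out.println(match("numbers-*.txt", files));
        // prints [path/to/numbers-1.txt, path/to/numbers-2.txt]
    }
}
```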
By default, TextIO.Read returns a PCollection of Strings,
each corresponding to one line of an input UTF-8 text file. To convert directly from the raw
bytes (split into lines delimited by '\n', '\r', or '\r\n') to another object of type T,
supply a Coder<T> using TextIO.Read.withCoder(Coder).
See the following examples:
```java
Pipeline p = ...;

// A simple Read of a local file (only runs locally):
PCollection<String> lines =
    p.apply(TextIO.Read.from("/local/path/to/file.txt"));

// A fully-specified Read from a GCS file (runs locally and via the
// Google Cloud Dataflow service):
PCollection<Integer> numbers =
    p.apply("ReadNumbers", TextIO.Read
        .from("gs://my_bucket/path/to/numbers-*.txt")
        .withCoder(TextualIntegerCoder.of()));
```
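The line-splitting rule described above (records delimited by '\n', '\r', or '\r\n', each decoded into a value of type T) can be sketched without the SDK. This sketch assumes that a decoder is simply a function from one line's bytes to a value, standing in for a Coder<T>; LineDecodingSketch, decodeLines, and the integer decoder are hypothetical names, and this is not TextIO's actual implementation.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

public class LineDecodingSketch {
    // Splits raw bytes into records at '\n', '\r', or "\r\n" (delimiters are dropped)
    // and applies the decoder to each record's bytes.
    public static <T> List<T> decodeLines(byte[] data, Function<byte[], T> decoder) {
        List<T> out = new ArrayList<>();
        ByteArrayOutputStream current = new ByteArrayOutputStream();
        for (int i = 0; i < data.length; i++) {
            byte b = data[i];
            if (b == '\n' || b == '\r') {
                out.add(decoder.apply(current.toByteArray()));
                current.reset();
                // Treat "\r\n" as a single delimiter rather than two.
                if (b == '\r' && i + 1 < data.length && data[i + 1] == '\n') {
                    i++;
                }
            } else {
                current.write(b);
            }
        }
        // A final line without a trailing delimiter is still a record.
        if (current.size() > 0) {
            out.add(decoder.apply(current.toByteArray()));
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] raw = "1\n22\r333\r\n4444".getBytes(StandardCharsets.UTF_8);
        // Hypothetical decoder playing the role of a textual integer coder.
        Function<byte[], Integer> toInt =
            bytes -> Integer.parseInt(new String(bytes, StandardCharsets.UTF_8));
        System.out.println(decodeLines(raw, toInt)); // prints [1, 22, 333, 4444]
    }
}
```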
To write a PCollection to one or more text files, use
TextIO.Write, calling TextIO.Write.to(String) to specify
the path of the file to write to (e.g., a local filename or sharded
filename pattern if running locally, or a Google Cloud Storage
filename or sharded filename pattern of the form
"gs://<bucket>/<filepath>"). You can use TextIO.Write.withCoder(Coder)
to specify the Coder used to encode each Java value as a line of text.
Any existing files with the same names as generated output files will be overwritten.
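Writing works in the opposite direction: each element is encoded and placed on its own line. This sketch assumes every encoded element is followed by a single '\n' terminator; the encoder function stands in for a Coder<T>, and LineEncodingSketch and encodeLines are hypothetical names rather than part of TextIO.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.function.Function;

public class LineEncodingSketch {
    // Writes each element's encoded bytes followed by an assumed '\n' terminator.
    public static <T> byte[] encodeLines(List<T> elements, Function<T, byte[]> encoder) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (T element : elements) {
            byte[] bytes = encoder.apply(element);
            out.write(bytes, 0, bytes.length);
            out.write('\n');
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // Hypothetical encoder playing the role of a textual integer coder.
        Function<Integer, byte[]> fromInt =
            n -> String.valueOf(n).getBytes(StandardCharsets.UTF_8);
        byte[] text = encodeLines(List.of(1, 22, 333), fromInt);
        System.out.print(new String(text, StandardCharsets.UTF_8)); // three lines: 1, 22, 333
    }
}
```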
For example:
```java
// A simple Write to a local file (only runs locally):
PCollection<String> lines = ...;
lines.apply(TextIO.Write.to("/path/to/file.txt"));

// A fully-specified Write to a sharded GCS file (runs locally and via the
// Google Cloud Dataflow service):
PCollection<Integer> numbers = ...;
numbers.apply("WriteNumbers", TextIO.Write
    .to("gs://my_bucket/path/to/numbers")
    .withSuffix(".txt")
    .withCoder(TextualIntegerCoder.of()));
```
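The sharded write above produces several output files built from the prefix passed to to() and the suffix passed to withSuffix(). Assuming a shard template of the form -SSSSS-of-NNNNN (a zero-padded shard index and shard count; this template is an assumption, not something stated above), the resulting names can be sketched as follows. The shard count of 3 is made up, and ShardNameSketch and shardNames are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class ShardNameSketch {
    // Builds prefix + "-SSSSS-of-NNNNN" + suffix for each shard, zero-padding to 5 digits.
    // Locale.ROOT keeps the digits ASCII regardless of the default locale.
    public static List<String> shardNames(String prefix, String suffix, int numShards) {
        List<String> names = new ArrayList<>();
        for (int i = 0; i < numShards; i++) {
            names.add(String.format(Locale.ROOT, "%s-%05d-of-%05d%s", prefix, i, numShards, suffix));
        }
        return names;
    }

    public static void main(String[] args) {
        for (String name : shardNames("gs://my_bucket/path/to/numbers", ".txt", 3)) {
            System.out.println(name);
        }
        // first line printed: gs://my_bucket/path/to/numbers-00000-of-00003.txt
    }
}
```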
When run using the DirectRunner, your pipeline can read and write text files
on your local drive, as well as remote text files on Google Cloud Storage that you can access
with your gcloud credentials. When running on the Dataflow service, the pipeline can read and
write files only on GCS. For more information about permissions, see the Cloud Dataflow
documentation on Security and Permissions.
| Modifier and Type | Class and Description |
|---|---|
| static class | TextIO.CompressionType: Possible text file compression types. |
| static class | TextIO.Read: A PTransform that reads from a text file (or multiple text files matching a pattern) and returns a PCollection containing the decoded value of each line of the text file(s). |
| static class | TextIO.Write: A PTransform that writes a PCollection to a text file (or multiple text files matching a sharding pattern), with each element of the input collection encoded into its own line. |
| Modifier and Type | Field and Description |
|---|---|
| static Coder<String> | DEFAULT_TEXT_CODER: The default coder, which returns each line of the input file as a string. |