Class ContextualTextIO.Read
- java.lang.Object
  - org.apache.beam.sdk.transforms.PTransform<org.apache.beam.sdk.values.PBegin,org.apache.beam.sdk.values.PCollection<org.apache.beam.sdk.values.Row>>
    - org.apache.beam.sdk.io.contextualtextio.ContextualTextIO.Read
- All Implemented Interfaces:
java.io.Serializable, org.apache.beam.sdk.transforms.display.HasDisplayData
- Enclosing class:
ContextualTextIO

public abstract static class ContextualTextIO.Read
extends org.apache.beam.sdk.transforms.PTransform<org.apache.beam.sdk.values.PBegin,org.apache.beam.sdk.values.PCollection<org.apache.beam.sdk.values.Row>>

Implementation of ContextualTextIO.read().
- See Also:
Serialized Form

Constructor Summary
- Read()

Method Summary
- org.apache.beam.sdk.values.PCollection<org.apache.beam.sdk.values.Row> expand(org.apache.beam.sdk.values.PBegin input)
- ContextualTextIO.Read from(java.lang.String filepattern): Reads text from the file(s) with the given filename or filename pattern.
- ContextualTextIO.Read from(org.apache.beam.sdk.options.ValueProvider<java.lang.String> filepattern): Same as from(filepattern), but accepting a ValueProvider.
- protected org.apache.beam.sdk.io.FileBasedSource<org.apache.beam.sdk.values.Row> getSource()
- void populateDisplayData(org.apache.beam.sdk.transforms.display.DisplayData.Builder builder)
- ContextualTextIO.Read withCompression(org.apache.beam.sdk.io.Compression compression): Reads from input sources using the specified compression type.
- ContextualTextIO.Read withDelimiter(byte[] delimiter): Sets the custom delimiter to be used in place of the default ones ('\r', '\n' or '\r\n').
- ContextualTextIO.Read withEmptyMatchTreatment(org.apache.beam.sdk.io.fs.EmptyMatchTreatment treatment): See FileIO.MatchConfiguration.withEmptyMatchTreatment(org.apache.beam.sdk.io.fs.EmptyMatchTreatment).
- ContextualTextIO.Read withHasMultilineCSVRecords(java.lang.Boolean hasMultilineCSVRecords): When reading RFC 4180 CSV files that have values spanning multiple lines, set this to true.
- ContextualTextIO.Read withHintMatchesManyFiles(): Hints that the filepattern specified in from(String) matches a very large number of files.
- ContextualTextIO.Read withMatchConfiguration(org.apache.beam.sdk.io.FileIO.MatchConfiguration matchConfiguration): Sets the FileIO.MatchConfiguration.
- ContextualTextIO.Read withRecordNumMetadata(): Allows the user to opt into getting recordNums associated with each record.

Method Detail

from
public ContextualTextIO.Read from(java.lang.String filepattern)
Reads text from the file(s) with the given filename or filename pattern. This can be a local path (if running locally), or a Google Cloud Storage filename or filename pattern of the form "gs://<bucket>/<filepath>" (if running locally or using a remote execution service). Standard Java Filesystem glob patterns ("*", "?", "[..]") are supported.
If it is known that the filepattern will match a very large number of files (at least tens of thousands), use withHintMatchesManyFiles() for better performance and scalability.
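
As a standalone illustration of the glob syntax, the sketch below uses the JDK's own matcher (java.nio.file.FileSystem.getPathMatcher with the "glob:" prefix) on bare file names. It assumes the standard java.nio glob semantics; GlobDemo is a hypothetical class, and the sketch does not show how Beam expands patterns against local or gs:// file systems.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;

// Hypothetical demo class: checks bare file names against a glob pattern
// using the JDK's standard "glob:" syntax.
class GlobDemo {
    static boolean matches(String glob, String fileName) {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + glob);
        return matcher.matches(Paths.get(fileName));
    }

    public static void main(String[] args) {
        // '*' matches any run of characters within a name component.
        System.out.println(matches("*.csv", "input-001.csv"));             // true
        // '?' matches exactly one character.
        System.out.println(matches("input-00?.csv", "input-001.csv"));     // true
        // '[..]' matches one character from the bracketed set.
        System.out.println(matches("input-00[12].csv", "input-003.csv"));  // false
    }
}
```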

from
public ContextualTextIO.Read from(org.apache.beam.sdk.options.ValueProvider<java.lang.String> filepattern)
Same as from(filepattern), but accepting a ValueProvider.

withMatchConfiguration
public ContextualTextIO.Read withMatchConfiguration(org.apache.beam.sdk.io.FileIO.MatchConfiguration matchConfiguration)
Sets the FileIO.MatchConfiguration.

withHasMultilineCSVRecords
public ContextualTextIO.Read withHasMultilineCSVRecords(java.lang.Boolean hasMultilineCSVRecords)
When reading RFC 4180 CSV files that have values spanning multiple lines, set this to true. Note: this reduces read performance (see ContextualTextIO).
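
In RFC 4180 CSV, a double-quoted field may contain line breaks, and a doubled quote ("") inside such a field stands for a literal quote. Splitting on line breaks alone would cut such a record in two; finding the true record boundaries requires tracking quote state while scanning from a known record start, which is one plausible reason this mode is slower. The following is a minimal sketch of that scanning, not ContextualTextIO's implementation; CsvRecordSplitter is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits text into RFC 4180-style records, where a line
// break ends a record only when it occurs outside a double-quoted field.
class CsvRecordSplitter {
    static List<String> splitRecords(String text) {
        List<String> records = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '"') {
                // An escaped quote ("") flips the state twice, leaving it unchanged.
                inQuotes = !inQuotes;
                current.append(c);
            } else if ((c == '\r' || c == '\n') && !inQuotes) {
                // Treat "\r\n" as a single record terminator.
                if (c == '\r' && i + 1 < text.length() && text.charAt(i + 1) == '\n') {
                    i++;
                }
                records.add(current.toString());
                current.setLength(0);
            } else {
                // Ordinary character, or a line break inside a quoted field.
                current.append(c);
            }
        }
        if (current.length() > 0) {
            // Last record when the input has no trailing line break.
            records.add(current.toString());
        }
        return records;
    }

    public static void main(String[] args) {
        String csv = "id,comment\r\n1,\"first line\nsecond line\"\r\n2,plain\r\n";
        for (String record : splitRecords(csv)) {
            System.out.println("[" + record + "]");
        }
    }
}
```

For the sample input in main, this yields three records, whereas splitting on every '\r', '\n' or '\r\n' would yield four.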

withCompression
public ContextualTextIO.Read withCompression(org.apache.beam.sdk.io.Compression compression)
Reads from input sources using the specified compression type. If no compression type is specified, the default is Compression.AUTO.
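
Compression.AUTO chooses the decompression method from each file's extension. The sketch below imitates that idea with the JDK alone: it recognizes only the ".gz" extension (via java.util.zip.GZIPInputStream) and treats any other name as uncompressed. It is a simplified, hypothetical stand-in, not Beam's Compression implementation.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical, simplified stand-in for extension-based detection: only ".gz"
// is recognized; every other file name is read as uncompressed bytes.
class AutoCompressionSketch {
    static InputStream open(String fileName, byte[] contents) throws IOException {
        InputStream raw = new ByteArrayInputStream(contents);
        return fileName.endsWith(".gz") ? new GZIPInputStream(raw) : raw;
    }

    // Reads the whole "file" as UTF-8 text after choosing a decoder by name.
    static String readAll(String fileName, byte[] contents) {
        try (InputStream in = open(fileName, contents)) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Produces gzip-compressed bytes for the demo.
    static byte[] gzip(String text) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(buffer)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        System.out.print(readAll("part-0.txt.gz", gzip("line 1\nline 2\n")));
        System.out.print(readAll("part-1.txt", "line 3\n".getBytes(StandardCharsets.UTF_8)));
    }
}
```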

withHintMatchesManyFiles
public ContextualTextIO.Read withHintMatchesManyFiles()
Hints that the filepattern specified in from(String) matches a very large number of files. This hint may cause a runner to execute the transform differently, in a way that improves performance for this case. However, it may worsen performance if the filepattern matches only a small number of files (e.g., in a runner that supports dynamic work rebalancing, rebalancing will happen less efficiently within individual files).

withRecordNumMetadata
public ContextualTextIO.Read withRecordNumMetadata()
Allows the user to opt into getting recordNums associated with each record. This option is only supported with default triggers. When enabled, it introduces a grouping step to assemble the recordNums for each record, which increases the resources used by the pipeline.
Use this when you need metadata such as file names together with processed position/order information.
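
To see why record numbers need a grouping step, assume (as a simplification) that each file is split into byte ranges read in parallel, and that each reader can only count records within its own range. A file-wide number then depends on how many records all earlier ranges of the same file contain, which is only known once the ranges for that file are brought together. The sketch below shows this with plain collections; RangeResult and numberRecords are hypothetical names, and the 0-based numbering is this sketch's choice, not a statement about Beam's output.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class RecordNumSketch {
    // Hypothetical result of reading one byte range of one file: where the
    // range starts and the records found in it, in order.
    static final class RangeResult {
        final String file;
        final long rangeOffset;
        final List<String> records;

        RangeResult(String file, long rangeOffset, List<String> records) {
            this.file = file;
            this.rangeOffset = rangeOffset;
            this.records = records;
        }
    }

    // Returns, per file, each record prefixed with its 0-based file-wide number.
    static Map<String, List<String>> numberRecords(List<RangeResult> ranges) {
        // Grouping step: file -> (range offset -> records), ranges sorted by offset.
        Map<String, TreeMap<Long, List<String>>> byFile = new HashMap<>();
        for (RangeResult r : ranges) {
            byFile.computeIfAbsent(r.file, f -> new TreeMap<>()).put(r.rangeOffset, r.records);
        }
        Map<String, List<String>> numbered = new HashMap<>();
        for (Map.Entry<String, TreeMap<Long, List<String>>> entry : byFile.entrySet()) {
            List<String> out = new ArrayList<>();
            long recordsBefore = 0; // total records in earlier ranges of this file
            for (List<String> records : entry.getValue().values()) {
                for (int i = 0; i < records.size(); i++) {
                    out.add((recordsBefore + i) + ": " + records.get(i));
                }
                recordsBefore += records.size();
            }
            numbered.put(entry.getKey(), out);
        }
        return numbered;
    }

    public static void main(String[] args) {
        // Ranges may arrive in any order; offsets restore the file order.
        List<RangeResult> ranges = Arrays.asList(
                new RangeResult("a.txt", 100, Arrays.asList("c", "d")),
                new RangeResult("a.txt", 0, Arrays.asList("a", "b")),
                new RangeResult("b.txt", 0, Arrays.asList("x")));
        System.out.println(numberRecords(ranges).get("a.txt")); // [0: a, 1: b, 2: c, 3: d]
    }
}
```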

withEmptyMatchTreatment
public ContextualTextIO.Read withEmptyMatchTreatment(org.apache.beam.sdk.io.fs.EmptyMatchTreatment treatment)
See FileIO.MatchConfiguration.withEmptyMatchTreatment(org.apache.beam.sdk.io.fs.EmptyMatchTreatment).

withDelimiter
public ContextualTextIO.Read withDelimiter(byte[] delimiter)
Sets the custom delimiter to be used in place of the default ones ('\r', '\n' or '\r\n').
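
A minimal sketch of splitting input on a custom byte delimiter, assuming a record ends at every exact occurrence of the delimiter bytes. With the two-byte delimiter "||", a '\n' inside a record is kept as ordinary data. DelimiterSketch is a hypothetical helper; it drops an empty final record after a trailing delimiter, which is this sketch's choice and not a claim about how ContextualTextIO handles that case.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits bytes into UTF-8 records at each exact match of a
// custom delimiter, instead of at '\r', '\n' or "\r\n".
class DelimiterSketch {
    static List<String> split(byte[] data, byte[] delimiter) {
        if (delimiter.length == 0) {
            throw new IllegalArgumentException("delimiter must not be empty");
        }
        List<String> records = new ArrayList<>();
        ByteArrayOutputStream current = new ByteArrayOutputStream();
        int i = 0;
        while (i < data.length) {
            if (matchesAt(data, i, delimiter)) {
                // End of a record: emit it and skip over the delimiter bytes.
                records.add(new String(current.toByteArray(), StandardCharsets.UTF_8));
                current.reset();
                i += delimiter.length;
            } else {
                current.write(data[i]);
                i++;
            }
        }
        if (current.size() > 0) {
            // Final record with no trailing delimiter; an empty tail is dropped.
            records.add(new String(current.toByteArray(), StandardCharsets.UTF_8));
        }
        return records;
    }

    // True if the delimiter occurs in data starting at position pos.
    private static boolean matchesAt(byte[] data, int pos, byte[] delimiter) {
        if (pos + delimiter.length > data.length) {
            return false;
        }
        for (int j = 0; j < delimiter.length; j++) {
            if (data[pos + j] != delimiter[j]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] data = "a||b\nc||d||".getBytes(StandardCharsets.UTF_8);
        byte[] delimiter = "||".getBytes(StandardCharsets.UTF_8);
        // Records: "a", "b\nc", "d"
        System.out.println(split(data, delimiter).size()); // 3
    }
}
```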

expand
public org.apache.beam.sdk.values.PCollection<org.apache.beam.sdk.values.Row> expand(org.apache.beam.sdk.values.PBegin input)
- Specified by:
expand in class org.apache.beam.sdk.transforms.PTransform<org.apache.beam.sdk.values.PBegin,org.apache.beam.sdk.values.PCollection<org.apache.beam.sdk.values.Row>>

getSource
protected org.apache.beam.sdk.io.FileBasedSource<org.apache.beam.sdk.values.Row> getSource()

populateDisplayData
public void populateDisplayData(org.apache.beam.sdk.transforms.display.DisplayData.Builder builder)
- Specified by:
populateDisplayData in interface org.apache.beam.sdk.transforms.display.HasDisplayData
- Overrides:
populateDisplayData in class org.apache.beam.sdk.transforms.PTransform<org.apache.beam.sdk.values.PBegin,org.apache.beam.sdk.values.PCollection<org.apache.beam.sdk.values.Row>>