Class ContextualTextIO.Read

  • All Implemented Interfaces:
    java.io.Serializable, org.apache.beam.sdk.transforms.display.HasDisplayData
    Enclosing class:
    ContextualTextIO

    public abstract static class ContextualTextIO.Read
    extends org.apache.beam.sdk.transforms.PTransform<org.apache.beam.sdk.values.PBegin,org.apache.beam.sdk.values.PCollection<org.apache.beam.sdk.values.Row>>
    Implementation of ContextualTextIO.read().
    See Also:
    Serialized Form
    • Field Summary

      • Fields inherited from class org.apache.beam.sdk.transforms.PTransform

        name, resourceHints
    • Constructor Summary

      Constructors 
      Constructor Description
      Read()  
    • Method Summary

      Modifier and Type Method Description
      org.apache.beam.sdk.values.PCollection<org.apache.beam.sdk.values.Row> expand(org.apache.beam.sdk.values.PBegin input)
      ContextualTextIO.Read from(java.lang.String filepattern)
      Reads text from the file(s) with the given filename or filename pattern.
      ContextualTextIO.Read from(org.apache.beam.sdk.options.ValueProvider<java.lang.String> filepattern)
      Same as from(filepattern), but accepts a ValueProvider.
      protected org.apache.beam.sdk.io.FileBasedSource<org.apache.beam.sdk.values.Row> getSource()
      void populateDisplayData(org.apache.beam.sdk.transforms.display.DisplayData.Builder builder)
      ContextualTextIO.Read withCompression(org.apache.beam.sdk.io.Compression compression)
      Reads from input sources using the specified compression type.
      ContextualTextIO.Read withDelimiter(byte[] delimiter)
      Sets a custom delimiter to use in place of the default ones ('\r', '\n', or '\r\n').
      ContextualTextIO.Read withEmptyMatchTreatment(org.apache.beam.sdk.io.fs.EmptyMatchTreatment treatment)
      See FileIO.MatchConfiguration.withEmptyMatchTreatment(org.apache.beam.sdk.io.fs.EmptyMatchTreatment).
      ContextualTextIO.Read withHasMultilineCSVRecords(java.lang.Boolean hasMultilineCSVRecords)
      When reading RFC 4180 CSV files that have values spanning multiple lines, set this to true.
      ContextualTextIO.Read withHintMatchesManyFiles()
      Hints that the filepattern specified in from(String) matches a very large number of files.
      ContextualTextIO.Read withMatchConfiguration(org.apache.beam.sdk.io.FileIO.MatchConfiguration matchConfiguration)
      Sets the FileIO.MatchConfiguration.
      ContextualTextIO.Read withRecordNumMetadata()
      Allows the user to opt into getting record numbers (recordNums) associated with each record.
      • Methods inherited from class org.apache.beam.sdk.transforms.PTransform

        compose, compose, getAdditionalInputs, getDefaultOutputCoder, getDefaultOutputCoder, getDefaultOutputCoder, getKindString, getName, getResourceHints, setResourceHints, toString, validate, validate
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
    • Constructor Detail

      • Read

        public Read()
    • Method Detail

      • from

        public ContextualTextIO.Read from(java.lang.String filepattern)
        Reads text from the file(s) with the given filename or filename pattern.

        This can be a local path (if running locally) or a Google Cloud Storage filename or filename pattern of the form "gs://<bucket>/<filepath>" (if running locally or using a remote execution service).

        Standard Java Filesystem glob patterns ("*", "?", "[..]") are supported.

        If it is known that the filepattern will match a very large number of files (at least tens of thousands), use withHintMatchesManyFiles() for better performance and scalability.
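
        A minimal usage sketch of from(String), assuming the Beam Java SDK and the ContextualTextIO module are on the classpath. The pattern "/tmp/input/*.csv" and the names ReadGlobExample and readLogs are hypothetical placeholders; the Count step only gives the output a consumer.

```java
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.contextualtextio.ContextualTextIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.Count;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.Row;

class ReadGlobExample {
  // Builds the read transform for a filename or glob pattern (hypothetical helper).
  static ContextualTextIO.Read readLogs(String filepattern) {
    return ContextualTextIO.read().from(filepattern);
  }

  public static void main(String[] args) {
    Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());
    // Each element is a Row describing one record read from the matched files.
    PCollection<Row> records = p.apply("ReadInput", readLogs("/tmp/input/*.csv"));
    // Placeholder downstream step so the read has a consumer.
    records.apply("CountRecords", Count.globally());
    p.run().waitUntilFinish();
  }
}
```
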

      • from

        public ContextualTextIO.Read from(org.apache.beam.sdk.options.ValueProvider<java.lang.String> filepattern)
        Same as from(filepattern), but accepting a ValueProvider.
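
        A sketch of from(ValueProvider) fed by a pipeline option, so the pattern can be supplied when the pipeline is launched (for example, from a template). It assumes the Beam Java SDK; ReadOptions, getInputPattern/setInputPattern, and readFrom are hypothetical names, and the option would be passed as --inputPattern=....

```java
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.contextualtextio.ContextualTextIO;
import org.apache.beam.sdk.options.Description;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.options.ValueProvider;

class ValueProviderReadExample {
  // Hypothetical options interface exposing the file pattern as a ValueProvider.
  public interface ReadOptions extends PipelineOptions {
    @Description("File pattern to read, e.g. /tmp/input/*.txt")
    ValueProvider<String> getInputPattern();

    void setInputPattern(ValueProvider<String> value);
  }

  // The pattern is not resolved here; the read asks the ValueProvider for it later.
  static ContextualTextIO.Read readFrom(ValueProvider<String> pattern) {
    return ContextualTextIO.read().from(pattern);
  }

  public static void main(String[] args) {
    ReadOptions options =
        PipelineOptionsFactory.fromArgs(args).withValidation().as(ReadOptions.class);
    Pipeline p = Pipeline.create(options);
    p.apply("Read", readFrom(options.getInputPattern()));
    p.run().waitUntilFinish();
  }
}
```

        For tests, ValueProvider.StaticValueProvider.of(...) wraps a fixed string in the same type.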
      • withMatchConfiguration

        public ContextualTextIO.Read withMatchConfiguration(org.apache.beam.sdk.io.FileIO.MatchConfiguration matchConfiguration)
        Sets the FileIO.MatchConfiguration.
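
        A sketch of withMatchConfiguration, assuming the Beam Java SDK, with a configuration built by FileIO.MatchConfiguration.create. Choosing EmptyMatchTreatment.ALLOW (tolerate a pattern that matches no files) is only an illustration; MatchConfigurationExample and readAllowingEmpty are hypothetical names.

```java
import org.apache.beam.sdk.io.FileIO;
import org.apache.beam.sdk.io.contextualtextio.ContextualTextIO;
import org.apache.beam.sdk.io.fs.EmptyMatchTreatment;

class MatchConfigurationExample {
  static ContextualTextIO.Read readAllowingEmpty(String filepattern) {
    // Match configuration that does not fail when the pattern matches nothing.
    FileIO.MatchConfiguration config =
        FileIO.MatchConfiguration.create(EmptyMatchTreatment.ALLOW);
    return ContextualTextIO.read().from(filepattern).withMatchConfiguration(config);
  }
}
```
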
      • withHasMultilineCSVRecords

        public ContextualTextIO.Read withHasMultilineCSVRecords(java.lang.Boolean hasMultilineCSVRecords)
        When reading RFC 4180 CSV files that have values spanning multiple lines, set this to true. Note: this reduces read performance (see ContextualTextIO).
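
        The plain-Java sketch below (no Beam dependency) illustrates why multiline records need special handling: a newline inside a double-quoted RFC 4180 field belongs to the value, so naive line splitting breaks the record apart. It is not ContextualTextIO's implementation; MultilineCsvSketch and splitRecords are hypothetical names. In a pipeline, the corresponding call would be ContextualTextIO.read().from(pattern).withHasMultilineCSVRecords(true).

```java
import java.util.ArrayList;
import java.util.List;

class MultilineCsvSketch {
  // Splits text into CSV records, keeping line breaks that occur inside quoted fields.
  // Toggling on every '"' also copes with RFC 4180 escaped quotes (""), which toggle twice.
  static List<String> splitRecords(String text) {
    List<String> records = new ArrayList<>();
    StringBuilder current = new StringBuilder();
    boolean inQuotes = false;
    for (int i = 0; i < text.length(); i++) {
      char c = text.charAt(i);
      if (c == '"') {
        inQuotes = !inQuotes;
        current.append(c);
      } else if (!inQuotes && (c == '\n' || c == '\r')) {
        // Treat "\r\n" as a single record break.
        if (c == '\r' && i + 1 < text.length() && text.charAt(i + 1) == '\n') {
          i++;
        }
        records.add(current.toString());
        current.setLength(0);
      } else {
        current.append(c);
      }
    }
    if (current.length() > 0) {
      records.add(current.toString());
    }
    return records;
  }

  public static void main(String[] args) {
    String csv = "id,comment\n1,\"first line\nsecond line\"\n2,plain\n";
    System.out.println(csv.split("\n").length); // 4: naive splitting cuts the quoted value
    System.out.println(splitRecords(csv).size()); // 3: quote-aware splitting
  }
}
```

        The sketch also hints at why the option costs performance: whether a given newline ends a record depends on the quoting state carried over from earlier in the input, so a reader cannot simply start at an arbitrary offset and look for the next line break.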
      • withCompression

        public ContextualTextIO.Read withCompression(org.apache.beam.sdk.io.Compression compression)
        Reads from input sources using the specified compression type.

        If no compression type is specified, the default is Compression.AUTO.
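
        A plain-Java sketch of extension-based detection in the spirit of Compression.AUTO, which picks a codec from the filename extension. The helper names are hypothetical and only gzip (".gz") is handled, whereas Beam's AUTO recognizes more formats. To force a codec instead of detecting it, a pipeline would call withCompression(Compression.GZIP).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class AutoCompressionSketch {
  // Hypothetical stand-in for AUTO: choose a decompressor from the file name.
  // Only ".gz" is recognized; anything else is read as uncompressed.
  static InputStream open(String fileName, byte[] contents) throws IOException {
    InputStream raw = new ByteArrayInputStream(contents);
    if (fileName.toLowerCase(Locale.ROOT).endsWith(".gz")) {
      return new GZIPInputStream(raw);
    }
    return raw;
  }

  // Reads the whole "file" as UTF-8 text after optional decompression.
  static String readAll(String fileName, byte[] contents) {
    try (InputStream in = open(fileName, contents)) {
      return new String(in.readAllBytes(), StandardCharsets.UTF_8);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  // Produces gzip-compressed bytes for the demo.
  static byte[] gzip(String text) {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (GZIPOutputStream out = new GZIPOutputStream(bytes)) {
      out.write(text.getBytes(StandardCharsets.UTF_8));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return bytes.toByteArray();
  }

  public static void main(String[] args) {
    System.out.print(readAll("events.txt.gz", gzip("a\nb\n")));
    System.out.print(readAll("events.txt", "c\n".getBytes(StandardCharsets.UTF_8)));
  }
}
```
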

      • withHintMatchesManyFiles

        public ContextualTextIO.Read withHintMatchesManyFiles()
        Hints that the filepattern specified in from(String) matches a very large number of files.

        This hint may cause a runner to execute the transform differently, in a way that improves performance for this case. However, it may worsen performance if the filepattern matches only a small number of files (e.g., in a runner that supports dynamic work rebalancing, rebalancing will happen less efficiently within individual files).
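
        A usage sketch, assuming the Beam Java SDK, in which the hint is chained after from(String). The directory layout "/data/events/*/*.txt" and the names ManyFilesExample and readHourlyShards are hypothetical; the hint is only worthwhile if the pattern really matches tens of thousands of files.

```java
import org.apache.beam.sdk.io.contextualtextio.ContextualTextIO;

class ManyFilesExample {
  // Hypothetical layout: one small file per hour under one directory per day,
  // so the glob can match a very large number of files.
  static ContextualTextIO.Read readHourlyShards() {
    return ContextualTextIO.read()
        .from("/data/events/*/*.txt")
        .withHintMatchesManyFiles();
  }
}
```
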

      • withRecordNumMetadata

        public ContextualTextIO.Read withRecordNumMetadata()
        Allows the user to opt into getting record numbers (recordNums) associated with each record. This option is only supported with default triggers.

        When enabled, this option introduces a grouping step to assemble the recordNums for each record, which increases the resources used by the pipeline.

        Use this when you need metadata such as file names and also need processed position/order information.
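
        The plain-Java sketch below shows one way a grouping step could assign file-wide record numbers: if each reader numbers records only within its own byte range, per-range counts must be gathered and summed in offset order before any record can be numbered. This is an assumption made for illustration, not a description of ContextualTextIO's internals; RecordNumSketch, LocalRecord, and numberRecords are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class RecordNumSketch {
  // A record as the reader of one byte range might emit it (hypothetical shape):
  // the range's start offset, the record's 0-based index within that range, and the text.
  static final class LocalRecord {
    final long rangeOffset;
    final long numInRange;
    final String value;

    LocalRecord(long rangeOffset, long numInRange, String value) {
      this.rangeOffset = rangeOffset;
      this.numInRange = numInRange;
      this.value = value;
    }
  }

  // Returns "globalNum:value" strings in input order.
  static List<String> numberRecords(List<LocalRecord> records) {
    // Grouping step: count records per range; TreeMap keeps ranges sorted by offset.
    Map<Long, Long> countPerRange = new TreeMap<>();
    for (LocalRecord r : records) {
      countPerRange.merge(r.rangeOffset, 1L, Long::sum);
    }
    // Prefix sums: a range's first record number is the total of all earlier ranges.
    Map<Long, Long> firstNumInRange = new TreeMap<>();
    long running = 0;
    for (Map.Entry<Long, Long> e : countPerRange.entrySet()) {
      firstNumInRange.put(e.getKey(), running);
      running += e.getValue();
    }
    List<String> numbered = new ArrayList<>();
    for (LocalRecord r : records) {
      numbered.add((firstNumInRange.get(r.rangeOffset) + r.numInRange) + ":" + r.value);
    }
    return numbered;
  }

  public static void main(String[] args) {
    // Records from two ranges, arriving out of order.
    List<LocalRecord> input = List.of(
        new LocalRecord(100, 0, "c"),
        new LocalRecord(0, 0, "a"),
        new LocalRecord(0, 1, "b"),
        new LocalRecord(100, 1, "d"));
    System.out.println(numberRecords(input)); // [2:c, 0:a, 1:b, 3:d]
  }
}
```

        Because every record's number depends on counts from all earlier ranges, this step has to see the whole input, which is consistent with the extra resource cost described above.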

      • withEmptyMatchTreatment

        public ContextualTextIO.Read withEmptyMatchTreatment(org.apache.beam.sdk.io.fs.EmptyMatchTreatment treatment)
        See FileIO.MatchConfiguration.withEmptyMatchTreatment(org.apache.beam.sdk.io.fs.EmptyMatchTreatment).
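
        A usage sketch, assuming the Beam Java SDK, that tolerates an empty match only when the pattern contains a wildcard (EmptyMatchTreatment.ALLOW_IF_WILDCARD). EmptyMatchExample and readOptionalInputs are hypothetical names.

```java
import org.apache.beam.sdk.io.contextualtextio.ContextualTextIO;
import org.apache.beam.sdk.io.fs.EmptyMatchTreatment;

class EmptyMatchExample {
  // ALLOW_IF_WILDCARD: an empty match is accepted for a glob such as "*.txt",
  // but a literal file name that matches nothing is still rejected.
  static ContextualTextIO.Read readOptionalInputs(String filepattern) {
    return ContextualTextIO.read()
        .from(filepattern)
        .withEmptyMatchTreatment(EmptyMatchTreatment.ALLOW_IF_WILDCARD);
  }
}
```
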
      • withDelimiter

        public ContextualTextIO.Read withDelimiter(byte[] delimiter)
        Sets a custom delimiter to use in place of the default ones ('\r', '\n', or '\r\n').
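
        A plain-Java sketch of what splitting on a custom delimiter means, using "||" as a hypothetical multi-byte delimiter. It is not Beam's reader; in particular, dropping the empty record after a trailing delimiter is this sketch's own choice. In a pipeline, the delimiter would be passed as bytes, e.g. withDelimiter("||".getBytes(StandardCharsets.UTF_8)).

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class DelimiterSketch {
  // Splits data on every occurrence of delimiter and decodes each record as UTF-8.
  static List<String> split(byte[] data, byte[] delimiter) {
    if (delimiter.length == 0) {
      throw new IllegalArgumentException("delimiter must not be empty");
    }
    List<String> records = new ArrayList<>();
    int start = 0;
    int i = 0;
    while (i <= data.length - delimiter.length) {
      if (matchesAt(data, i, delimiter)) {
        records.add(new String(data, start, i - start, StandardCharsets.UTF_8));
        i += delimiter.length;
        start = i;
      } else {
        i++;
      }
    }
    // Emit the remainder; a trailing delimiter leaves nothing, so no empty record.
    if (start < data.length) {
      records.add(new String(data, start, data.length - start, StandardCharsets.UTF_8));
    }
    return records;
  }

  private static boolean matchesAt(byte[] data, int pos, byte[] delimiter) {
    for (int j = 0; j < delimiter.length; j++) {
      if (data[pos + j] != delimiter[j]) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    byte[] data = "a||b||c||".getBytes(StandardCharsets.UTF_8);
    byte[] delimiter = "||".getBytes(StandardCharsets.UTF_8);
    System.out.println(split(data, delimiter)); // [a, b, c]
  }
}
```
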
      • expand

        public org.apache.beam.sdk.values.PCollection<org.apache.beam.sdk.values.Row> expand(org.apache.beam.sdk.values.PBegin input)
        Specified by:
        expand in class org.apache.beam.sdk.transforms.PTransform<org.apache.beam.sdk.values.PBegin,org.apache.beam.sdk.values.PCollection<org.apache.beam.sdk.values.Row>>
      • getSource

        protected org.apache.beam.sdk.io.FileBasedSource<org.apache.beam.sdk.values.Row> getSource()
      • populateDisplayData

        public void populateDisplayData(org.apache.beam.sdk.transforms.display.DisplayData.Builder builder)
        Specified by:
        populateDisplayData in interface org.apache.beam.sdk.transforms.display.HasDisplayData
        Overrides:
        populateDisplayData in class org.apache.beam.sdk.transforms.PTransform<org.apache.beam.sdk.values.PBegin,org.apache.beam.sdk.values.PCollection<org.apache.beam.sdk.values.Row>>