Class GetHDFSSequenceFile

java.lang.Object
org.apache.nifi.components.AbstractConfigurableComponent
org.apache.nifi.processor.AbstractSessionFactoryProcessor
org.apache.nifi.processor.AbstractProcessor
All Implemented Interfaces:
org.apache.nifi.components.ClassloaderIsolationKeyProvider, org.apache.nifi.components.ConfigurableComponent, org.apache.nifi.processor.Processor

@TriggerWhenEmpty
@Tags({"hadoop","HCFS","HDFS","get","fetch","ingest","source","sequence file"})
@CapabilityDescription("Fetch sequence files from Hadoop Distributed File System (HDFS) into FlowFiles")
@SeeAlso(PutHDFS.class)
public class GetHDFSSequenceFile extends GetHDFS
This processor pulls files from HDFS. The files MUST be in SequenceFile format. The processor creates one flow file for each key/value entry in the ingested SequenceFile. The content of each flow file depends on the optional configuration property FlowFile Content, which currently has two choices: VALUE ONLY and KEY VALUE PAIR. With the former, only the SequenceFile value element is written to the flow file content. With the latter, the SequenceFile key and value are written to the flow file content as serialized objects in the format: key length (int), key (String), value length (int), value (bytes). The default is VALUE ONLY.

NOTE: This processor loads the entire value entry into memory. Although a single value entry is limited to 2 GB, memory problems can occur when many concurrent tasks are running and the ingested values are large.
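
The KEY VALUE PAIR layout described above (key length, key, value length, value) can be illustrated with a small round-trip sketch. This is not NiFi code: the class KeyValuePairSketch and its methods are hypothetical, and the byte-level details (4-byte big-endian lengths, a UTF-8 key) are assumptions made for illustration only. Verify the actual serialization against the processor's output before relying on it.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.AbstractMap;
import java.util.Map;

// Hypothetical helper, not part of NiFi: models the documented field order
// key length (int), key (String), value length (int), value (bytes).
class KeyValuePairSketch {

    // Assumption: lengths are 4-byte big-endian ints and the key is UTF-8 text.
    static byte[] encode(String key, byte[] value) {
        byte[] keyBytes = key.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buffer = ByteBuffer.allocate(4 + keyBytes.length + 4 + value.length);
        buffer.putInt(keyBytes.length);   // key length
        buffer.put(keyBytes);             // key
        buffer.putInt(value.length);      // value length
        buffer.put(value);                // value
        return buffer.array();
    }

    // Reads the fields back in the same order they were written.
    static Map.Entry<String, byte[]> decode(byte[] content) {
        ByteBuffer buffer = ByteBuffer.wrap(content);
        byte[] keyBytes = new byte[buffer.getInt()];
        buffer.get(keyBytes);
        byte[] value = new byte[buffer.getInt()];
        buffer.get(value);
        String key = new String(keyBytes, StandardCharsets.UTF_8);
        return new AbstractMap.SimpleImmutableEntry<>(key, value);
    }

    public static void main(String[] args) {
        byte[] content = encode("row-1", new byte[] {10, 20, 30});
        Map.Entry<String, byte[]> entry = decode(content);
        System.out.println(entry.getKey() + " -> " + entry.getValue().length + " bytes");
    }
}
```

Note that decode allocates the whole value array at once, which mirrors the memory behavior described in the NOTE above: each entry's value is held fully in memory.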

  • Field Details

    • VALUE_ONLY

      static final String VALUE_ONLY
    • FLOWFILE_CONTENT

      static final org.apache.nifi.components.PropertyDescriptor FLOWFILE_CONTENT
  • Constructor Details

    • GetHDFSSequenceFile

      public GetHDFSSequenceFile()
  • Method Details

    • getSupportedPropertyDescriptors

      protected List<org.apache.nifi.components.PropertyDescriptor> getSupportedPropertyDescriptors()
      Overrides:
      getSupportedPropertyDescriptors in class GetHDFS
    • processBatchOfFiles

      protected void processBatchOfFiles(List<org.apache.hadoop.fs.Path> files, org.apache.nifi.processor.ProcessContext context, org.apache.nifi.processor.ProcessSession session)
      Overrides:
      processBatchOfFiles in class GetHDFS
    • getFlowFiles

      protected Set<org.apache.nifi.flowfile.FlowFile> getFlowFiles(org.apache.hadoop.conf.Configuration conf, org.apache.hadoop.fs.FileSystem hdfs, SequenceFileReader<Set<org.apache.nifi.flowfile.FlowFile>> reader, org.apache.hadoop.fs.Path file) throws Exception
      Throws:
      Exception