Class GetHDFSSequenceFile
java.lang.Object
org.apache.nifi.components.AbstractConfigurableComponent
org.apache.nifi.processor.AbstractSessionFactoryProcessor
org.apache.nifi.processor.AbstractProcessor
org.apache.nifi.processors.hadoop.AbstractHadoopProcessor
org.apache.nifi.processors.hadoop.GetHDFS
org.apache.nifi.processors.hadoop.GetHDFSSequenceFile
- All Implemented Interfaces:
org.apache.nifi.components.ClassloaderIsolationKeyProvider, org.apache.nifi.components.ConfigurableComponent, org.apache.nifi.processor.Processor
@TriggerWhenEmpty
@Tags({"hadoop","HCFS","HDFS","get","fetch","ingest","source","sequence file"})
@CapabilityDescription("Fetch sequence files from Hadoop Distributed File System (HDFS) into FlowFiles")
@SeeAlso(PutHDFS.class)
public class GetHDFSSequenceFile
extends GetHDFS
This processor pulls files from HDFS. The files being pulled in MUST be SequenceFile formatted files. The processor creates a flow file for each key/value entry in the ingested
SequenceFile. The content of each created flow file depends on the value of the optional configuration property FlowFile Content. Currently, there are two choices: VALUE ONLY and KEY VALUE PAIR. With the
former, only the SequenceFile value element is written to the flow file contents. With the latter, the SequenceFile key and value are written to the flow file contents as serialized objects; the
format is key length (int), key (String), value length (int), value (bytes). The default is VALUE ONLY.
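As an illustration of the KEY VALUE PAIR layout described above (key length, key, value length, value), the following is a minimal sketch of how a downstream consumer might read such content. It is not taken from the processor's source. The class name KeyValueRecordSketch and its methods are hypothetical, and the sketch assumes that each length is a 4-byte big-endian int (the java.io.DataOutputStream convention) and that the key bytes are UTF-8 text. Verify both assumptions against real flow file content before relying on them.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper, not part of NiFi: round-trips one record in the
// "key length, key, value length, value" layout.
public class KeyValueRecordSketch {

    /** One decoded entry: the key text and the raw value bytes. */
    static final class Entry {
        final String key;
        final byte[] value;

        Entry(String key, byte[] value) {
            this.key = key;
            this.value = value;
        }
    }

    /** Writes key length, key bytes, value length, value bytes (assumed layout). */
    static byte[] encode(String key, byte[] value) throws IOException {
        byte[] keyBytes = key.getBytes(StandardCharsets.UTF_8); // assumption: UTF-8 keys
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeInt(keyBytes.length); // assumption: 4-byte big-endian int
            out.write(keyBytes);
            out.writeInt(value.length);
            out.write(value);
        }
        return buffer.toByteArray();
    }

    /** Reads the four fields back in the same order they were written. */
    static Entry decode(byte[] content) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(content))) {
            byte[] keyBytes = new byte[in.readInt()];
            in.readFully(keyBytes);
            byte[] value = new byte[in.readInt()];
            in.readFully(value);
            return new Entry(new String(keyBytes, StandardCharsets.UTF_8), value);
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] content = encode("row-1", new byte[] {10, 20, 30});
        Entry entry = decode(content);
        System.out.println(entry.key + " -> " + Arrays.toString(entry.value));
    }
}
```

With VALUE ONLY, no such framing applies: the flow file content is the value bytes alone.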
NOTE: This processor loads the entire value entry into memory. Although a single value entry is limited to 2GB, memory problems can occur when many concurrent tasks are running and the values being ingested are large.
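To make the memory note concrete, the sketch below estimates a rough worst case: the number of concurrent tasks multiplied by the largest expected value size, compared against the available heap. This is a back-of-the-envelope check, not a NiFi API. The class ValueMemoryEstimate and its methods are hypothetical, and the estimate assumes each task holds at most one value at a time and ignores every other use of the heap.

```java
// Hypothetical sizing helper, not part of NiFi.
public class ValueMemoryEstimate {

    /** Upper bound on bytes buffered at once if every task holds one value. */
    static long worstCaseBytes(int concurrentTasks, long largestValueBytes) {
        // multiplyExact throws ArithmeticException instead of silently overflowing
        return Math.multiplyExact((long) concurrentTasks, largestValueBytes);
    }

    /** True if the worst case fits in the given heap size (other usage ignored). */
    static boolean fitsInHeap(int concurrentTasks, long largestValueBytes, long heapBytes) {
        return worstCaseBytes(concurrentTasks, largestValueBytes) <= heapBytes;
    }

    public static void main(String[] args) {
        long mib = 1024L * 1024L;
        long heap = Runtime.getRuntime().maxMemory(); // heap limit of this JVM
        int tasks = 4;                                // example value only
        long largestValue = 512 * mib;                // example value only
        System.out.println("worst case bytes: " + worstCaseBytes(tasks, largestValue));
        System.out.println("fits in current heap: " + fitsInHeap(tasks, largestValue, heap));
    }
}
```

In this example, four tasks each holding a 512 MiB value could need 2 GiB for values alone, so either the task count or the heap size would need to change.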
-
Nested Class Summary
Nested classes/interfaces inherited from class org.apache.nifi.processors.hadoop.GetHDFS
GetHDFS.ProcessorConfiguration
Nested classes/interfaces inherited from class org.apache.nifi.processors.hadoop.AbstractHadoopProcessor
AbstractHadoopProcessor.ValidationResources
-
Field Summary
Fields
Modifier and Type  Field  Description
(package private) static final org.apache.nifi.components.PropertyDescriptor  FLOWFILE_CONTENT
(package private) static final String  VALUE_ONLY
Fields inherited from class org.apache.nifi.processors.hadoop.GetHDFS
BATCH_SIZE, BUFFER_SIZE, BUFFER_SIZE_DEFAULT, BUFFER_SIZE_KEY, FILE_FILTER_REGEX, FILTER_MATCH_NAME_ONLY, IGNORE_DOTTED_FILES, KEEP_SOURCE_FILE, MAX_AGE, MAX_WORKING_QUEUE_SIZE, MIN_AGE, POLLING_INTERVAL, processorConfig, RECURSE_SUBDIRS, REL_SUCCESS
Fields inherited from class org.apache.nifi.processors.hadoop.AbstractHadoopProcessor
ABSOLUTE_HDFS_PATH_ATTRIBUTE, ADDITIONAL_CLASSPATH_RESOURCES, COMPRESSION_CODEC, DIRECTORY, HADOOP_CONFIGURATION_RESOURCES, HADOOP_FILE_URL_ATTRIBUTE, hdfsResources, KERBEROS_USER_SERVICE, TARGET_HDFS_DIR_CREATED_ATTRIBUTE
-
Constructor Summary
Constructors
GetHDFSSequenceFile()
-
Method Summary
Modifier and Type  Method  Description
protected Set<org.apache.nifi.flowfile.FlowFile>  getFlowFiles(org.apache.hadoop.conf.Configuration conf, org.apache.hadoop.fs.FileSystem hdfs, SequenceFileReader<Set<org.apache.nifi.flowfile.FlowFile>> reader, org.apache.hadoop.fs.Path file)
protected List<org.apache.nifi.components.PropertyDescriptor>  getSupportedPropertyDescriptors()
protected void  processBatchOfFiles(List<org.apache.hadoop.fs.Path> files, org.apache.nifi.processor.ProcessContext context, org.apache.nifi.processor.ProcessSession session)
Methods inherited from class org.apache.nifi.processors.hadoop.GetHDFS
customValidate, getRelationships, onScheduled, onTrigger, performListing, selectFiles
Methods inherited from class org.apache.nifi.processors.hadoop.AbstractHadoopProcessor
abstractOnScheduled, abstractOnStopped, checkHdfsUriForTimeout, findCause, getClassloaderIsolationKey, getCommonPropertyDescriptors, getCompressionCodec, getConfigLocations, getConfiguration, getFileSystem, getFileSystem, getFileSystemAsUser, getHadoopConfigurationForValidation, getNormalizedPath, getNormalizedPath, getNormalizedPath, getPathDifference, getUserGroupInformation, handleAuthErrors, init, isFileSystemAccessDenied, isLocalFileSystemAccessDenied, migrateProperties, preProcessConfiguration, resetHDFSResources, validateFileSystem
Methods inherited from class org.apache.nifi.processor.AbstractProcessor
onTrigger
Methods inherited from class org.apache.nifi.processor.AbstractSessionFactoryProcessor
getControllerServiceLookup, getIdentifier, getLogger, getNodeTypeProvider, initialize, isConfigurationRestored, isScheduled, toString, updateConfiguredRestoredTrue, updateScheduledFalse, updateScheduledTrue
Methods inherited from class org.apache.nifi.components.AbstractConfigurableComponent
equals, getPropertyDescriptor, getPropertyDescriptors, getSupportedDynamicPropertyDescriptor, hashCode, onPropertyModified, validate
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait
Methods inherited from interface org.apache.nifi.components.ConfigurableComponent
getPropertyDescriptor, getPropertyDescriptors, onPropertyModified, validate
Methods inherited from interface org.apache.nifi.processor.Processor
isStateful, migrateRelationships
-
Field Details
-
VALUE_ONLY
static final String VALUE_ONLY
- See Also:
Constant Field Values
FLOWFILE_CONTENT
static final org.apache.nifi.components.PropertyDescriptor FLOWFILE_CONTENT
-
-
Constructor Details
-
GetHDFSSequenceFile
public GetHDFSSequenceFile()
-
-
Method Details
-
getSupportedPropertyDescriptors
protected List<org.apache.nifi.components.PropertyDescriptor> getSupportedPropertyDescriptors()
- Overrides:
getSupportedPropertyDescriptors in class GetHDFS
-
processBatchOfFiles
protected void processBatchOfFiles(List<org.apache.hadoop.fs.Path> files, org.apache.nifi.processor.ProcessContext context, org.apache.nifi.processor.ProcessSession session)
- Overrides:
processBatchOfFiles in class GetHDFS
-
getFlowFiles
protected Set<org.apache.nifi.flowfile.FlowFile> getFlowFiles(org.apache.hadoop.conf.Configuration conf, org.apache.hadoop.fs.FileSystem hdfs, SequenceFileReader<Set<org.apache.nifi.flowfile.FlowFile>> reader, org.apache.hadoop.fs.Path file) throws Exception
- Throws:
Exception
-