@Public public abstract class BinaryInputFormat<T> extends FileInputFormat<T> implements CheckpointableInputFormat<FileInputSplit,Tuple2<Long,Long>>
BlockInfo at the end of the block. There, the reader can find some statistics
about the split currently being read, that will help correctly parse the contents of the block.

| Modifier and Type | Class and Description |
|---|---|
| protected class | BinaryInputFormat.BlockBasedInput: Reads the content of a block of data. |

Nested classes inherited from FileInputFormat: FileInputFormat.FileBaseStatistics, FileInputFormat.InputSplitOpenThread

| Modifier and Type | Field and Description |
|---|---|
| static String | BLOCK_SIZE_PARAMETER_KEY: The config parameter which defines the fixed size of a block. |
| static long | NATIVE_BLOCK_SIZE |

Fields inherited from FileInputFormat: currentSplit, ENUMERATE_NESTED_FILES_FLAG, enumerateNestedFiles, filePath, INFLATER_INPUT_STREAM_FACTORIES, minSplitSize, numSplits, openTimeout, READ_WHOLE_SPLIT_FLAG, splitLength, splitStart, stream, unsplittable

| Constructor and Description |
|---|
| BinaryInputFormat() |


| Modifier and Type | Method and Description |
|---|---|
| void | configure(Configuration parameters): Configures the file input format by reading the file path from the configuration. |
| BlockInfo | createBlockInfo() |
| FileInputSplit[] | createInputSplits(int minNumSplits): Computes the input splits for the file. |
| protected org.apache.flink.api.common.io.BinaryInputFormat.SequentialStatistics | createStatistics(List<FileStatus> files, FileInputFormat.FileBaseStatistics stats): Fill in the statistics. |
| protected abstract T | deserialize(T reuse, DataInputView dataInput) |
| long | getBlockSize() |
| Tuple2<Long,Long> | getCurrentState(): Returns the split currently being read, along with its current state. |
| protected List<FileStatus> | getFiles() |
| protected FileInputSplit[] | getInputSplits() |
| org.apache.flink.api.common.io.BinaryInputFormat.SequentialStatistics | getStatistics(BaseStatistics cachedStats): Obtains basic file statistics containing only file size. |
| T | nextRecord(T record): Reads the next record from the input. |
| void | open(FileInputSplit split): Opens an input stream to the file defined in the input format. |
| boolean | reachedEnd(): Method used to check if the end of the input is reached. |
| void | reopen(FileInputSplit split, Tuple2<Long,Long> state): Restores the state of a parallel instance reading from an InputFormat. |
| void | setBlockSize(long blockSize) |

Methods inherited from FileInputFormat: acceptFile, close, decorateInputStream, extractFileExtension, getFilePath, getFilePaths, getFileStats, getFileStats, getInflaterInputStreamFactory, getInputSplitAssigner, getMinSplitSize, getNestedFileEnumeration, getNumSplits, getOpenTimeout, getSplitLength, getSplitStart, registerInflaterInputStreamFactory, setFilePath, setFilePath, setFilePaths, setFilePaths, setFilesFilter, setMinSplitSize, setNestedFileEnumeration, setNumSplits, setOpenTimeout, supportsMultiPaths, testForUnsplittable, toString

Methods inherited from RichInputFormat: closeInputFormat, getRuntimeContext, openInputFormat, setRuntimeContext

public static final String BLOCK_SIZE_PARAMETER_KEY
public static final long NATIVE_BLOCK_SIZE
public void configure(Configuration parameters)
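Description copied from class: FileInputFormat
Configures the file input format by reading the file path from the configuration.
Specified by: configure in interface InputFormat<T,FileInputSplit>
Overrides: configure in class FileInputFormat<T>
Parameters: parameters - The configuration with all parameters (note: not the Flink config but the TaskConfig).
See Also: InputFormat.configure(org.apache.flink.configuration.Configuration)

public void setBlockSize(long blockSize)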
public long getBlockSize()
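
The block size set here determines how a file is cut into splits. The following is a minimal, library-free sketch of that idea, assuming each split covers exactly one block and the last block may be shorter. The class `BlockSplitSketch`, the `Split` type, and the method `blockAlignedSplits` are hypothetical names for illustration and are not part of the Flink API; the real `createInputSplits` also resolves file system block locations and hosts, which is omitted here.

```java
import java.util.ArrayList;
import java.util.List;

final class BlockSplitSketch {

    /** Hypothetical stand-in for a file split: a start offset and a length in bytes. */
    static final class Split {
        final long start;
        final long length;

        Split(long start, long length) {
            this.start = start;
            this.length = length;
        }

        @Override
        public String toString() {
            return "[start=" + start + ", length=" + length + "]";
        }
    }

    /** Cuts a file of fileLength bytes into consecutive splits of at most blockSize bytes. */
    static List<Split> blockAlignedSplits(long fileLength, long blockSize) {
        if (blockSize <= 0) {
            throw new IllegalArgumentException("blockSize must be positive");
        }
        List<Split> splits = new ArrayList<>();
        for (long pos = 0; pos < fileLength; pos += blockSize) {
            // The last block may be shorter than blockSize.
            splits.add(new Split(pos, Math.min(blockSize, fileLength - pos)));
        }
        return splits;
    }

    public static void main(String[] args) {
        // A 10-byte file with 4-byte blocks yields splits of length 4, 4 and 2.
        System.out.println(blockAlignedSplits(10, 4));
    }
}
```

In this sketch a smaller block size produces more, shorter splits, which is the trade-off to keep in mind when choosing a value for `setBlockSize`.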
public FileInputSplit[] createInputSplits(int minNumSplits) throws IOException
Description copied from class: FileInputFormat
Computes the input splits for the file.
Specified by: createInputSplits in interface InputFormat<T,FileInputSplit>
Specified by: createInputSplits in interface InputSplitSource<FileInputSplit>
Overrides: createInputSplits in class FileInputFormat<T>
Parameters: minNumSplits - The minimum desired number of file splits.
Throws: IOException - Thrown when the creation of the splits was erroneous.
See Also: InputFormat.createInputSplits(int)

protected List<FileStatus> getFiles() throws IOException
Throws: IOException

public org.apache.flink.api.common.io.BinaryInputFormat.SequentialStatistics getStatistics(BaseStatistics cachedStats)
Description copied from class: FileInputFormat
Obtains basic file statistics containing only file size.
Specified by: getStatistics in interface InputFormat<T,FileInputSplit>
Overrides: getStatistics in class FileInputFormat<T>
Parameters: cachedStats - The statistics that were cached. May be null.
See Also: InputFormat.getStatistics(org.apache.flink.api.common.io.statistics.BaseStatistics)

protected FileInputSplit[] getInputSplits() throws IOException
Throws: IOException

public BlockInfo createBlockInfo()
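
The class description states that each block carries a BlockInfo at its end, holding statistics that help parse the block. The sketch below illustrates that layout with plain `java.nio`, without using Flink. It rests on assumptions this page does not confirm: the trailer is modeled as three 8-byte longs, and the field names `recordCount`, `accumulatedRecordCount`, and `firstRecordStart`, as well as the class `BlockTrailerSketch` and its methods, are hypothetical and chosen for illustration only.

```java
import java.nio.ByteBuffer;

final class BlockTrailerSketch {

    /** Assumed trailer size: three 8-byte longs. */
    static final int INFO_SIZE = 3 * Long.BYTES;

    /** Hypothetical stand-in for the per-block statistics. */
    static final class Info {
        final long recordCount;
        final long accumulatedRecordCount;
        final long firstRecordStart;

        Info(long recordCount, long accumulatedRecordCount, long firstRecordStart) {
            this.recordCount = recordCount;
            this.accumulatedRecordCount = accumulatedRecordCount;
            this.firstRecordStart = firstRecordStart;
        }
    }

    /** Builds one block: payload, zero padding, then the trailer at the very end. */
    static byte[] writeBlock(byte[] payload, int blockSize, Info info) {
        if (payload.length + INFO_SIZE > blockSize) {
            throw new IllegalArgumentException("payload does not fit into the block");
        }
        ByteBuffer buf = ByteBuffer.allocate(blockSize); // zero-filled, big-endian
        buf.put(payload);
        buf.position(blockSize - INFO_SIZE); // skip the padding
        buf.putLong(info.recordCount);
        buf.putLong(info.accumulatedRecordCount);
        buf.putLong(info.firstRecordStart);
        return buf.array();
    }

    /** Reads the trailer from the last INFO_SIZE bytes of a block. */
    static Info readTrailer(byte[] block) {
        ByteBuffer buf = ByteBuffer.wrap(block, block.length - INFO_SIZE, INFO_SIZE);
        long recordCount = buf.getLong();
        long accumulatedRecordCount = buf.getLong();
        long firstRecordStart = buf.getLong();
        return new Info(recordCount, accumulatedRecordCount, firstRecordStart);
    }

    public static void main(String[] args) {
        byte[] block = writeBlock(new byte[] {1, 2, 3}, 64, new Info(3, 3, 0));
        Info info = readTrailer(block);
        System.out.println(info.recordCount + " records, first record at offset " + info.firstRecordStart);
    }
}
```

Placing the statistics at a fixed distance from the end of the block lets a reader locate them without scanning the records first, which is why the sketch reads the trailer before touching the payload.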
protected org.apache.flink.api.common.io.BinaryInputFormat.SequentialStatistics createStatistics(List<FileStatus> files, FileInputFormat.FileBaseStatistics stats) throws IOException
Parameters: files - The files that are associated with this block input format.
Parameters: stats - The pre-filled statistics.
Throws: IOException

public void open(FileInputSplit split) throws IOException
Description copied from class: FileInputFormat
Opens an input stream to the file defined in the input format. The stream is actually opened in an asynchronous thread to make sure any interruptions to the thread working on the input format do not reach the file system.
Specified by: open in interface InputFormat<T,FileInputSplit>
Overrides: open in class FileInputFormat<T>
Parameters: split - The split to be opened.
Throws: IOException - Thrown if the split could not be opened due to an I/O problem.

public boolean reachedEnd() throws IOException
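Description copied from interface: InputFormat
Method used to check if the end of the input is reached. When this method is called, the input format is guaranteed to be opened.
Specified by: reachedEnd in interface InputFormat<T,FileInputSplit>
Throws: IOException - Thrown if an I/O error occurred.

public T nextRecord(T record) throws IOException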
Description copied from interface: InputFormat
Reads the next record from the input. When this method is called, the input format is guaranteed to be opened.
Specified by: nextRecord in interface InputFormat<T,FileInputSplit>
Parameters: record - Object that may be reused.
Throws: IOException - Thrown if an I/O error occurred.

protected abstract T deserialize(T reuse, DataInputView dataInput) throws IOException
Throws: IOException

@PublicEvolving public Tuple2<Long,Long> getCurrentState() throws IOException
Description copied from interface: CheckpointableInputFormat
Returns the split currently being read, along with its current state.
Specified by: getCurrentState in interface CheckpointableInputFormat<FileInputSplit,Tuple2<Long,Long>>
Throws: IOException - Thrown if the creation of the state object failed.

@PublicEvolving public void reopen(FileInputSplit split, Tuple2<Long,Long> state) throws IOException
Description copied from interface: CheckpointableInputFormat
Restores the state of a parallel instance reading from an InputFormat. This is necessary when recovering from a task failure. When this method is called, the input format is guaranteed to be configured.
NOTE: The caller has to make sure that the provided split is the one to which the state belongs.
Specified by: reopen in interface CheckpointableInputFormat<FileInputSplit,Tuple2<Long,Long>>
Parameters: split - The split to be opened.
Parameters: state - The state from which to start. This can contain the offset, but also other data, depending on the input format.
Throws: IOException

Copyright © 2014–2018 The Apache Software Foundation. All rights reserved.