public final class HadoopDataInputStream
extends org.apache.flink.core.fs.FSDataInputStream
FSDataInputStream for Hadoop's input streams. This supports all file systems supported by Hadoop, such as HDFS and S3 (S3a/S3n).

| Modifier and Type | Field and Description |
|---|---|
| static int | MIN_SKIP_BYTES: Minimum amount of bytes to skip forward before we issue a seek instead of discarding read. |
| Constructor and Description |
|---|
| HadoopDataInputStream(org.apache.hadoop.fs.FSDataInputStream fsDataInputStream): Creates a new data input stream from the given Hadoop input stream. |
| Modifier and Type | Method and Description |
|---|---|
| int | available() |
| void | close() |
| void | forceSeek(long seekPos): Positions the stream to the given location. |
| org.apache.hadoop.fs.FSDataInputStream | getHadoopInputStream(): Gets the wrapped Hadoop input stream. |
| long | getPos() |
| int | read() |
| int | read(byte[] buffer, int offset, int length) |
| void | seek(long seekPos) |
| long | skip(long n) |
| void | skipFully(long bytes): Skips over a given amount of bytes in the stream. |
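The summary above distinguishes skip(long), which, as on any java.io.InputStream, may skip fewer bytes than requested, from skipFully(long bytes), which must consume exactly the given count. A minimal sketch of that contract over a plain InputStream follows; the helper name and loop are illustrative assumptions, not Flink's actual implementation:

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

public class SkipFullySketch {
    // Keeps calling skip() until the requested number of bytes is consumed,
    // failing if the stream ends first.
    static void skipFully(InputStream in, long bytes) throws IOException {
        while (bytes > 0) {
            long skipped = in.skip(bytes);
            if (skipped == 0) {
                // skip() may legally return 0; probe with read() to detect EOF.
                if (in.read() < 0) {
                    throw new EOFException("stream ended before skipping all bytes");
                }
                skipped = 1; // the probing read consumed one byte
            }
            bytes -= skipped;
        }
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new ByteArrayInputStream(new byte[] {10, 20, 30, 40, 50});
        skipFully(in, 3);
        System.out.println(in.read()); // the fourth byte: 40
    }
}
```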
Methods inherited from class java.io.InputStream: mark, markSupported, read, reset

public static final int MIN_SKIP_BYTES
The current value is just a magic number. In the long run, this value could become configurable, but for now it is a conservative, relatively small value that should bring safe improvements for small skips (e.g. in reading metadata), which would hurt the most if turned into frequent seeks.
The optimal value depends on the DFS implementation and configuration plus the underlying filesystem. For now, this number is chosen "big enough" to provide improvements for smaller seeks, and "small enough" to avoid disadvantages over real seeks. While the minimum should be the page size, a true optimum per system would be the amount of bytes that can be consumed sequentially within the seek time. Unfortunately, seek time is not constant, and devices, OS, and DFS potentially also use read buffers and read-ahead.
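The heuristic described above can be sketched in isolation: a small forward move is served by discarding bytes with skip, while a large forward move or any backward move requires a real seek. The constant's value and the helper method below are assumptions chosen for illustration; they are not Flink's actual seek logic.

```java
public class SeekHeuristicSketch {
    // Illustrative stand-in for MIN_SKIP_BYTES (value assumed for this sketch).
    static final int MIN_SKIP_BYTES = 4;

    // Decides whether a forward move from curPos to seekPos is small enough
    // to be served by skipping instead of issuing a real seek.
    static boolean shouldSkip(long curPos, long seekPos) {
        long delta = seekPos - curPos;
        return delta > 0 && delta <= MIN_SKIP_BYTES;
    }

    public static void main(String[] args) {
        System.out.println(shouldSkip(100, 103));                  // small forward move: skip
        System.out.println(shouldSkip(100, 100 + MIN_SKIP_BYTES + 1)); // large move: seek
        System.out.println(shouldSkip(100, 50));                   // backward move: seek
    }
}
```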
public HadoopDataInputStream(org.apache.hadoop.fs.FSDataInputStream fsDataInputStream)
Creates a new data input stream from the given Hadoop input stream.
Parameters: fsDataInputStream - The Hadoop input stream

public void seek(long seekPos)
throws IOException
Specified by: seek in class org.apache.flink.core.fs.FSDataInputStream
Throws: IOException

public long getPos()
throws IOException
Specified by: getPos in class org.apache.flink.core.fs.FSDataInputStream
Throws: IOException

public int read()
throws IOException
Overrides: read in class InputStream
Throws: IOException

public void close()
throws IOException
Specified by: close in interface Closeable
Specified by: close in interface AutoCloseable
Overrides: close in class InputStream
Throws: IOException

public int read(@Nonnull byte[] buffer, int offset, int length) throws IOException
Overrides: read in class InputStream
Throws: IOException

public int available()
throws IOException
Overrides: available in class InputStream
Throws: IOException

public long skip(long n)
throws IOException
Overrides: skip in class InputStream
Throws: IOException

public org.apache.hadoop.fs.FSDataInputStream getHadoopInputStream()
Gets the wrapped Hadoop input stream.
public void forceSeek(long seekPos)
throws IOException
Positions the stream to the given location. In contrast to seek(long), this method
will always issue a "seek" command to the dfs and may not replace it by skip(long)
for small seeks.
Notice that the underlying DFS implementation can still decide to do skip instead of seek.
Parameters: seekPos - the position to seek to.
Throws: IOException

public void skipFully(long bytes)
throws IOException
Parameters: bytes - the number of bytes to skip.
Throws: IOException

Copyright © 2014–2021 The Apache Software Foundation. All rights reserved.