public class AnySAMInputFormat extends org.apache.hadoop.mapreduce.lib.input.FileInputFormat<org.apache.hadoop.io.LongWritable,SAMRecordWritable>
InputFormat for SAM, BAM, and CRAM files.
Values are the individual records; see BAMRecordReader for the
meaning of the key.
By default, files are recognized as SAM, BAM, or CRAM based on their file
extensions: see TRUST_EXTS_PROPERTY. If that fails, or this
behaviour is disabled, the first byte of each file is read to determine the
file type.
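The detection order described above can be sketched in plain Java. This is a minimal illustration, not the library's code: `FormatSniffer`, `Fmt`, `fromExtension`, `fromFirstByte`, and `detect` are hypothetical names, file names are plain strings, and the first-byte values (0x1f for BAM, 'C' for CRAM, '@' for SAM) are assumptions based on the gzip magic number, the CRAM file signature, and SAM header lines.

```java
// Hypothetical stand-in for illustration; not part of Hadoop-BAM.
class FormatSniffer {
    enum Fmt { SAM, BAM, CRAM }

    // Guess the format from the file extension; null if the extension is unknown.
    static Fmt fromExtension(String fileName) {
        if (fileName.endsWith(".sam")) return Fmt.SAM;
        if (fileName.endsWith(".bam")) return Fmt.BAM;
        if (fileName.endsWith(".cram")) return Fmt.CRAM;
        return null;
    }

    // Guess the format from the first byte; the byte values are assumptions.
    static Fmt fromFirstByte(int b) {
        switch (b) {
            case 0x1f: return Fmt.BAM;   // gzip magic byte (BAM is BGZF-compressed)
            case 'C':  return Fmt.CRAM;  // CRAM files begin with the ASCII text "CRAM"
            case '@':  return Fmt.SAM;   // SAM header lines begin with '@'
            default:   return null;      // not recognized
        }
    }

    // Extension first (if trusted), then fall back to the first byte.
    static Fmt detect(String fileName, byte[] contents, boolean trustExts) {
        if (trustExts) {
            Fmt byExt = fromExtension(fileName);
            if (byExt != null) return byExt;
        }
        if (contents.length == 0) return null;
        return fromFirstByte(contents[0] & 0xff);
    }
}
```

With `trustExts` set to `true`, a file named `x.bam` is reported as BAM regardless of its contents; with `false`, only the first byte decides.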

| Modifier and Type | Field and Description |
|---|---|
| static String | TRUST_EXTS_PROPERTY. A Boolean property: are file extensions trusted? The default is true. |

| Constructor and Description |
|---|
| AnySAMInputFormat(). Creates a new input format, which will use the Configuration from the first public method called. |
| AnySAMInputFormat(org.apache.hadoop.conf.Configuration conf). Creates a new input format, reading TRUST_EXTS_PROPERTY from the given Configuration. |
| AnySAMInputFormat(Map<org.apache.hadoop.fs.Path,SAMFormat> formatMap). Creates a new input format, trusting the given Map to define the file-to-format associations. |

| Modifier and Type | Method and Description |
|---|---|
| org.apache.hadoop.mapreduce.RecordReader<org.apache.hadoop.io.LongWritable,SAMRecordWritable> | createRecordReader(org.apache.hadoop.mapreduce.InputSplit split, org.apache.hadoop.mapreduce.TaskAttemptContext ctx). Returns a BAMRecordReader or SAMRecordReader as appropriate, initialized with the given parameters. |
| SAMFormat | getFormat(org.apache.hadoop.fs.Path path). Returns the SAMFormat corresponding to the given path. |
| List<org.apache.hadoop.mapreduce.InputSplit> | getSplits(org.apache.hadoop.mapreduce.JobContext job). Defers to BAMInputFormat or CRAMInputFormat as appropriate for each individual path. |
| boolean | isSplitable(org.apache.hadoop.mapreduce.JobContext job, org.apache.hadoop.fs.Path path) |

Methods inherited from class org.apache.hadoop.mapreduce.lib.input.FileInputFormat: addInputPath, addInputPathRecursively, addInputPaths, computeSplitSize, getBlockIndex, getFormatMinSplitSize, getInputDirRecursive, getInputPathFilter, getInputPaths, getMaxSplitSize, getMinSplitSize, listStatus, makeSplit, setInputDirRecursive, setInputPathFilter, setInputPaths, setInputPaths, setMaxInputSplitSize, setMinInputSplitSize

public static final String TRUST_EXTS_PROPERTY
A Boolean property: are file extensions trusted? The default is true.

public AnySAMInputFormat()
Creates a new input format, which will use the Configuration from the first public method called. It therefore behaves as though constructed with a Configuration directly, but only after it has received one in createRecordReader (via the TaskAttemptContext), or in isSplitable or getSplits (via the JobContext). Until then, other methods will throw an IllegalStateException.
This constructor exists mainly as a convenience, e.g. so that AnySAMInputFormat can be used directly in Job.setInputFormatClass.

public AnySAMInputFormat(org.apache.hadoop.conf.Configuration conf)
Creates a new input format, reading TRUST_EXTS_PROPERTY from the given Configuration.

public AnySAMInputFormat(Map<org.apache.hadoop.fs.Path,SAMFormat> formatMap)
Creates a new input format, trusting the given Map to define the file-to-format associations. Neither file paths nor their contents are looked at; only the Map is used.
The Map is not copied, so it should not be modified while this input format is in use!
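
The three construction modes above can be modeled with a small self-contained class. This is a sketch under stated assumptions, not the Hadoop-BAM implementation: `FormatResolverSketch`, `Fmt`, and `receiveConfiguration` are hypothetical names, a plain boolean stands in for the Configuration, strings stand in for Path, and the fallback to reading file contents is left out.

```java
import java.util.Map;

// Hypothetical model of the three construction modes; not the real class.
class FormatResolverSketch {
    enum Fmt { SAM, BAM, CRAM }

    private Boolean trustExts;               // null until a configuration arrives
    private final Map<String, Fmt> givenMap; // null unless the Map constructor was used

    // Mode 1: no configuration yet; it must be supplied by a later call.
    FormatResolverSketch() {
        this.givenMap = null;
    }

    // Mode 2: configuration known up front (a boolean stands in for Configuration).
    FormatResolverSketch(boolean trustExts) {
        this.trustExts = trustExts;
        this.givenMap = null;
    }

    // Mode 3: the caller's map is used as-is and is not copied.
    FormatResolverSketch(Map<String, Fmt> formatMap) {
        this.givenMap = formatMap;
    }

    // Stands in for the methods that receive a JobContext or TaskAttemptContext.
    // Only the first configuration received is kept.
    void receiveConfiguration(boolean trustExts) {
        if (this.trustExts == null) this.trustExts = trustExts;
    }

    Fmt getFormat(String path) {
        if (givenMap != null) {
            Fmt f = givenMap.get(path);
            if (f == null) throw new IllegalArgumentException("path not in map: " + path);
            return f;
        }
        if (trustExts == null) {
            throw new IllegalStateException("no configuration received yet");
        }
        if (trustExts) {
            if (path.endsWith(".sam")) return Fmt.SAM;
            if (path.endsWith(".bam")) return Fmt.BAM;
            if (path.endsWith(".cram")) return Fmt.CRAM;
        }
        // The real class would inspect the file contents here; omitted in this sketch.
        return null;
    }
}
```

Because the map is held by reference in this model, an entry added after construction is visible to `getFormat`, which is why modifying the map while the input format is in use is discouraged.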

public SAMFormat getFormat(org.apache.hadoop.fs.Path path) throws org.apache.hadoop.fs.PathNotFoundException
Returns the SAMFormat corresponding to the given path. Returns null if it cannot be determined even based on the file contents (unless future SAM/BAM formats are very different, this means that the path does not refer to a SAM, BAM, or CRAM file).
If this input format was constructed using a given Map<Path,SAMFormat> and the path is not contained within that map, throws an IllegalArgumentException.
Throws:
org.apache.hadoop.fs.PathNotFoundException

public org.apache.hadoop.mapreduce.RecordReader<org.apache.hadoop.io.LongWritable,SAMRecordWritable> createRecordReader(org.apache.hadoop.mapreduce.InputSplit split, org.apache.hadoop.mapreduce.TaskAttemptContext ctx) throws InterruptedException, IOException
Returns a BAMRecordReader or SAMRecordReader as appropriate, initialized with the given parameters.
Throws IllegalArgumentException if the given input split is
not a FileVirtualSplit (used by BAMInputFormat) or a
FileSplit (used by SAMInputFormat), or if the path
referred to is not recognized as a SAM, BAM, or CRAM file (see getFormat(org.apache.hadoop.fs.Path)).
Specified by:
createRecordReader in class org.apache.hadoop.mapreduce.InputFormat<org.apache.hadoop.io.LongWritable,SAMRecordWritable>
Throws:
InterruptedException
IOException

public boolean isSplitable(org.apache.hadoop.mapreduce.JobContext job, org.apache.hadoop.fs.Path path)
Overrides:
isSplitable in class org.apache.hadoop.mapreduce.lib.input.FileInputFormat<org.apache.hadoop.io.LongWritable,SAMRecordWritable>

public List<org.apache.hadoop.mapreduce.InputSplit> getSplits(org.apache.hadoop.mapreduce.JobContext job) throws IOException
Defers to BAMInputFormat or CRAMInputFormat as appropriate for each individual path. SAM paths do not require special handling, so their splits are left unchanged.
Overrides:
getSplits in class org.apache.hadoop.mapreduce.lib.input.FileInputFormat<org.apache.hadoop.io.LongWritable,SAMRecordWritable>
Throws:
IOException

Copyright © 2017. All rights reserved.