public class BAMInputFormat extends org.apache.hadoop.mapreduce.lib.input.FileInputFormat<org.apache.hadoop.io.LongWritable,SAMRecordWritable>
InputFormat for BAM files. Values are the individual records; see BAMRecordReader for the meaning of the key.

| Modifier and Type | Field | Description |
|---|---|---|
| static String | BOUNDED_TRAVERSAL_PROPERTY | If set to true, only include reads that overlap the given intervals (if specified), and unplaced unmapped reads (if specified). |
| static String | ENABLE_BAI_SPLIT_CALCULATOR | If set to true, enables the use of BAM indices to calculate splits. |
| static String | INTERVALS_PROPERTY | Filter by region, like -L in SAMtools. |
| static String | TRAVERSE_UNPLACED_UNMAPPED_PROPERTY | If set to true, include unplaced unmapped reads (that is, unmapped reads with no position). |
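
The traversal properties in the field table combine into one keep/drop rule per read. The sketch below models that rule as it is documented on this page; it is not Hadoop-BAM's implementation. RegionSketch, ReadSketch and TraversalSketch are hypothetical names, intervals are assumed to be 1-based and closed (the htsjdk Locatable convention), and the parser handles only the contig:start-end form documented for INTERVALS_PROPERTY (e.g. chr1:1-20000,chr2:12000-20000).

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical 1-based, closed interval (stand-in for an htsjdk Locatable). */
final class RegionSketch {
    final String contig;
    final int start;
    final int end;

    RegionSketch(String contig, int start, int end) {
        if (start < 1 || end < start) {
            throw new IllegalArgumentException("bad region " + contig + ":" + start + "-" + end);
        }
        this.contig = contig;
        this.start = start;
        this.end = end;
    }

    /** True if [s, e] on otherContig shares at least one base with this region. */
    boolean overlaps(String otherContig, int s, int e) {
        return contig.equals(otherContig) && s <= end && start <= e;
    }
}

/** Hypothetical read; contig == null means the read has no position (unplaced). */
final class ReadSketch {
    final String contig;
    final int start;
    final int end;
    final boolean unmapped;

    ReadSketch(String contig, int start, int end, boolean unmapped) {
        this.contig = contig;
        this.start = start;
        this.end = end;
        this.unmapped = unmapped;
    }
}

final class TraversalSketch {
    private TraversalSketch() {}

    /** Parses "chr1:1-20000,chr2:12000-20000"; only the contig:start-end form is handled. */
    static List<RegionSketch> parseIntervals(String value) {
        List<RegionSketch> out = new ArrayList<>();
        for (String raw : value.split(",")) {
            String token = raw.trim();
            if (token.isEmpty()) {
                continue;
            }
            // Split on the last ':' so contig names that contain ':' still parse.
            int colon = token.lastIndexOf(':');
            int dash = token.indexOf('-', colon + 1);
            if (colon <= 0 || dash < 0) {
                throw new IllegalArgumentException("expected contig:start-end, got " + token);
            }
            out.add(new RegionSketch(token.substring(0, colon),
                    Integer.parseInt(token.substring(colon + 1, dash)),
                    Integer.parseInt(token.substring(dash + 1))));
        }
        return out;
    }

    /**
     * Keep/drop rule under bounded traversal: keep reads overlapping any interval, plus
     * unplaced unmapped reads when traverseUnplacedUnmapped is set. Per the
     * setTraversalParameters docs, null intervals mean "all reads" and require
     * traverseUnplacedUnmapped == true.
     */
    static boolean include(ReadSketch read, List<RegionSketch> intervals,
                           boolean traverseUnplacedUnmapped) {
        if (intervals == null) {
            if (!traverseUnplacedUnmapped) {
                throw new IllegalArgumentException("null intervals require traverseUnplacedUnmapped");
            }
            return true;
        }
        if (read.contig == null) {
            // Unplaced reads are admitted only by the traverseUnplacedUnmapped flag.
            return read.unmapped && traverseUnplacedUnmapped;
        }
        for (RegionSketch region : intervals) {
            if (region.overlaps(read.contig, read.start, read.end)) {
                return true;
            }
        }
        return false;
    }
}
```

Unbounded traversal (BOUNDED_TRAVERSAL_PROPERTY not set, for example after unsetTraversalParameters) is outside the sketch: there every read is included. Unmapped reads that do carry a position (typically their mate's) are treated like mapped reads here, which is an assumption.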

| Constructor |
|---|
| BAMInputFormat() |

| Modifier and Type | Method | Description |
|---|---|---|
| org.apache.hadoop.mapreduce.RecordReader<org.apache.hadoop.io.LongWritable,SAMRecordWritable> | createRecordReader(org.apache.hadoop.mapreduce.InputSplit split, org.apache.hadoop.mapreduce.TaskAttemptContext ctx) | Returns a BAMRecordReader initialized with the parameters. |
| List<org.apache.hadoop.mapreduce.InputSplit> | getSplits(org.apache.hadoop.mapreduce.JobContext job) | The splits returned are FileVirtualSplits. |
| List<org.apache.hadoop.mapreduce.InputSplit> | getSplits(List<org.apache.hadoop.mapreduce.InputSplit> splits, org.apache.hadoop.conf.Configuration cfg) | |
| boolean | isSplitable(org.apache.hadoop.mapreduce.JobContext job, org.apache.hadoop.fs.Path path) | |
| static void | setEnableBAISplitCalculator(org.apache.hadoop.conf.Configuration conf, boolean setEnabled) | Enables or disables the split calculator that uses the BAM index to calculate splits. |
| static <T extends htsjdk.samtools.util.Locatable> void | setIntervals(org.apache.hadoop.conf.Configuration conf, List<T> intervals) | Only include reads that overlap the given intervals. |
| static <T extends htsjdk.samtools.util.Locatable> void | setTraversalParameters(org.apache.hadoop.conf.Configuration conf, List<T> intervals, boolean traverseUnplacedUnmapped) | Only include reads that overlap the given intervals (if specified) and unplaced unmapped reads (if traverseUnplacedUnmapped is true). |
| static void | unsetTraversalParameters(org.apache.hadoop.conf.Configuration conf) | Reset traversal parameters so that all reads are included. |
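
A BAM file is a series of BGZF-compressed blocks, so a position inside it is usually named by a BGZF virtual file offset: per the SAM/BAM specification, the compressed block's byte offset shifted left 16 bits, OR-ed with the offset into that block's uncompressed data. Assuming, as the class name suggests, that FileVirtualSplit boundaries are such virtual offsets, the hypothetical helper below (VirtualOffsets is not part of Hadoop-BAM) shows how they are packed, unpacked and ordered.

```java
/** Hypothetical helper (not part of Hadoop-BAM) for BGZF virtual file offsets. */
final class VirtualOffsets {
    /** Block offsets must fit in the upper 48 bits. */
    private static final long MAX_BLOCK_OFFSET = (1L << 48) - 1;

    private VirtualOffsets() {}

    /** Packs a compressed-block byte offset and an offset into its uncompressed data. */
    static long make(long blockOffset, int offsetInBlock) {
        if (blockOffset < 0 || blockOffset > MAX_BLOCK_OFFSET) {
            throw new IllegalArgumentException("block offset out of range: " + blockOffset);
        }
        if (offsetInBlock < 0 || offsetInBlock > 0xFFFF) {
            throw new IllegalArgumentException("in-block offset out of range: " + offsetInBlock);
        }
        return (blockOffset << 16) | offsetInBlock;
    }

    /** Byte offset of the BGZF block in the compressed file (upper 48 bits). */
    static long blockOffset(long virtualOffset) {
        return virtualOffset >>> 16;
    }

    /** Offset into the block's uncompressed data (lower 16 bits). */
    static int offsetInBlock(long virtualOffset) {
        return (int) (virtualOffset & 0xFFFF);
    }

    /** Orders virtual offsets by file position; unsigned because the top bit may be set. */
    static int compare(long a, long b) {
        return Long.compareUnsigned(a, b);
    }
}
```

compare uses Long.compareUnsigned because a block offset of 2^47 or more sets the sign bit after the 16-bit shift, so a signed comparison would misorder offsets in very large files.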

Methods inherited from class org.apache.hadoop.mapreduce.lib.input.FileInputFormat
addInputPath, addInputPathRecursively, addInputPaths, computeSplitSize, getBlockIndex, getFormatMinSplitSize, getInputDirRecursive, getInputPathFilter, getInputPaths, getMaxSplitSize, getMinSplitSize, listStatus, makeSplit, setInputDirRecursive, setInputPathFilter, setInputPaths, setInputPaths, setMaxInputSplitSize, setMinInputSplitSize

public static final String BOUNDED_TRAVERSAL_PROPERTY
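If set to true, only include reads that overlap the given intervals (if specified), and unplaced unmapped reads (if specified). For programmatic use, setTraversalParameters(Configuration, List, boolean) should be preferred.

public static final String ENABLE_BAI_SPLIT_CALCULATOR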
If set to true, enables the use of BAM indices to calculate splits. For programmatic use, setEnableBAISplitCalculator(Configuration, boolean) should be preferred.
By default, this split calculator is disabled in favor of the splitting-bai calculator.

public static final String INTERVALS_PROPERTY
Filter by region, like -L in SAMtools. Takes a comma-separated list of intervals, e.g. chr1:1-20000,chr2:12000-20000. For programmatic use, setIntervals(Configuration, List) should be preferred.

public static final String TRAVERSE_UNPLACED_UNMAPPED_PROPERTY
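If set to true, include unplaced unmapped reads (that is, unmapped reads with no position). For programmatic use, setTraversalParameters(Configuration, List, boolean) should be preferred.

public static <T extends htsjdk.samtools.util.Locatable> void setIntervals(org.apache.hadoop.conf.Configuration conf,
List<T> intervals)
Only include reads that overlap the given intervals.
Type Parameters:
T - the Locatable type
Parameters:
conf - the Hadoop configuration to set properties on
intervals - the intervals to filter by

public static void setEnableBAISplitCalculator(org.apache.hadoop.conf.Configuration conf,
boolean setEnabled)
Enables or disables the split calculator that uses the BAM index to calculate splits.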

public static <T extends htsjdk.samtools.util.Locatable> void setTraversalParameters(org.apache.hadoop.conf.Configuration conf,
List<T> intervals,
boolean traverseUnplacedUnmapped)
Only include reads that overlap the given intervals (if specified) and unplaced unmapped reads (if traverseUnplacedUnmapped is true).
Type Parameters:
T - the Locatable type
Parameters:
conf - the Hadoop configuration to set properties on
intervals - the intervals to filter by, or null if all reads are to be included (in which case traverseUnplacedUnmapped must be true)
traverseUnplacedUnmapped - whether to include unplaced unmapped reads

public static void unsetTraversalParameters(org.apache.hadoop.conf.Configuration conf)
Reset traversal parameters so that all reads are included.
Parameters:
conf - the Hadoop configuration to set properties on

public org.apache.hadoop.mapreduce.RecordReader<org.apache.hadoop.io.LongWritable,SAMRecordWritable> createRecordReader(org.apache.hadoop.mapreduce.InputSplit split, org.apache.hadoop.mapreduce.TaskAttemptContext ctx) throws InterruptedException, IOException
Returns a BAMRecordReader initialized with the parameters.
Specified by:
createRecordReader in class org.apache.hadoop.mapreduce.InputFormat<org.apache.hadoop.io.LongWritable,SAMRecordWritable>
Throws:
InterruptedException
IOException

public List<org.apache.hadoop.mapreduce.InputSplit> getSplits(org.apache.hadoop.mapreduce.JobContext job) throws IOException
The splits returned are FileVirtualSplits.
Overrides:
getSplits in class org.apache.hadoop.mapreduce.lib.input.FileInputFormat<org.apache.hadoop.io.LongWritable,SAMRecordWritable>
Throws:
IOException

public List<org.apache.hadoop.mapreduce.InputSplit> getSplits(List<org.apache.hadoop.mapreduce.InputSplit> splits, org.apache.hadoop.conf.Configuration cfg) throws IOException
Throws:
IOException

public boolean isSplitable(org.apache.hadoop.mapreduce.JobContext job,
org.apache.hadoop.fs.Path path)
Overrides:
isSplitable in class org.apache.hadoop.mapreduce.lib.input.FileInputFormat<org.apache.hadoop.io.LongWritable,SAMRecordWritable>

Copyright © 2017. All rights reserved.