@groovy.util.logging.Log class PairScanner extends java.lang.Object
Scans a BAM file and feeds the reads it encounters to a pool of gngs.pair.PairLocator instances for matching into pairs. The gngs.pair.PairLocator is chosen based on the read name, so any given locator is guaranteed to see both a read and its mate, if the mate exists.
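The routing by read name described above can be illustrated with a minimal sketch. It assumes the locator is picked by hashing the read name modulo the pool size; the class and method names (`LocatorRouting`, `locatorIndexFor`) are hypothetical and the actual selection rule used by PairScanner may differ.

```java
class LocatorRouting {
    /**
     * Map a read name to an index in [0, numLocators).
     * Assumption: a hash of the read name modulo the pool size is used.
     */
    static int locatorIndexFor(String readName, int numLocators) {
        // floorMod keeps the result non-negative even when hashCode() is negative
        return Math.floorMod(readName.hashCode(), numLocators);
    }

    public static void main(String[] args) {
        int numLocators = 4;
        // Both mates of a pair carry the same read name, so they map to the same index
        String[] readNames = {"read1", "read2", "read1", "read3", "read2"};
        for (String name : readNames) {
            System.out.println(name + " -> locator " + locatorIndexFor(name, numLocators));
        }
    }
}
```

Because the index depends only on the read name, a read and its mate always reach the same locator regardless of the order in which they appear in the file.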
This class also supports "sharding" parameters (shardId, shardSize), which allow only every nth read to be emitted, starting at an offset. As a result, you can run multiple instances of this class in parallel and guarantee that each instance emits a distinct set of reads.
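A minimal sketch of the sharding idea follows. It assumes that shardSize is the stride n, that shardId is a zero-based offset, and that the rule is applied to a running count of emitted items; none of these details are stated above, and the names `ShardDemo`, `inShard` and `itemsForShard` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class ShardDemo {
    /** Assumed rule: item i belongs to the shard when i modulo shardSize equals shardId. */
    static boolean inShard(int itemIndex, int shardId, int shardSize) {
        return itemIndex % shardSize == shardId;
    }

    /** Collect the indices of the items that one shard would emit. */
    static List<Integer> itemsForShard(int totalItems, int shardId, int shardSize) {
        List<Integer> kept = new ArrayList<>();
        for (int i = 0; i < totalItems; i++) {
            if (inShard(i, shardId, shardSize)) {
                kept.add(i);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        int shardSize = 3;
        // Each of the three instances emits a disjoint subset of the ten items
        for (int shardId = 0; shardId < shardSize; shardId++) {
            System.out.println("shard " + shardId + ": " + itemsForShard(10, shardId, shardSize));
        }
    }
}
```

Running one instance per shardId from 0 to shardSize - 1 covers every item exactly once under this rule.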
Type | Name and description |
---|---|
static int | DEFAULT_WRITER_QUEUE_SIZE How many formatted blocks ready to write will be buffered for writing. |
static int | FORMATTER_BUFFER_SIZE The size of blocks to format before writing. |
java.util.List<Actor> | actors |
boolean | addPosition |
SAM | bam |
java.lang.String | baseQualityTag The tag from which to extract base quality scores (use actual base qualities if null) |
int | chimeric |
java.util.Set<java.lang.Integer> | chromosomesWithReads |
java.lang.String | debugRead |
java.lang.String | filterExpr |
java.util.List<PairFilter> | filters |
groovyx.gpars.group.PGroup | formatterGroup |
java.util.List<PairFormatter> | formatters |
htsjdk.samtools.SAMRecord | lastRead |
java.util.List<gngs.pair.PairLocator> | locatorIndex |
java.util.List<gngs.pair.PairLocator> | locators |
int | numFormatters |
int | numLocators |
PairWriter | pairWriter |
PairWriter | pairWriter2 |
ProgressCounter | progress |
Regions | regions |
static PairScanner | running |
int | shardId |
int | shardSize |
int | shuffleBufferSize The buffer within which to shuffle reads so as to randomise their output order |
groovyx.gpars.group.PGroup | shufflerPGroup |
java.util.List<gngs.pair.Shuffler> | shufflers |
boolean | throttleWarning |
Constructor and description |
---|
PairScanner(java.io.Writer writer1, java.io.Writer writer2, int numLocators, Regions regions, java.lang.String filterExpr, int writerQueueSize) |
PairScanner(java.io.Writer writer, int numLocators, Regions regions, java.lang.String filterExpr, int writerQueueSize) |
Type Params | Return Type | Name and description |
---|---|---|
 | void | createLocator(SAM bam, java.util.Set<java.lang.Integer> sequencesWithReads, gngs.pair.Shuffler shuffler) |
 | java.util.Set<java.lang.Integer> | getContigsWithReads(SAM bam) Interrogate the BAM index to determine which contigs have reads. |
 | void | initLocators(SAM bam) |
 | void | scan(SAM bam) |
 | void | stopActor(java.lang.String name, groovyx.gpars.actor.Actor actor) |
Methods inherited from class | Name |
---|---|
class java.lang.Object | java.lang.Object#wait(long), java.lang.Object#wait(long, int), java.lang.Object#wait(), java.lang.Object#equals(java.lang.Object), java.lang.Object#toString(), java.lang.Object#hashCode(), java.lang.Object#getClass(), java.lang.Object#notify(), java.lang.Object#notifyAll() |
static int DEFAULT_WRITER_QUEUE_SIZE
How many formatted blocks ready to write will be buffered for writing.
Each block is roughly the size of the FORMATTER_BUFFER_SIZE parameter.
static int FORMATTER_BUFFER_SIZE
The size of blocks to format before writing.
Each block is accumulated until it reaches this size. This is similar to output file buffering, but occurs before the file system layer.
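The interaction between block size and writer queue size can be sketched as follows. This is an illustration only: it assumes formatted text is accumulated in a string buffer and handed to a bounded queue once the block size is reached. The class `BlockBuffer` and its methods are hypothetical and are not part of PairScanner.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

class BlockBuffer {
    private final int blockSize;               // plays the role of FORMATTER_BUFFER_SIZE
    private final BlockingQueue<String> queue; // capacity plays the role of DEFAULT_WRITER_QUEUE_SIZE
    private StringBuilder current = new StringBuilder();

    BlockBuffer(int blockSize, int queueCapacity) {
        this.blockSize = blockSize;
        this.queue = new ArrayBlockingQueue<>(queueCapacity);
    }

    /** Append formatted text; hand the block to the queue once it is large enough. */
    void append(String formatted) {
        current.append(formatted);
        if (current.length() >= blockSize) {
            flush();
        }
    }

    /** Push whatever has accumulated so far, blocking while the queue is full. */
    void flush() {
        if (current.length() == 0) {
            return;
        }
        try {
            queue.put(current.toString());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while queueing a block", e);
        }
        current = new StringBuilder();
    }

    /** The queue a writer thread would drain. */
    BlockingQueue<String> queue() {
        return queue;
    }
}
```

In this sketch the upper bound on buffered output is roughly the block size multiplied by the queue capacity, and a full queue makes the producer wait rather than grow memory use.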
java.lang.String baseQualityTag
The tag from which to extract base quality scores. If null, the actual base qualities are used.
int shuffleBufferSize
The size of the buffer within which reads are shuffled to randomise their output order.
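One common way to randomise output order with a fixed-size buffer is sketched below. It is not taken from gngs.pair.Shuffler, whose implementation is not described here: the sketch assumes that once the buffer is full, each new item replaces a randomly chosen buffered item, which is then emitted. The class `ShuffleBuffer` is hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

class ShuffleBuffer<T> {
    private final List<T> buffer = new ArrayList<>();
    private final int capacity; // plays the role of shuffleBufferSize; must be positive
    private final Random random;

    ShuffleBuffer(int capacity, long seed) {
        this.capacity = capacity;
        this.random = new Random(seed);
    }

    /** Add an item; returns a randomly evicted item once the buffer is full, else null. */
    T add(T item) {
        if (buffer.size() < capacity) {
            buffer.add(item);
            return null;
        }
        int slot = random.nextInt(capacity);
        T evicted = buffer.get(slot);
        buffer.set(slot, item);
        return evicted;
    }

    /** Emit the remaining items in random order at end of input. */
    List<T> drain() {
        List<T> rest = new ArrayList<>(buffer);
        Collections.shuffle(rest, random);
        buffer.clear();
        return rest;
    }
}
```

With this approach a larger buffer lets an item move further from its input position, at the cost of holding more items in memory.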
java.util.Set<java.lang.Integer> getContigsWithReads(SAM bam)
Interrogate the BAM index to determine which contigs have reads.
This helps cope with BAM files in which only selected contigs have been included, leaving large numbers of mateless reads. If we know up front that a contig has no reads, a mate positioned in that contig will never be encountered, so we can avoid storing the corresponding reads in memory.
bam - BAM file to check
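How the set of contigs with reads can be used to avoid holding mateless reads is sketched below. Contigs are represented as plain integer indices and reads by name only, so that the example stays self-contained; the class `MatelessFilter` and its methods are hypothetical and do not reflect the actual PairLocator logic.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

class MatelessFilter {
    private final Set<Integer> contigsWithReads;
    // Reads waiting for their mate, keyed by read name (value: contig of the expected mate)
    private final Map<String, Integer> pending = new HashMap<>();

    MatelessFilter(Set<Integer> contigsWithReads) {
        this.contigsWithReads = contigsWithReads;
    }

    /**
     * Hold a read until its mate arrives, unless the mate's contig has no reads.
     * Returns true if the read was stored, false if it was skipped.
     */
    boolean hold(String readName, int mateContig) {
        if (!contigsWithReads.contains(mateContig)) {
            return false; // the mate can never be encountered, so do not store this read
        }
        pending.put(readName, mateContig);
        return true;
    }

    /** Number of reads currently held in memory. */
    int pendingCount() {
        return pending.size();
    }
}
```

For example, with reads present only on contigs 0 and 2, a read whose mate maps to contig 1 is skipped immediately instead of occupying memory for the rest of the scan.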