PairScanner

gngs.pair.PairScanner

```
@groovy.util.logging.Log
class PairScanner
extends java.lang.Object
```
Scans a BAM file and feeds each read encountered to a pool of gngs.pair.PairLocator instances, which match reads into pairs. The gngs.pair.PairLocator for a read is chosen based on the read name, so that any given locator is guaranteed to see both a read and its mate, if the mate exists.
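
The routing idea can be illustrated with a minimal sketch. It assumes the locator is picked by hashing the read name modulo the number of locators; the actual selection logic inside PairScanner is not documented here, and the class `LocatorRouting` and method `locatorIndexFor` are hypothetical names used only for illustration.

```java
public class LocatorRouting {

    // Hypothetical helper: map a read name to one of numLocators slots.
    // Math.floorMod keeps the result in [0, numLocators) even when
    // hashCode() is negative.
    static int locatorIndexFor(String readName, int numLocators) {
        return Math.floorMod(readName.hashCode(), numLocators);
    }

    public static void main(String[] args) {
        int numLocators = 4;

        // Both reads of a pair carry the same read name in a BAM file,
        // so they are always routed to the same slot.
        int first = locatorIndexFor("READ:1:1101:1000:2000", numLocators);
        int mate = locatorIndexFor("READ:1:1101:1000:2000", numLocators);

        System.out.println(first == mate); // prints true
    }
}
```

Because the slot depends only on the read name, no coordination between locators is needed to bring the two reads of a pair together.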
This class also supports "sharding" parameters (shardId, shardSize), which allow only every nth read to be emitted, starting at an offset. As a result, you can run multiple instances of this class in parallel and guarantee that each instance emits a distinct set of reads.
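
A minimal sketch of the sharding idea follows. It assumes a 0-based shardId and that a read is kept when its position modulo shardSize equals shardId; whether PairScanner counts individual reads or pairs, and whether its shardId is 0-based, is not stated in this page. `ShardSketch` and `selectShard` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ShardSketch {

    // Hypothetical helper: keep only the items whose 0-based position i
    // satisfies i % shardSize == shardId.
    static <T> List<T> selectShard(List<T> reads, int shardId, int shardSize) {
        List<T> kept = new ArrayList<>();
        for (int i = 0; i < reads.size(); i++) {
            if (i % shardSize == shardId) {
                kept.add(reads.get(i));
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> reads = Arrays.asList("r0", "r1", "r2", "r3", "r4", "r5");

        // Three parallel instances, each with a different shardId,
        // emit disjoint subsets that together cover every read.
        System.out.println(selectShard(reads, 0, 3)); // [r0, r3]
        System.out.println(selectShard(reads, 1, 3)); // [r1, r4]
        System.out.println(selectShard(reads, 2, 3)); // [r2, r5]
    }
}
```
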
Authors:
Simon Sadedin

Properties Summary

Properties
Type	Name and description
`static int`	`DEFAULT_WRITER_QUEUE_SIZE` How many formatted blocks ready to write will be buffered for writing.
`static int`	`FORMATTER_BUFFER_SIZE` The size of blocks to format before writing.
`java.util.List<Actor>`	`actors`
`boolean`	`addPosition`
`SAM`	`bam`
`java.lang.String`	`baseQualityTag` The tag from which to extract base quality scores (use actual base qualities if null)
`int`	`chimeric`
`java.util.Set<java.lang.Integer>`	`chromosomesWithReads`
`java.lang.String`	`debugRead`
`java.lang.String`	`filterExpr`
`java.util.List<PairFilter>`	`filters`
`groovyx.gpars.group.PGroup`	`formatterGroup`
`java.util.List<PairFormatter>`	`formatters`
`htsjdk.samtools.SAMRecord`	`lastRead`
`java.util.List<gngs.pair.PairLocator>`	`locatorIndex`
`java.util.List<gngs.pair.PairLocator>`	`locators`
`int`	`numFormatters`
`int`	`numLocators`
`PairWriter`	`pairWriter`
`PairWriter`	`pairWriter2`
`ProgressCounter`	`progress`
`Regions`	`regions`
`static PairScanner`	`running`
`int`	`shardId`
`int`	`shardSize`
`int`	`shuffleBufferSize` The buffer within which to shuffle reads so as to randomise their output order
`groovyx.gpars.group.PGroup`	`shufflerPGroup`
`java.util.List<gngs.pair.Shuffler>`	`shufflers`
`boolean`	`throttleWarning`

Constructor Summary

Constructors
Constructor and description
`PairScanner (java.io.Writer writer1, java.io.Writer writer2, int numLocators, Regions regions, java.lang.String filterExpr, int writerQueueSize)`
`PairScanner (java.io.Writer writer, int numLocators, Regions regions, java.lang.String filterExpr, int writerQueueSize)`

Methods Summary

Methods
Type Params	Return Type	Name and description
	`void`	`createLocator(SAM bam, java.util.Set<java.lang.Integer> sequencesWithReads, gngs.pair.Shuffler shuffler)`
	`java.util.Set<java.lang.Integer>`	`getContigsWithReads(SAM bam)` Interrogate the BAM index to determine which contigs have reads.
	`void`	`initLocators(SAM bam)`
	`void`	`scan(SAM bam)`
	`void`	`stopActor(java.lang.String name, groovyx.gpars.actor.Actor actor)`

Inherited Methods Summary

Inherited Methods
Methods inherited from class	Name
`class java.lang.Object`	`java.lang.Object#wait(long), java.lang.Object#wait(long, int), java.lang.Object#wait(), java.lang.Object#equals(java.lang.Object), java.lang.Object#toString(), java.lang.Object#hashCode(), java.lang.Object#getClass(), java.lang.Object#notify(), java.lang.Object#notifyAll()`

- Property Detail
  - static final int DEFAULT_WRITER_QUEUE_SIZE
    
    How many formatted blocks ready to write will be buffered for writing.
    Each block is roughly the size given by FORMATTER_BUFFER_SIZE.
  - static final int FORMATTER_BUFFER_SIZE
    
    The size of blocks to format before writing.
    Each block is accumulated until it reaches this size. This is similar to output file buffering, but occurs before the file system layer.
  - java.util.List<Actor> actors
  - boolean addPosition
  - SAM bam
  - java.lang.String baseQualityTag
    
    The tag from which to extract base quality scores (use actual base qualities if null)
  - int chimeric
  - java.util.Set<java.lang.Integer> chromosomesWithReads
  - java.lang.String debugRead
  - java.lang.String filterExpr
  - java.util.List<PairFilter> filters
  - groovyx.gpars.group.PGroup formatterGroup
  - java.util.List<PairFormatter> formatters
  - htsjdk.samtools.SAMRecord lastRead
  - java.util.List<gngs.pair.PairLocator> locatorIndex
  - java.util.List<gngs.pair.PairLocator> locators
  - int numFormatters
  - int numLocators
  - PairWriter pairWriter
  - PairWriter pairWriter2
  - ProgressCounter progress
  - Regions regions
  - static PairScanner running
  - int shardId
  - int shardSize
  - int shuffleBufferSize
    
    The buffer within which to shuffle reads so as to randomise their output order
  - groovyx.gpars.group.PGroup shufflerPGroup
  - java.util.List<gngs.pair.Shuffler> shufflers
  - boolean throttleWarning
- Constructor Detail
  - PairScanner(java.io.Writer writer1, java.io.Writer writer2, int numLocators, Regions regions, java.lang.String filterExpr, int writerQueueSize)
  - PairScanner(java.io.Writer writer, int numLocators, Regions regions, java.lang.String filterExpr, int writerQueueSize)
- Method Detail
  - @groovy.transform.CompileStatic void createLocator(SAM bam, java.util.Set<java.lang.Integer> sequencesWithReads, gngs.pair.Shuffler shuffler)
  - java.util.Set<java.lang.Integer> getContigsWithReads(SAM bam)
    
    Interrogate the BAM index to determine which contigs have reads.
    This helps cope with BAM files in which only selected contigs have been included, leaving large numbers of mateless reads. By knowing up front that a given contig has no reads, and therefore that a mate positioned in that contig will never be encountered, we can avoid storing the corresponding reads in memory.
    Parameters:
    bam - BAM file to check
    Returns:
    set of the indices of the reference sequences (contigs / chromosomes) that have at least one read
  - void initLocators(SAM bam)
  - @groovy.transform.CompileStatic void scan(SAM bam)
  - void stopActor(java.lang.String name, groovyx.gpars.actor.Actor actor)
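
The reasoning behind getContigsWithReads can be sketched as a simple membership test. This is an illustration under assumptions, not the library's implementation: `MateContigCheck` and `shouldBufferForMate` are hypothetical names, and the mate's contig index is taken as a plain int (in htsjdk it would typically come from `SAMRecord.getMateReferenceIndex()`).

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class MateContigCheck {

    // Hypothetical helper: a read is only worth holding in memory if its
    // mate maps to a contig that actually has reads in this BAM file.
    static boolean shouldBufferForMate(int mateContigIndex, Set<Integer> contigsWithReads) {
        return contigsWithReads.contains(mateContigIndex);
    }

    public static void main(String[] args) {
        // Assumed result of the index check: only contigs 0, 1 and 20 have reads.
        Set<Integer> contigsWithReads = new HashSet<>(Arrays.asList(0, 1, 20));

        System.out.println(shouldBufferForMate(1, contigsWithReads)); // true: mate may still arrive
        System.out.println(shouldBufferForMate(5, contigsWithReads)); // false: mate can never arrive
    }
}
```

Reads for which the check returns false can be dropped or handled as unpaired immediately instead of waiting for a mate that will never be seen.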
