class SAM extends java.lang.Object
Adds various Groovy idioms and convenience features to the Picard SamReader.
There are three major classes of functionality supported:
For simple looping, the eachRead(Closure) static method can be used without creating a SAM object at all:
SAM.eachRead { SAMRecord r -> println r.readName }
This reads from standard input. More sophisticated use requires the construction of a SAM object, which allows, for example, iteration of read pairs:
new SAM("test.bam").eachPair { r1, r2 -> assert r1.readName == r2.readName }
A region can optionally be passed to iterate over:
new SAM("test.bam").eachPair("chr1",1000,2000) { r1, r2 -> assert r1.readName == r2.readName }
Filtering a BAM file to create a file containing a subset of reads is supported explicitly:
new SAM("test.bam").filter("out.bam") { it.mappingQuality > 30 }
Generating pileups is also straightforward:
new SAM("test.bam").pileup("chr1",1000,2000) { p -> println "There are ${p.countOf('A')} A bases at position chr1:$p.position" }
See the Pileup class for more information about operations on pileups.
Notes:
Type | Name and description |
---|---|
java.io.File | indexFile |
int | minMappingQuality |
static boolean | progress |
java.io.File | samFile |
htsjdk.samtools.SamReader | samFileReader |
java.io.InputStream | samStream Only used when created from stream |
java.util.List<java.lang.String> | samples List of samples in the BAM file. |
boolean | useMemoryMapping |
boolean | verbose |
Type Params | Return Type | Name and description |
---|---|---|
| | Regions | asType(java.lang.Class clazz) |
| | java.util.Map<java.lang.String, java.lang.Integer> | basesAt(java.lang.String chr, int pos) Return a Map with a key for each base observed at the given position, with the value being the number of times that base was observed. |
| | void | close() Close the underlying SamReader |
| | int | countOf(java.lang.String chr, int pos, java.lang.String baseString) |
| | java.util.Map<java.lang.String, java.lang.Long> | countOnTarget(Regions targets) Return a set of read counts indicating the counts of reads that overlap the target region. |
| | int | countPairs(gngs.Region r) Count the number of read pairs in the given region |
| | int | countPairs(java.lang.String chr, int start, int end) Count the number of read pairs in the given region |
| | int | coverage(java.lang.String chr, int pos, groovy.lang.Closure c) |
| | int | coverage(java.lang.String chr, int pos, int end, groovy.lang.Closure c) |
| | static int | coverage(htsjdk.samtools.SamReader r, java.lang.String chr, int pos, int end, groovy.lang.Closure c, int minMappingQuality) Return the number of mapped reads overlapping the given position |
| | RegulatingActor | coverageAsync(gngs.Region region, int minMAPQ, groovy.lang.Closure c) |
| | RegulatingActor | coverageAsync(Regions regions, int minMAPQ, groovy.lang.Closure c) |
| | void | coverageAsync(gngs.Region region, RegulatingActor sink, int minMAPQ) Stream coverage values as SampleReadCount objects to the given actor from the given region(s). |
| | int | coverageAsync(Regions regions, RegulatingActor sink, int minMAPQ) |
| | CoverageStats | coverageStatistics(IRegion r) Create a CoverageStats object for the depth of coverage over the given region |
| | CoverageStats | coverageStatistics(java.lang.String chr, int pos, int end) Create a CoverageStats object for the depth of coverage over the given region |
| | CoverageStats | coverageStatistics(Regions regions) |
| | void | eachPair(groovy.lang.Closure c) Call the given closure for every pair of reads in a BAM file containing paired end reads. |
| | void | eachPair(java.util.Map options, groovy.lang.Closure c) Call the given closure for every pair of reads in a BAM file containing paired end reads. |
| | void | eachPair(java.util.Map options, htsjdk.samtools.SAMRecordIterator iter, groovy.lang.Closure c) Call the given closure for every pair of reads in a BAM file containing paired end reads. |
| | void | eachPair(gngs.Region r, groovy.lang.Closure c) |
| | void | eachPair(java.lang.String chr, groovy.lang.Closure c) |
| | void | eachPair(java.util.Map options, java.lang.String chr, int start, int end, groovy.lang.Closure c) |
| | static void | eachRead(groovy.lang.Closure c) Read a BAM or SAM file from standard input and call the given closure for each read contained therein. |
| | void | eachRecord(java.util.Map options, groovy.lang.Closure c) Iterate over each record in the SAM file in the order they are in the file |
| | void | eachRecord(RegulatingActor<SAMRecord> actor) Iterate over each record in the SAM file in the order they are in the file |
| | void | eachRecord(gngs.Region region, RegulatingActor<java.util.List<SAMRecord>> actor) Iterate over each record in the SAM file in the order they are in the file |
| | void | eachRecord(int threads, groovy.lang.Closure c) Call the given closure for each read in this alignment, using the given number of threads. |
| | void | filter(groovy.lang.Closure c) Filter the SAM/BAM file to include only the reads for which the given closure returns true. |
| | void | filter(java.util.Map options, java.lang.String outputFile, groovy.lang.Closure c) Filter the SAM/BAM file to include only the reads for which the given closure returns true |
| | void | filterOrderedPairs(java.util.Map options, java.lang.String outputFileName, groovy.lang.Closure c) Execute the given closure with an actor (parallel thread) set to write ordered pairs to the given output file, based on this BAM file. |
| | java.lang.String | genotype(java.lang.String chr, int pos) Use a simple thresholding approach to genotype SNPs at the given location |
| | java.util.List<java.lang.String> | getContigList() |
| | Regions | getContigRegions() Return the contigs of this BAM file as a set of regions |
| | java.util.Map<java.lang.String, java.lang.Integer> | getContigs() |
| | java.util.List<SAMReadGroupRecord> | getReadGroups() |
| | java.util.List<java.lang.String> | getSamples() |
| | static void | index(java.io.File bamFile) Create an index for the given BAM file |
| | float | meanCoverage(java.lang.String chr, int pos, int end) |
| | void | movingWindow(int windowSize, java.lang.String chr, groovy.lang.Closure c, groovy.lang.Closure filterFn) Call the given closure for each base position with a moving window of reads over that position |
| | void | movingWindow(int windowSize, java.lang.String chr, int start, int end, groovy.lang.Closure c, groovy.lang.Closure filterFn) Call the given closure for each base position with a moving window of reads over that position |
| | htsjdk.samtools.SamReader | newReader(java.util.Map options) |
| | htsjdk.samtools.SAMFileWriter | newWriter(java.lang.String outputFileName) Return a new SAMFileWriter configured with the same settings as this SAM. |
| | Pileup | pileup(java.lang.String chr, int pos) |
| | void | pileup(java.lang.String chr, int start, int end, groovy.lang.Closure c) Call the given closure once for each base between the start and end positions with a Pileup object representing the pileup state at that position. |
| | PileupIterator | pileup(java.lang.String chr, int start, int end) Create and return an iterator that iterates over Pileup objects over the given range. |
| | PileupIterator | pileup(htsjdk.samtools.SamReader reader, java.lang.String chr, int start, int end) |
| | htsjdk.samtools.SAMRecord | queryMate(htsjdk.samtools.SamReader r, htsjdk.samtools.SAMRecord r1) |
| | int | size() Count the total number of reads in the SAM file |
| | java.lang.String | sniffGenomeBuild() Probe the given BAM file to make a guess about what genome build it is generated from. |
| | Regions | toPairRegions(java.lang.String chr, int start, int end, int maxSize) Return read pairs from this SAM file that overlap the specified region as a Regions object - that is, as a set of genomic intervals. |
| | java.util.List<QueryInterval> | toQueryIntervals(Regions regions) |
| | Regions | toRegions(gngs.Region overlapping) |
| | Regions | toRegions(java.lang.String chr, int start, int end) Return reads from this SAM file that overlap the specified region as a Regions object - that is, as a set of genomic intervals. |
| | java.lang.Object | withIterator(gngs.Region region, groovy.lang.Closure c) |
| | java.lang.Object | withIterator(groovy.lang.Closure c) |
| | java.lang.Object | withOrderedPairActor(java.util.Map options, java.lang.String outputFileName, groovy.lang.Closure c) Execute the given closure with an actor set to write ordered pairs to the given output file, based on this BAM file. |
| | java.lang.Object | withOrderedPairWriter(java.util.Map options, java.lang.String outputFileName, boolean sorted, groovy.lang.Closure c) |
| | java.lang.Object | withReader(groovy.lang.Closure c) |
| | java.lang.Object | withWriter(java.lang.String outputFileName, groovy.lang.Closure c) Return a new SAMFileWriter configured with the same settings as this SAM. |
| | java.lang.Object | withWriter(java.lang.String outputFileName, boolean sorted, groovy.lang.Closure c) |
Methods inherited from class | Name |
---|---|
class java.lang.Object | java.lang.Object#wait(long), java.lang.Object#wait(long, int), java.lang.Object#wait(), java.lang.Object#equals(java.lang.Object), java.lang.Object#toString(), java.lang.Object#hashCode(), java.lang.Object#getClass(), java.lang.Object#notify(), java.lang.Object#notifyAll() |
Only used when created from stream
List of samples in the BAM file. Note: this list can be overridden to associate different samples to the BAM files to those in the BAM header.
Return a Map with a key for each base observed at the given position, with the value being the number of times that base was observed. Additionally, keys for deletions ('deletion') and total bases ('total') are set.
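The shape of the map described above can be illustrated with a small, self-contained Java sketch. It does not call gngs or htsjdk: `BaseCountSketch`, `countBases` and the `"-"` marker for a deleted base are hypothetical names chosen for illustration, and counting deletions toward `total` is an assumption rather than documented behaviour.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of gngs: tallies the bases observed at one position.
class BaseCountSketch {

    // observed holds one entry per read covering the position: "A", "C", "G", "T",
    // or "-" where the read has a deletion (the "-" marker is an assumption).
    static Map<String, Integer> countBases(List<String> observed) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        counts.put("deletion", 0);
        counts.put("total", 0);
        for (String base : observed) {
            String key = base.equals("-") ? "deletion" : base;
            counts.merge(key, 1, Integer::sum);
            // Assumption: every covering read, deletions included, adds to the total
            counts.merge("total", 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(countBases(List.of("A", "A", "G", "-")));
    }
}
```

A caller would typically compare the count for one base against `total` to estimate an allele fraction at that position.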
Close the underlying SamReader
Return a set of read counts indicating the counts of reads that overlap the target region.
Count the number of read pairs in the given region
r - region to count pairs over
Count the number of read pairs in the given region
chr - chromosome of region
start - start of region
end - end of region
Return the number of mapped reads overlapping the given position
r - the SamReader (SAM / BAM file) containing reads
chr - the sequence name / chromosome to query
pos - the chromosomal position to query
c - a filter to include or reject reads
Stream coverage values as SampleReadCount objects to the given actor from the given region(s).
Create a CoverageStats object for the depth of coverage over the given region
Create a CoverageStats object for the depth of coverage over the given region
Call the given closure for every pair of reads in a BAM file containing paired end reads.
Note: the algorithm works by keeping a running buffer of reads, and iterating through the reads in order until each single read finds its mate. This means that reads having no mate accumulate in the buffer without ever being removed. Thus a large BAM file containing millions of unpaired reads could cause this method to use substantial amounts of memory.
c - Closure to call
Call the given closure for every pair of reads in a BAM file containing paired end reads.
Note: the algorithm works by keeping a running buffer of reads, and iterating through the reads in order until each single read finds its mate. This means that reads having no mate accumulate in the buffer without ever being removed. Thus a large BAM file containing millions of unpaired reads could cause this method to use substantial amounts of memory.
c - Closure to call
Call the given closure for every pair of reads in a BAM file containing paired end reads.
Note: the algorithm works by keeping a running buffer of reads, and iterating through the reads in order until each single read finds its mate. This means that reads having no mate accumulate in the buffer without ever being removed. Thus a large BAM file containing millions of unpaired reads could cause this method to use substantial amounts of memory.
iter - iterator to consume reads from
c - Closure to call
Read a BAM or SAM file from standard input and call the given closure for each read contained therein.
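The buffering behaviour described in the eachPair notes above can be sketched in plain Java. This is a minimal illustration of the idea, not the gngs implementation: `PairBufferSketch` and its `Read` class are hypothetical stand-ins, and mates are matched purely by read name.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;

// Hypothetical sketch, not the gngs implementation.
class PairBufferSketch {

    // Minimal stand-in for a read: a name shared by both mates, plus a position
    static final class Read {
        final String name;
        final int pos;
        Read(String name, int pos) { this.name = name; this.pos = pos; }
    }

    // Reads seen so far whose mate has not yet arrived, keyed by read name
    final Map<String, Read> buffer = new HashMap<>();

    // Walks the reads in order and invokes the callback once per completed pair
    void eachPair(List<Read> reads, BiConsumer<Read, Read> callback) {
        for (Read r : reads) {
            Read mate = buffer.remove(r.name);
            if (mate == null) {
                buffer.put(r.name, r);      // first mate: hold until its partner shows up
            } else {
                callback.accept(mate, r);   // second mate: emit the pair, earlier read first
            }
        }
    }

    public static void main(String[] args) {
        PairBufferSketch sketch = new PairBufferSketch();
        List<Read> reads = List.of(new Read("a", 100), new Read("b", 120),
                                   new Read("a", 300), new Read("c", 310));
        sketch.eachPair(reads, (r1, r2) ->
                System.out.println(r1.name + ": " + r1.pos + " / " + r2.pos));
        // Reads "b" and "c" never find a mate, so they remain in the buffer
        System.out.println("still buffered: " + sketch.buffer.size());
    }
}
```

Because nothing ever evicts an unmatched read, the buffer grows with the number of unpaired reads, which is the memory concern the notes raise.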
Iterate over each record in the SAM file in the order they are in the file
Iterate over each record in the SAM file in the order they are in the file
Iterate over each record in the SAM file in the order they are in the file
Call the given closure for each read in this alignment, using the given number of threads, e.g.:
SAM sam = new SAM("test.bam")
sam.eachRecord(5) { r -> println(r.readName) }
threads - number of threads to use
c - closure to call
Filter the SAM/BAM file to include only the reads for which the given closure returns true. Output is written to stdout.
c - Closure that should return true for reads that will be included in the output BAM file
Filter the SAM/BAM file to include only the reads for which the given closure returns true
outputFile - the path to the output file
c - Closure that should return true for reads that will be included in the output BAM file
Execute the given closure with an actor (parallel thread) set to write ordered pairs to the given output file, based on this BAM file.
If the closure returns a SAMRecordPair then the pair is written. If any other value is returned, the value is evaluated as a boolean and the original read pair is written if the boolean is true.
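The rule above for interpreting the closure's return value could be sketched as follows. This is an illustrative Java approximation only: `PairResultSketch`, `ReadPair` and `pairToWrite` are hypothetical names, and `isTruthy` covers only a simplified subset of Groovy's truthiness rules (null, booleans, numbers, strings and collections).

```java
import java.util.Collection;
import java.util.Optional;

// Hypothetical sketch of how a closure result might be turned into "what gets written".
class PairResultSketch {

    // Stand-in for a read pair; the real type in the text is SAMRecordPair
    static final class ReadPair {
        final String r1, r2;
        ReadPair(String r1, String r2) { this.r1 = r1; this.r2 = r2; }
    }

    // Returns the pair to write, or empty when the pair should be dropped
    static Optional<ReadPair> pairToWrite(ReadPair original, Object closureResult) {
        if (closureResult instanceof ReadPair) {
            return Optional.of((ReadPair) closureResult);   // closure supplied a pair: write that
        }
        // Any other value is treated as a boolean deciding whether the original is kept
        return isTruthy(closureResult) ? Optional.of(original) : Optional.empty();
    }

    // Simplified approximation of Groovy truth (an assumption, not the full rule set)
    static boolean isTruthy(Object value) {
        if (value == null) return false;
        if (value instanceof Boolean) return (Boolean) value;
        if (value instanceof Number) return ((Number) value).doubleValue() != 0;
        if (value instanceof CharSequence) return ((CharSequence) value).length() > 0;
        if (value instanceof Collection) return !((Collection<?>) value).isEmpty();
        return true;
    }
}
```

Under this reading, a closure can either act as a plain filter by returning a boolean, or act as a transformer by returning a modified pair.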
Use a simple thresholding approach to genotype SNPs at the given location
Return the contigs of this BAM file as a set of regions
Create an index for the given BAM file
Call the given closure for each base position with a moving window of reads over that position
NOTE: Requires a BAM file sorted by position. Will not work with an unsorted BAM file.
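One way to see why position-sorted input is required is the following Java sketch of a moving window. It is not the gngs implementation: `MovingWindowSketch` and `Read` are hypothetical, and the window definition used here (reads overlapping `pos - windowSize` to `pos + windowSize`) is an assumption made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiConsumer;

// Hypothetical sketch of a moving window over position-sorted reads.
class MovingWindowSketch {

    // Minimal stand-in for an aligned read with inclusive start and end
    static final class Read {
        final int start, end;
        Read(int start, int end) { this.start = start; this.end = end; }
    }

    // Calls the callback for each position with the reads currently in the window
    static void movingWindow(int windowSize, int start, int end, List<Read> sortedReads,
                             BiConsumer<Integer, List<Read>> callback) {
        List<Read> window = new ArrayList<>();
        int next = 0;   // index of the next unread entry in sortedReads
        for (int pos = start; pos <= end; pos++) {
            final int lo = pos - windowSize;
            final int hi = pos + windowSize;
            // Pull in reads in file order; this only works if they are sorted by start
            while (next < sortedReads.size() && sortedReads.get(next).start <= hi) {
                window.add(sortedReads.get(next++));
            }
            // Drop reads that end before the window begins
            window.removeIf(r -> r.end < lo);
            callback.accept(pos, new ArrayList<>(window));
        }
    }
}
```

The single forward pass over `sortedReads` is what keeps the cost low, and it is also what breaks on unsorted input: a read that appears late in the file but starts early would be pulled in only after its positions had already been reported.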
Call the given closure for each base position with a moving window of reads over that position
NOTE: Requires a BAM file sorted by position. Will not work with an unsorted BAM file.
windowSize - the size of window to use in bp
start - start position
end - end position
c - callback function to invoke
filterFn - optional filter function to apply that filters out reads
Return a new SAMFileWriter configured with the same settings as this SAM. It is the caller's responsibility to close the writer.
outputFileName - Name of file to write to
Call the given closure once for each base between the start and end positions with a Pileup object representing the pileup state at that position.
Create and return an iterator that iterates over Pileup objects over the given range.
chr - Chromosome of range to iterate over
start - Start of range
end - End of range
Count the total number of reads in the SAM file
Probe the given BAM file to make a guess about what genome build it is generated from.
Return read pairs from this SAM file that overlap the specified region as a Regions object - that is, as a set of genomic intervals. Each region returned by this method spans from the start of the 5' read to the end of the 3' read.
Reads that are missing a start or end alignment position are omitted. Unpaired reads are also omitted.
If called without passing start or end, the start / end are interpreted as the beginning / end of the chromosome / reference sequence respectively.
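The pair-to-interval rule described above (span from the start of the 5' read to the end of the 3' read, skipping unpaired reads and reads without alignment positions) might look like this in a self-contained Java sketch. `PairSpanSketch`, `Read` and `span` are hypothetical names. Using 0 to mark a missing position and treating the leftmost mate as the 5' read are assumptions of this sketch.

```java
import java.util.Optional;

// Hypothetical sketch of turning a read pair into a single genomic interval.
class PairSpanSketch {

    // Assumption: 0 marks a missing alignment position
    static final int MISSING = 0;

    // Minimal stand-in for an aligned read
    static final class Read {
        final int start, end;
        Read(int start, int end) { this.start = start; this.end = end; }
    }

    // Returns {start, end} for the pair, or empty if the pair must be omitted
    static Optional<int[]> span(Read r1, Read r2) {
        if (r1 == null || r2 == null) {
            return Optional.empty();            // unpaired read: omitted
        }
        if (r1.start == MISSING || r1.end == MISSING
                || r2.start == MISSING || r2.end == MISSING) {
            return Optional.empty();            // missing start or end: omitted
        }
        // Assumption: the leftmost mate is the 5' read, the other is the 3' read
        Read left = (r1.start <= r2.start) ? r1 : r2;
        Read right = (left == r1) ? r2 : r1;
        return Optional.of(new int[] { left.start, right.end });
    }
}
```

Each resulting interval therefore covers the whole sequenced fragment, including the unsequenced gap between the two mates.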
Return reads from this SAM file that overlap the specified region as a Regions object - that is, as a set of genomic intervals.
Reads that are missing a start or end alignment position are omitted.
If called without passing start or end, the start / end are interpreted as the beginning / end of the chromosome / reference sequence respectively.
Execute the given closure with an actor set to write ordered pairs to the given output file, based on this BAM file.
Return a new SAMFileWriter configured with the same settings as this SAM. It is the caller's responsibility to close the writer. In this version of the method the files are assumed to be pre-sorted.
outputFileName - Name of file to write to
Groovy Documentation