class RangeIndex extends java.lang.Object implements java.lang.Iterable<groovy.lang.IntRange>
Indexes a set of ranges so that the ranges crossing any point can be identified quickly and easily.
The RangeIndex models ranges by their breakpoints. Each time a new range is inserted, the breakpoints are stored in a tree and the list of overlapping ranges at each breakpoint is tracked for each breakpoint entry. This makes it easy to find overlapping ranges at any point. This class is the primary driver for the BED class which adds support for contigs so that a whole reference sequence containing multiple contigs (eg: chromosomes) can be indexed.
The RangeIndex is built upon Groovy's built in groovy.lang.IntRange class. An important aspect of this class is that by default its end point is included in the range, while by default BED ranges are exclusive of the end point. Thus care needs to be taken in translating between the two. When you add a range to the index it is treated as a BED range: that is, the end point is exclusive. When you retrieve it, the end position will be inclusive ie: it will be decremented by one from what you inserted.
RangeIndex does not de-duplicate identical ranges. If you put in multiple identical ranges they will be separate stored and separately returned as outputs for queries and iteration over the index.
Usage
Adding ranges is done via the add
method:
RangeIndex index = new RangeIndex() index.add(5..10) index.add(20..30) index.add(25..35)The RangeIndex class implements the java.lang.Iterable interface, allowing generic Groovy collections methods to work:
index.each { println "A range from $it.from-$it.to" }
assert index.grep { it.from > 15 } == 2
Design notes: RangeIndex stores intervals as a TreeMap (a balanced tree), indexed
by breakpoints. Thus there is an entry in the index for every "breakpoint" (start
and end) of every range in the index. The value at each breakpoint is a list of the
ranges that overlap the breakpoint. For example, if a range
a..b is added (inclusive of both a and b), an entry at a
is created and
the list at index a
will contain the range a..b
. Another
entry at b+1
will be created, which does not contain a
.
(If there are no other overlapping ranges, the entry at b+1
will be
empty).
To minimise memory use, RangeIndex accepts plain IntRange objects into the index. However
frequently it is desirable to associate a range to some more data. For this purpose,
the GRange class extends IntRange but contains an extra
field that
can be used to store arbitrary data in a range object.
Modifiers | Name | Description |
---|---|---|
class |
RangeIndex.1 |
|
class |
RangeIndex.2 |
|
class |
RangeIndex.3 |
|
class |
RangeIndex.4 |
Type | Name and description |
---|---|
static java.util.Comparator<groovy.lang.IntRange> |
INT_RANGE_COMPARATOR |
java.util.TreeMap<java.lang.Integer, java.util.List<groovy.lang.IntRange>> |
ranges The actual index - a map from position to the list of ranges covering the region after the position. |
Constructor and description |
---|
RangeIndex
() |
Type Params | Return Type | Name and description |
---|---|---|
|
void |
add(int startPosition, int endPosition, java.lang.Object extra) Add the given range to this index |
|
void |
add(groovy.lang.IntRange newRange) |
|
void |
addContainedEntry(Map.Entry<java.lang.Integer, java.util.List<groovy.lang.IntRange>> containedEntry, Map.Entry<java.lang.Integer, java.util.List<groovy.lang.IntRange>> higherEntry, groovy.lang.IntRange newRange) Add a reference to a range to a breakpoint that falls in the middle of the range |
|
void |
checkRanges(int position) Check that all the ranges added at a position span that position |
|
java.util.List<gngs.GRange> |
coverage() Return a list of ranges representing "coverage" blocks within this index. |
|
void |
debugdump() |
|
int |
distanceTo(int pos) Returns the distance from the nearest range to the given position |
|
int |
distanceTo(groovy.lang.IntRange r) |
|
void |
dump() |
|
java.util.List<groovy.lang.IntRange> |
endingAt(int pos) |
|
groovy.lang.Range |
first() |
|
java.util.Iterator<gngs.GRange> |
getBoundaries() Returns a range for each boundary within the index. |
|
int |
getNumRanges() |
|
java.util.List<groovy.lang.Range> |
getOverlaps(int position) |
|
java.util.List<groovy.lang.IntRange> |
getOverlaps(int start, int end) Return a list of ranges stored in the index that overlap the specified range. |
|
java.util.List<groovy.lang.Range> |
getOverlaps(int start, int end, boolean returnFirst) Return a list of the ranges in this index that overlap the given range. |
|
java.util.List<groovy.lang.IntRange> |
intersect(int start, int end) Return a list of ranges that intersect the given start and end points. |
|
boolean |
isCase(int position) Support for 'in' operator |
|
java.util.Iterator<groovy.lang.IntRange> |
iterator() |
|
java.util.Iterator<groovy.lang.IntRange> |
iteratorAt(int startingPos) Return an iterator that will return each range in the index in genomic order by starting position, starting from the given position. |
|
groovy.lang.Range |
last() |
|
groovy.lang.IntRange |
nearest(int pos) |
|
groovy.lang.Range |
nextRange(int pos) |
|
boolean |
overlaps(int start, int end) Return true if the given range overlaps at least one range in this RangeIndex. |
|
groovy.lang.Range |
previousRange(int pos) |
|
RangeIndex |
reduce(groovy.lang.Closure reducer) Merge all overlapping ranges together to make simplified regions representing all the ranges covered by any range in this RangeIndex. |
|
void |
remove(groovy.lang.Range r) Remove an existing range from the index. |
|
java.util.Iterator<groovy.lang.IntRange> |
reverseIterator() |
|
java.util.Iterator<groovy.lang.IntRange> |
reverseIteratorAt(int startingPos) Return an iterator that will return each range that starts at or before the given position, proceeding backwards through the genome, ordered by start position. |
|
java.util.List<groovy.lang.IntRange> |
startingAt(int pos) |
|
java.util.List<groovy.lang.IntRange> |
subtractFrom(int start, int end) Subtract all the ranges in this range index from the given range and return the resulting list of ranges. |
Methods inherited from class | Name |
---|---|
class java.lang.Object |
java.lang.Object#wait(long), java.lang.Object#wait(long, int), java.lang.Object#wait(), java.lang.Object#equals(java.lang.Object), java.lang.Object#toString(), java.lang.Object#hashCode(), java.lang.Object#getClass(), java.lang.Object#notify(), java.lang.Object#notifyAll() |
The actual index - a map from position to the list of ranges covering the region after the position. There is an entry in this index for each position where a range starts or ends.
Add the given range to this index
If the 'extra' argument is specified then a GRange will be added with the extra
as its extra data argument. Otherwise, a plain IntRange will be added.
Add a reference to a range to a breakpoint that falls in the middle of the range
containedEntry
- the entry for the breakpoint that is contained by the rangehigherEntry
- the entry for the next breakpoint. null
if there is no higher entry.newRange
- the range to add to the breakpoint entryCheck that all the ranges added at a position span that position
Return a list of ranges representing "coverage" blocks within this index. That is, for each position where the number of overlapping ranges changes, a separate range is returned with a count of the overlaps as the 'extra' field.
Returns the distance from the nearest range to the given position @return
Returns a range for each boundary within the index. @return
Return a list of ranges stored in the index that overlap the specified range. Both ends of the range are *inclusive*.
start
- first position to look for overlapsend
- last position to look for overlaps (inclusive)Return a list of the ranges in this index that overlap the given range. If there are multiple ranges that overlap, or multiple distinct occurrences of the same range that were added to the index, then these will be returned separately.
start
- start of query intervalend
- end of query intervalreturnFirst
- if true, return only the first overlap foundReturn a list of ranges that intersect the given start and end points.
Note:The start and end point are both treated as inclusive.
start
- start of range to find intersections forend
- end of range to find intersections forSupport for 'in' operator
Return an iterator that will return each range in the index in genomic order by starting position, starting from the given position.
Return true if the given range overlaps at least one range in this RangeIndex.
start
- start of range to test (inclusive)end
- end point of range to test (inclusive) Merge all overlapping ranges together to make simplified regions
representing all the ranges covered by any range in this RangeIndex.
If a closure is provided, it is called whenever two regions need to be merged,
providing both regions as arguments. The return value is set as the extra
attribute on the resulting region.
Remove an existing range from the index. The existing range MUST already be in the index, and cannot be a user-supplied range.
Return an iterator that will return each range that starts at or before the given position, proceeding backwards through the genome, ordered by start position.
Subtract all the ranges in this range index from the given range and return the resulting list of ranges.
Note:both start and end are considered inclusive.
Groovy Documentation