class Variant extends java.lang.Object
Represents the genetic state at a specific locus in a genome, i.e., it captures the information on a single line of a VCF file. Note that this can include multiple alleles present at the site, each with potentially different start and end positions.
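To make "a single line of a VCF file" concrete, here is a minimal sketch in Java (callable from Groovy). It splits one data line into the eight fixed VCF columns and shows how several ALT alleles can share one line. `VcfLineSketch` is a hypothetical name. This is not how the Variant class is implemented, and it ignores the FORMAT and sample columns.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical holder for the fixed VCF columns; not the gngs Variant class.
final class VcfLineSketch {
    final String chr;
    final int pos;      // 1-based position of the first REF base
    final String id;
    final String ref;
    final List<String> alts;
    final String qual;  // kept as text because "." (missing) is allowed
    final String filter;
    final String info;

    private VcfLineSketch(String[] f) {
        chr = f[0];
        pos = Integer.parseInt(f[1]);
        id = f[2];
        ref = f[3];
        // One line can carry several comma-separated ALT alleles
        alts = Collections.unmodifiableList(Arrays.asList(f[4].split(",")));
        qual = f[5];
        filter = f[6];
        info = f[7];
    }

    /** Split one VCF data line; FORMAT and sample columns (if any) are ignored here. */
    static VcfLineSketch parse(String line) {
        String[] fields = line.split("\t");
        if (fields.length < 8) {
            throw new IllegalArgumentException(
                "Expected at least 8 tab-separated columns, got " + fields.length);
        }
        return new VcfLineSketch(fields);
    }
}
```

For example, parsing `chr1	100	rs1	A	G,AT	50	PASS	DP=10` yields two alternate alleles, `G` and `AT`, on the same record. These are a SNP and an insertion that start at different effective positions.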
The Variant class includes support for accessing the structured annotations and attributes defined by the VCF format and by popular annotation tools such as VEP, Annovar and SnpEff. See getSnpEffInfo() and getVepInfo() for accessors that return parsed annotation information.
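As background on what "parsed annotation information" involves for VEP, the sketch below parses a VEP `CSQ` INFO value. VEP writes one comma-separated entry per annotation. Within an entry, fields are pipe-separated in the order declared by the `CSQ` header line ("Format: Allele|Consequence|..."), and multiple consequence terms are joined with `&`. `CsqSketch` is hypothetical and is not the getVepInfo() implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of reading a VEP CSQ value; not the Variant.getVepInfo() implementation.
final class CsqSketch {
    /**
     * @param format the CSQ field names in header order, e.g. "Allele|Consequence|IMPACT|SYMBOL"
     * @param csq    the raw CSQ value taken from the INFO column
     */
    static List<Map<String, String>> parse(String format, String csq) {
        String[] names = format.split("\\|");
        List<Map<String, String>> result = new ArrayList<>();
        // Entries (typically one per allele/transcript pair) are comma-separated
        for (String entry : csq.split(",")) {
            // Limit -1 keeps trailing empty fields so values stay aligned with names
            String[] values = entry.split("\\|", -1);
            Map<String, String> annotation = new LinkedHashMap<>();
            for (int i = 0; i < names.length && i < values.length; i++) {
                annotation.put(names[i], values[i]);
            }
            result.add(annotation);
        }
        return result;
    }

    /** Multiple consequence terms for one entry are joined with '&'. */
    static List<String> consequenceTerms(Map<String, String> annotation) {
        String c = annotation.getOrDefault("Consequence", "");
        if (c.isEmpty()) {
            return new ArrayList<>();
        }
        return Arrays.asList(c.split("&"));
    }
}
```

The field order comes from the header. This is one reason a Variant without its VCF header cannot interpret every annotation.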
Important: The Variant class represents a line in a VCF file, and therefore potentially multiple alleles and samples, with their corresponding annotations, genotypes, etc. However, it is very common to work with normalised single-sample VCFs that guarantee only a single allele per site and represent just one sample. To make operations on such variants more convenient, some methods have both a sample/allele-specific version and a 'default allele' version. The default allele version operates on the first alternate allele and the first sample. These methods are convenient, but always keep in mind that they ignore any other alleles and samples and may be unsafe to use on non-normalised VCFs.
Although Variant instances can be constructed manually, you will typically obtain them by querying the VCF or VCFIndex classes. Some operations require this, because the VCF header is needed to parse and interpret certain fields (for example, sample names). Without a header many operations still work, but methods that depend on the header may throw java.lang.IllegalStateException.
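The dependence on the header can be illustrated with sample names. These exist only in the `#CHROM` header line, after the nine fixed columns. The Java sketch below is hypothetical: the names are illustrative and are not the VCF/Variant API. It resolves a sample name to a column index and throws `IllegalStateException` when no header is available, mirroring the behaviour described above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch; class and method names are illustrative, not the Variant/VCF API.
final class SampleIndexSketch {
    private final Map<String, Integer> sampleIndex; // null when no header is available

    SampleIndexSketch(String chromHeaderLine) {
        sampleIndex = chromHeaderLine == null ? null : parseSamples(chromHeaderLine);
    }

    private static Map<String, Integer> parseSamples(String chromHeaderLine) {
        String[] cols = chromHeaderLine.split("\t");
        Map<String, Integer> m = new LinkedHashMap<>();
        // Sample columns follow the 9 fixed columns #CHROM..FORMAT
        for (int i = 9; i < cols.length; i++) {
            m.put(cols[i], i - 9);
        }
        return m;
    }

    int indexOf(String sample) {
        if (sampleIndex == null) {
            throw new IllegalStateException("No VCF header available: cannot resolve sample names");
        }
        Integer idx = sampleIndex.get(sample);
        if (idx == null) {
            throw new IllegalArgumentException("Unknown sample: " + sample);
        }
        return idx;
    }
}
```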
One of the most useful operations is determining the dosage (i.e., heterozygosity, or number of copies) of a variant in a particular sample. For example:

```groovy
VCF vcf = VCF.parse("test.vcf")

// Find all the variants that affect sample MYSAMPLE in any way
List mySampleVariants = vcf.grep { Variant v -> v.sampleDosage("MYSAMPLE") > 0 }

// Find all the homozygous variants (autosomes, plus the X chromosome in females)
List homVariants = vcf.grep { Variant v -> v.sampleDosage("MYSAMPLE") == 2 }
// ...
```
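Conceptually, a dosage is a count of how many times an allele index appears in a sample's `GT` value. Below is a minimal Java sketch of that idea. `DosageSketch` is hypothetical and is not the sampleDosage() implementation. It uses VCF's own GT numbering (0 = reference, 1 = first alternate), which may differ from the allele indexing used by particular Variant methods.

```java
// Hypothetical helper, not the Variant.sampleDosage() implementation.
final class DosageSketch {
    /**
     * Count copies of the allele with the given VCF index in a GT value such as
     * "0/1", "1|1" or "./.". Index 0 is the reference; 1 is the first alternate allele.
     */
    static int dosage(String gt, int alleleIndex) {
        int count = 0;
        // Unphased ("/") and phased ("|") separators are treated alike
        for (String a : gt.split("[/|]")) {
            // "." marks a missing call and contributes no copies
            if (!a.equals(".") && Integer.parseInt(a) == alleleIndex) {
                count++;
            }
        }
        return count;
    }
}
```

Under this counting, a dosage of 2 means homozygous for the allele and 1 means heterozygous. This is the distinction the `grep` filters above rely on.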
Some support for understanding pedigrees is implemented via the Pedigree class. If a pedigree is set on the VCF from which the Variant originates, you can query variants to find those that segregate with families or conditions.
A Variant also implements the IRange interface, so you can use it seamlessly with a RegionSource. For example, checking whether a Variant falls within a BED file is simple:
```groovy
Variant v = ...
BED bed = new BED("test.bed").load()
if (v in bed)
    println "Variant v falls in the ranges included in the BED file"
```
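Underneath, a check like `v in bed` comes down to an interval-overlap test. The Java sketch below is hypothetical and is not how BED or RegionSource are implemented; a real implementation would typically use an interval index rather than a linear scan. It assumes BED's 0-based, half-open coordinates, VCF's 1-based positions, and that a variant spans exactly its REF bases.

```java
import java.util.List;

// Hypothetical sketch; the real BED/RegionSource classes do their own indexing.
final class BedOverlapSketch {
    static final class BedInterval {
        final String chr;
        final long start; // 0-based, inclusive
        final long end;   // 0-based, exclusive
        BedInterval(String chr, long start, long end) {
            this.chr = chr;
            this.start = start;
            this.end = end;
        }
    }

    /** True if the REF bases of a VCF record (1-based pos) overlap any interval. */
    static boolean overlapsAny(String chr, int pos, String ref, List<BedInterval> bed) {
        long vStart = pos - 1L;            // convert the 1-based VCF position to 0-based
        long vEnd = vStart + ref.length(); // half-open end covering all REF bases
        for (BedInterval b : bed) {
            // Two half-open intervals overlap iff each starts before the other ends
            if (b.chr.equals(chr) && vStart < b.end && b.start < vEnd) {
                return true;
            }
        }
        return false;
    }
}
```

Getting the 1-based versus 0-based conversion wrong shifts every comparison by one base. That is why the sketch converts explicitly before testing overlap.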
The Variant class supports working with the full genotype data of each sample, and the genotype fields are parsed for you. For example, to check whether every sample has a genotype quality below a given threshold:
```groovy
if (v.genoTypes.every { it.GQ < 5.0f })
    println "Quality is low for every sample!"
```
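The parsed genotype map pairs each key in the FORMAT column with the corresponding value in a sample column. The Java sketch below is hypothetical. It keeps values as strings and converts a numeric field only on request; the Variant class itself converts some fields to numbers for you, which the `GQ` comparison above relies on. The sketch treats dropped trailing fields, which the VCF spec allows, as missing.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch; Variant.parseGenoTypeFields() may convert types differently.
final class GenotypeFieldsSketch {
    /** Pair FORMAT keys (e.g. "GT:AD:DP:GQ") with one sample column (e.g. "0/1:10,5:15:99"). */
    static Map<String, String> parse(String format, String sampleColumn) {
        String[] keys = format.split(":");
        String[] values = sampleColumn.split(":", -1);
        Map<String, String> fields = new LinkedHashMap<>();
        for (int i = 0; i < keys.length; i++) {
            // Trailing fields may be dropped; record them as missing (".")
            fields.put(keys[i], i < values.length ? values[i] : ".");
        }
        return fields;
    }

    /** Parse a numeric field such as GQ, returning null when it is absent or missing. */
    static Float numeric(Map<String, String> fields, String key) {
        String v = fields.get(key);
        return (v == null || v.equals(".")) ? null : Float.valueOf(v);
    }
}
```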
| Modifiers | Name | Description |
|---|---|---|
| class | Variant.Allele | A specific allele in the context of a Variant in a VCF file |
| Type | Name | Description |
|---|---|---|
| static java.util.regex.Pattern | AMPERSAND_SPLIT | |
| static java.util.regex.Pattern | COLON_SPLIT | |
| static java.util.regex.Pattern | COMMA_SPLIT | |
| static java.util.regex.Pattern | PIPE_OR_SLASH_SPLIT | |
| static java.util.regex.Pattern | PIPE_SPLIT | |
| static java.text.NumberFormat | QUAL_FORMATTER | |
| static java.util.regex.Pattern | TAB_SPLIT | |
| java.lang.String | alt | The sequence for the first alternate allele. |
| byte | altByte | The first character of the first alternate allele in byte form. |
| java.lang.String[] | alts | The alternate alleles represented by the variant. |
| Region | cachedRegion | |
| java.lang.Integer | cachedSize | |
| java.lang.String | chr | The chromosome on which the variant falls. |
| java.util.List<java.lang.Integer> | dosages | Cached set of dosages for the first allele. |
| java.lang.String | filter | The FILTER field as loaded from the VCF file. |
| java.util.List | genoTypeFields | |
| java.util.List<java.util.Map<java.lang.String, java.lang.Object>> | genoTypes | |
| VCF | header | The VCF from which header information will be extracted when required. |
| java.lang.String | id | ID column as per the VCF spec. |
| java.lang.String | info | The whole INFO field as a raw string. |
| java.util.Map<java.lang.String, java.lang.Object> | infos | |
| java.lang.String | line | The original line from which this variant was parsed. |
| static Set | numericGTFields | |
| static Set | numericListFields | |
| java.util.Set<Pedigree> | pedigrees | |
| int | pos | The position at which the reference bases (ref attribute) start. |
| float | qual | The QUAL field as loaded from the VCF file. |
| java.lang.String | ref | The reference allele at the position. |
| boolean | snpEffDirty | |
| java.util.List<SnpEffInfo> | snpEffInfo | |
| java.lang.String | type | One of SNP, INS, DEL or SV, indicating the type of change represented by the first alternate allele. |
| Return type | Method | Description |
|---|---|---|
| java.lang.String | applyDEL(java.lang.String sequence, int position) | Apply this deletion to the specified sequence at the specified position |
| java.lang.String | applyINS(java.lang.String sequence, int position) | Apply this insertion to the specified sequence at the specified position |
| java.lang.String | applySNP(java.lang.String sequence, int position) | |
| java.lang.String | applyTo(java.lang.String sequence, int position) | Apply this variant to mutate the given sequence at the specified position |
| java.lang.Object | asType(java.lang.Class clazz) | |
| java.lang.Number | convertNumericValue(java.lang.String value, java.lang.Number defaultValue = 0I) | |
| java.lang.String | convertType(java.lang.String refSeq, java.lang.String altSeq) | Convert reference and alternate allele strings into a mutation type, being one of "SNP", "INS", "DEL", "GAIN" or "LOSS", with the latter two representing CNVs. |
| java.lang.String | displaySequenceDifference(java.lang.String sequence, int position) | A debug function; displays the sequence difference and returns the new sequence |
| int | equalsAnnovar(java.lang.String chr, int pos, java.lang.String obs) | Return true if the given Annovar-formatted position and observation match those of this variant |
| int | findAlleleIndex(Variant.Allele other) | Return the index of the allele (if any) that matches the given other allele |
| void | fixGenoTypeOrder() | VCF requires the genotype to be the first field, but some tools (R, grrrr) write it in whatever order they feel like. |
| float | getAlleleBalance() | Return the distance of the allele balance from 0.5 for the default alternate allele and the first sample |
| float | getAlleleBalance(int allele1, int allele2, int sampleIndex = 0) | Return the distance of the allele balance from 0.5 for the specified alleles (reference = 0) |
| java.util.List<java.lang.Integer> | getAlleleDepths(int alleleIndex) | Return the number of reads supporting the given allele as a list with one entry for each sample in the VCF |
| java.util.List<Allele> | getAlleles() | Return a list of Allele objects representing the alleles present on this line of the VCF |
| java.util.List<java.util.List<java.lang.String>> | getAllelesAndTypes() | Return a list of each allele and its type. |
| int | getAltDepth() | Return the depth of the first alternate allele for the first sample |
| java.lang.String | getConsequence(int alleleIndex) | Return the consequence of the specified allele, currently from VEP annotations (later, from others). |
| int | getDosage() | Return the dosage (number of copies) of the first alternate allele for the first sample. |
| java.util.List<java.lang.Integer> | getDosages() | Return a list of dosages (number of copies of the allele) for each sample, for the first alternate allele. |
| java.util.List<java.lang.Integer> | getDosages(int alleleIndex) | Return the number of copies of the given alternate allele for each sample in the VCF |
| java.util.List<java.lang.String> | getGenes(java.lang.String minVEPCons) | Return the list of genes impacted by this variant, in a manner that is neutral to the annotator used (VEP and SnpEff supported) |
| int | getGenotypeDepth(java.util.Map<java.lang.String, java.lang.Object> gt, int alleleIndex) | |
| java.util.Map<java.lang.String, java.lang.Object> | getInfo() | |
| float | getMaxAlleleBalance() | Return the maximum allele balance for any sample for the default alternate allele. |
| SnpEffInfo | getMaxEffect() | Return the most impactful SnpEff effect for this variant |
| java.util.Map<java.lang.String, java.lang.Object> | getMaxVep() | Return the details of the most severe VEP consequence, as ranked by VEPConsequences.RANKED_CONSEQUENCES. |
| java.lang.String | getMaxVepImpact() | |
| float | getMaxVepMaf() | |
| java.util.Set<Pedigree> | getPedigrees() | Return the set of all pedigrees that contain this variant |
| groovy.lang.IntRange | getRange() | |
| Region | getRegion() | |
| java.util.List<SnpEffInfo> | getSnpEffInfo() | Return a list of SnpEffInfo objects, each describing a separate SnpEff effect caused by the variant |
| java.lang.Integer | getTotalDepth() | |
| java.lang.Integer | getTotalDepth(java.lang.String sample) | |
| java.lang.Integer | getTotalDepth(int sampleIndex) | |
| float | getVaf() | Return the fraction of reads supporting the first alternate allele for the first sample in the VCF |
| float | getVaf(int alleleIndex, int sampleIndex = 0) | Return the fraction of reads supporting the given allele for the specified sample |
| java.util.List<java.util.Map<java.lang.String, java.lang.Object>> | getVepInfo() | |
| long | getXpos() | |
| java.lang.String | igv() | |
| boolean | isCase(IRegion r) | Return true if this variant overlaps the given range |
| boolean | isHet() | Note: this returns true if any sample in the VCF is het; for multi-sample VCFs, check the dosage of the specific sample directly. |
| boolean | isHom() | Return true if at least one allele is present with 2 copies |
| boolean | isSV() | |
| static Variant | parse(java.lang.String line) | |
| static Variant | parse(java.lang.String line, boolean ignoreNonRef) | |
| java.util.Map<java.lang.String, java.lang.Object> | parseGenoTypeFields(java.lang.String gt) | |
| int | sampleDosage(java.lang.String sampleName) | |
| int | sampleDosage(java.lang.String sampleName, int alleleIndex) | |
| Map | sampleGenoType(java.lang.String sampleName) | Return a Map of fields from the genotype column corresponding to the given sample. |
| boolean | segregatesWith(Pedigree ped) | |
| void | setAlt(java.lang.String alt) | Update the first alternate allele to the given value |
| int | size() | Returns the change in size of the genome caused by this variant's default allele. |
| Map | toAnnovar(int alleleIndex = 0) | |
| java.lang.String | toJson(java.lang.String sample = null) | Return a JSON string representing the key details about this variant. |
| java.lang.String | toString() | |
| void | update(Closure c) | Update using a default description. |
| void | update(java.lang.String desc, Closure c) | Allows various fields to be updated and then synchronises the rest of the data with those updated fields |
| Methods inherited from class | Name |
|---|---|
| class java.lang.Object | wait(long, int), wait(long), wait(), equals(java.lang.Object), toString(), hashCode(), getClass(), notify(), notifyAll() |
alt: The sequence for the first alternate allele. This is a convenience for the common case where the first alternate allele is the only one of interest. In general, be aware that a single Variant may represent more than one alternate Allele.
altByte: The first character of the first alternate allele in byte form. This is a special-case optimization that allows fast access when interoperating with Picard (which wants to see bytes, not chars or Strings).
alts: The alternate alleles represented by the variant.
chr: The chromosome on which the variant falls.
dosages: Cached set of dosages for the first allele.
filter: The FILTER field as loaded from the VCF file.
header: The VCF from which header information will be extracted when required. This enables the variant to parse INFO fields intelligently and to be aware of sample names. If the header is not provided, many functions still work, but some are disabled.
id: ID column as per the VCF spec. Usually this contains the dbSNP rsID identifier.
info: The whole INFO field as a raw string. Users will generally call getInfo() to get this field in parsed form rather than accessing it directly.
line: The original line from which this variant was parsed.
pos: The position at which the reference bases (ref attribute) start. Note that this is not necessarily the start of the genetic change represented by the variant.
qual: The QUAL field as loaded from the VCF file.
ref: The reference allele at the position. Note that, as per the VCF spec, this is not necessarily the start of the change indicated by the Variant. For the start of the actual changes, query the Allele objects and look at their start and end attributes.
type: One of SNP, INS, DEL or SV, indicating the type of change represented by the first alternate allele.
applyDEL(String sequence, int position): Apply this deletion to the specified sequence at the specified position.
NOTE: this variant MUST be a deletion.
applyINS(String sequence, int position): Apply this insertion to the specified sequence at the specified position.
NOTE: this variant MUST be an insertion.
applyTo(String sequence, int position): Apply this variant to mutate the given sequence at the specified position.
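Because VCF writes indels with a shared leading base, SNPs, insertions and deletions can all be applied the same way: replace the REF bases with the ALT bases. The Java sketch below is hypothetical. It assumes `offset` is the 0-based index in `sequence` where REF begins. The meaning of applyTo's `position` argument is not documented here, so this is an assumption and not the library's contract.

```java
// Hypothetical sketch; the meaning of Variant.applyTo's `position` argument is assumed, not documented.
final class ApplyVariantSketch {
    /**
     * Replace the REF bases found at the 0-based offset in sequence with ALT.
     * Works uniformly for SNPs, insertions and deletions written in VCF style.
     */
    static String applyTo(String sequence, int offset, String ref, String alt) {
        // Guard against applying a variant at the wrong place
        if (!sequence.startsWith(ref, offset)) {
            throw new IllegalArgumentException("REF " + ref + " not found at offset " + offset);
        }
        return sequence.substring(0, offset) + alt + sequence.substring(offset + ref.length());
    }
}
```

For example, on `ACGTA`, replacing `CG` with `C` at offset 1 deletes the `G` (giving `ACTA`). Replacing `C` with `CTT` at the same offset inserts `TT` (giving `ACTTGTA`).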
convertType(String refSeq, String altSeq): Convert reference and alternate allele strings into a mutation type, being one of "SNP", "INS", "DEL", "GAIN" or "LOSS", with the latter two representing CNVs.
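A length-based classification captures the idea behind converting REF/ALT strings into a mutation type. The Java sketch below is hypothetical. Mapping the symbolic alleles `<DUP>` and `<DEL>` to GAIN and LOSS is an assumption, as is reporting equal-length substitutions (including multi-base ones) as SNP; convertType() may handle both cases differently.

```java
// Hypothetical sketch; how Variant.convertType() treats MNVs and symbolic alleles is assumed here.
final class MutationTypeSketch {
    static String convertType(String ref, String alt) {
        // Assumed mapping of symbolic CNV alleles (checked first: "<DEL>" is longer than REF)
        if (alt.equals("<DUP>")) return "GAIN";
        if (alt.equals("<DEL>")) return "LOSS";
        if (alt.length() > ref.length()) return "INS";
        if (alt.length() < ref.length()) return "DEL";
        // Equal lengths: treated as a substitution; this sketch does not separate MNVs
        return "SNP";
    }
}
```

Applied separately to each ALT allele on one line, this classification can return different types for the same record. This is the situation getAllelesAndTypes() reports.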
displaySequenceDifference(String sequence, int position): A debug function; displays the sequence difference and returns the new sequence.
equalsAnnovar(String chr, int pos, String obs): Return true if the given Annovar-formatted position and observation match those of this variant.
Annovar outputs non-standard formatting that makes it difficult to trace an Annovar variant back to its VCF source. This method implements the tricky logic needed to compare an Annovar variant with its VCF equivalent and decide whether they are the same.
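Much of the difficulty comes from Annovar dropping the shared VCF anchor base from indels. The Java sketch below is hypothetical. It assumes the convention commonly produced by Annovar's VCF conversion: deletions start at the first deleted base with observation `-`, and insertions keep the VCF position with reference `-`. Verify this against your Annovar version. The sketch also ignores chromosome naming differences and is not the equalsAnnovar() logic.

```java
// Hypothetical sketch of a common Annovar representation of VCF alleles (assumption,
// not taken from the Variant.equalsAnnovar() source).
final class AnnovarSketch {
    final int start;
    final String ref;
    final String obs;

    private AnnovarSketch(int start, String ref, String obs) {
        this.start = start;
        this.ref = ref;
        this.obs = obs;
    }

    static AnnovarSketch fromVcf(int pos, String ref, String alt) {
        if (ref.length() == 1 && alt.length() > 1 && alt.charAt(0) == ref.charAt(0)) {
            // Insertion: drop the shared anchor base; REF becomes "-"
            return new AnnovarSketch(pos, "-", alt.substring(1));
        }
        if (alt.length() == 1 && ref.length() > 1 && ref.charAt(0) == alt.charAt(0)) {
            // Deletion: drop the anchor base and start at the first deleted base
            return new AnnovarSketch(pos + 1, ref.substring(1), "-");
        }
        return new AnnovarSketch(pos, ref, alt); // SNVs and other cases unchanged
    }

    boolean matches(int annovarStart, String annovarObs) {
        return start == annovarStart && obs.equals(annovarObs);
    }
}
```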
findAlleleIndex(Variant.Allele other): Return the index of the allele (if any) that matches the given other allele.
fixGenoTypeOrder(): VCF requires the genotype (GT) to be the first field, but some tools (R, grrrr) write it in whatever order they feel like. This function corrects it.
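Fixing the order means moving `GT` to the front of the FORMAT keys and applying the same permutation to every sample column. The Java sketch below is hypothetical and is not the fixGenoTypeOrder() implementation. It takes the FORMAT column followed by the sample columns, and pads dropped trailing sample fields with `.` before reordering.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of moving GT to the front of FORMAT; not the Variant implementation.
final class GenotypeOrderSketch {
    /**
     * @param columns the FORMAT column followed by one column per sample
     * @return new columns with GT first in FORMAT and each sample permuted to match
     */
    static List<String> gtFirst(List<String> columns) {
        List<String> keys = Arrays.asList(columns.get(0).split(":"));
        int gt = keys.indexOf("GT");
        if (gt <= 0) {
            return columns; // GT already first, or absent
        }
        List<String> out = new ArrayList<>();
        out.add(moveToFront(keys, gt));
        for (String sample : columns.subList(1, columns.size())) {
            List<String> values = new ArrayList<>(Arrays.asList(sample.split(":", -1)));
            // Pad dropped trailing fields so every key has a value before reordering
            while (values.size() < keys.size()) {
                values.add(".");
            }
            out.add(moveToFront(values, gt));
        }
        return out;
    }

    private static String moveToFront(List<String> items, int index) {
        List<String> reordered = new ArrayList<>(items);
        reordered.add(0, reordered.remove(index));
        return String.join(":", reordered);
    }
}
```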
getAlleleBalance(): Return the distance of the allele balance from 0.5 for the default alternate allele and the first sample.
getAlleleBalance(int allele1, int allele2, int sampleIndex = 0): Return the distance of the allele balance from 0.5 for the specified alleles (reference = 0).
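Allele balance and variant allele fraction (VAF) are both ratios of per-allele read counts, such as those in a sample's `AD` field. The Java sketch below is hypothetical and its formulas are assumptions. VAF is taken as one allele's reads over the sum of `AD` (some tools divide by `DP` instead). Balance distance is taken as |allele1 / (allele1 + allele2) − 0.5|. The sketch returns NaN when there are no reads rather than inventing a value.

```java
// Hypothetical sketch working from a sample's AD (allelic depths) value such as "30,10".
// The formulas are assumptions; the Variant class may compute these differently.
final class AlleleFractionSketch {
    /** Parse AD into one read count per allele (index 0 = reference). */
    static int[] depths(String ad) {
        String[] parts = ad.split(",");
        int[] d = new int[parts.length];
        for (int i = 0; i < parts.length; i++) {
            d[i] = Integer.parseInt(parts[i].trim());
        }
        return d;
    }

    /** Reads for one allele divided by the sum of AD; NaN when there are no reads. */
    static float vaf(int[] ad, int allele) {
        int total = 0;
        for (int d : ad) {
            total += d;
        }
        return total == 0 ? Float.NaN : (float) ad[allele] / total;
    }

    /** Distance from 0.5 of allele1's share of the allele1 + allele2 reads; NaN if both are 0. */
    static float balanceDistance(int[] ad, int allele1, int allele2) {
        int n = ad[allele1] + ad[allele2];
        return n == 0 ? Float.NaN : Math.abs((float) ad[allele1] / n - 0.5f);
    }
}
```

With `AD=30,10`, the alternate allele's VAF is 0.25 and the balance distance is |0.75 − 0.5| = 0.25. A perfectly balanced heterozygote would give 0.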
getAlleleDepths(int alleleIndex): Return the number of reads supporting the given allele as a list with one entry for each sample in the VCF.
Parameters: alleleIndex - index of the allele; reference = 0.
getAlleles(): Return a list of Allele objects representing the alleles present on this line of the VCF.
Note: this currently computes the results on the fly and does not cache them, so use it with caution in computationally intensive situations.
getAllelesAndTypes(): Return a list of each allele and its type. Yes, VCF allows multiple types (INS, DEL) on the same line of a VCF file.
getAltDepth(): Return the depth of the first alternate allele for the first sample.
getConsequence(int alleleIndex): Return the consequence of the specified allele, currently from VEP annotations (later, from others). If multiple consequences are present for the same allele, the most severe consequence is returned.
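Choosing "the most severe consequence" means ranking every consequence term (including `&`-joined terms) and keeping the best-ranked one. The Java sketch below is hypothetical. Its `RANKING` list is a small illustrative subset ordered from most to least severe, following Ensembl's published severity order for these terms. The ordering actually used by this class lives in VEPConsequences.RANKED_CONSEQUENCES and is not reproduced here.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch. RANKING is a tiny illustrative subset (most severe first);
// it is not VEPConsequences.RANKED_CONSEQUENCES.
final class MostSevereSketch {
    static final List<String> RANKING = Arrays.asList(
            "stop_gained", "frameshift_variant", "missense_variant",
            "splice_region_variant", "synonymous_variant", "intron_variant");

    /** Pick the most severe term across '&'-joined consequence strings; null if none. */
    static String mostSevere(List<String> consequences) {
        String best = null;
        int bestRank = Integer.MAX_VALUE;
        for (String c : consequences) {
            for (String term : c.split("&")) {
                int rank = RANKING.indexOf(term);
                // Terms missing from the subset rank below every listed term
                if (rank < 0) {
                    rank = RANKING.size();
                }
                if (rank < bestRank) {
                    bestRank = rank;
                    best = term;
                }
            }
        }
        return best;
    }
}
```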
getDosage(): Return the dosage (number of copies) of the first alternate allele for the first sample.
getDosages(): Return a list of dosages (number of copies of the allele) for each sample, for the first alternate allele.
getDosages(int alleleIndex): Return the number of copies of the given alternate allele for each sample in the VCF. Note: the first alternate allele is 0.
getGenes(String minVEPCons): Return the list of genes impacted by this variant, in a manner that is neutral to the annotator used (VEP and SnpEff are supported).
Parameters: minVEPCons - the VEP impact level above which genes should be returned.
getMaxAlleleBalance(): Return the maximum allele balance for any sample for the default alternate allele.
getMaxEffect(): Return the most impactful SnpEff effect for this variant.
getMaxVep(): Return the details of the most severe VEP consequence, as ranked by VEPConsequences.RANKED_CONSEQUENCES.
getPedigrees(): Return the set of all pedigrees that contain this variant.
getSnpEffInfo(): Return a list of SnpEffInfo objects, each describing a separate SnpEff effect caused by the variant.
getVaf(): Return the fraction of reads supporting the first alternate allele for the first sample in the VCF.
getVaf(int alleleIndex, int sampleIndex = 0): Return the fraction of reads supporting the given allele for the specified sample.
The 0th allele is the reference, so typically you would want to use 1 or more.
Parameters: alleleIndex - index of the allele to return the fraction for; sampleIndex - index of the sample (in order of the VCF header).
isCase(IRegion r): Return true if this variant overlaps the given range.
isHet(): Note: this returns true if any sample in the VCF is het; for multi-sample VCFs, check the dosage of the specific sample directly.
sampleGenoType(String sampleName): Return a Map of fields from the genotype column corresponding to the given sample. Common fields include:
setAlt(String alt): Update the first alternate allele to the given value.
Parameters: alt - alternate sequence of bases. This should conform to the VCF spec; for example, for deletions at least one base of context is expected, and this should be consistent with the pos field.
size(): Returns the change in size of the genome caused by this variant's default allele.
Note: since point mutations / SNVs don't change the size, they have size zero.
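For VCF-style alleles, the size change follows from the allele lengths alone, because the shared anchor base of an indel cancels out. The Java sketch below is hypothetical. Its sign convention (positive for insertions, negative for deletions) is an assumption; size() may report sizes differently.

```java
// Hypothetical sketch of the size convention described above, for VCF-style REF/ALT strings.
final class SizeChangeSketch {
    /** Net change in genome length: positive for insertions, negative for deletions, 0 for SNVs. */
    static int sizeChange(String ref, String alt) {
        // The shared leading (anchor) base of an indel is counted on both sides, so it cancels
        return alt.length() - ref.length();
    }
}
```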
toJson(String sample = null): Return a JSON string representing the key details about this variant. Note:
update(Closure c): Update using a default description.
Call this method to prepare the Variant for updating, then make your updates within the Closure that you pass. After the closure exits, other internal fields affected by your changes are synchronized with them.
Not all fields are supported! See update(String, Closure).
It's generally bad form to add INFO fields without annotating what you've done. Prefer update(String, Closure) with a description over this method.
Parameters: c - Closure within which updates can be made.
update(String desc, Closure c): Allows various fields to be updated and then synchronises the rest of the data with those updated fields.
Call this method to prepare the Variant for updating, then make your updates within the Closure that you pass. After the closure exits, other internal fields affected by your changes are synchronized with them.
Not all fields are supported! See the ones that are set below.
The only update to SnpEff information is to remove individual annotations.
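The update(String, Closure) pattern works in three steps: let the caller edit primary fields, recompute the fields derived from them, then record why the change was made. The Java sketch below illustrates that pattern, with a `Consumer` playing the role of the Groovy Closure. All names are hypothetical. The derived `line` is a truncated five-column stand-in, and none of this reflects which fields the real method supports.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical illustration of the "edit, then resynchronise" pattern; names are illustrative.
final class UpdatableVariantSketch {
    String chr;
    int pos;
    String ref;
    String alt;
    // Derived state that must stay consistent with the fields above
    private String type;
    private String line;
    final List<String> history = new ArrayList<>();

    UpdatableVariantSketch(String chr, int pos, String ref, String alt) {
        this.chr = chr;
        this.pos = pos;
        this.ref = ref;
        this.alt = alt;
        sync();
    }

    void update(String desc, Consumer<UpdatableVariantSketch> edits) {
        edits.accept(this); // caller changes primary fields
        sync();             // derived fields are recomputed afterwards
        history.add(desc);  // keep a record of why the change was made
    }

    private void sync() {
        type = alt.length() == ref.length() ? "SNP" : (alt.length() > ref.length() ? "INS" : "DEL");
        line = chr + "\t" + pos + "\t.\t" + ref + "\t" + alt; // truncated stand-in for a VCF line
    }

    String type() { return type; }
    String line() { return line; }
}
```

For example, `u.update("Convert to insertion", v -> v.alt = "AT")` leaves `type()` returning `INS` without the caller having to touch the derived fields.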