Haplotype phasing of genetic variants is important for clinical interpretation of the genome, population genetic analysis, and functional genomic analysis of allelic activity.
Here, researchers from Columbia University present phASER, a fast and accurate approach for phasing variants that are overlapped by sequencing reads, including RNA-sequencing (RNA-seq) reads, which often span multiple exons due to splicing.
This approach provides:
- dramatically more accurate phasing of rare and de novo variants compared to population-based phasing;
- phasing of variants in the same gene up to hundreds of kilobases away, which cannot be obtained from DNA-sequencing reads;
- high-confidence measures of haplotypic expression, greatly improving power for allelic expression studies.
Read-backed haplotype phasing that incorporates RNA-seq using phASER.
A) phASER produces accurate variant phasing by combining DNA and RNA read-backed phasing with population phasing. Because of splicing, RNA-seq reads often span exons and UTRs, allowing read-backed phasing over long ranges, while high-coverage exome and whole-genome sequencing can phase variants in close proximity. A local haplotype is produced by testing all possible phase configurations and selecting the configuration with the most support. When population data are available, local haplotype blocks can be phased relative to one another by anchoring the phase to common variants, where the population phase is likely correct.
B) Concordance between population-based or RNA-seq-based phasing and phasing by transmission, using the Illumina NA12878 Platinum Genome, as a function of variant minor allele frequency. Concordance is defined per variant as the percentage of variant-variant phase events that are correct compared with the known transmission phase.
C) Percentage of all heterozygous variants, as a function of minor allele frequency, that can be assigned a genome-wide phase by phASER using phase anchoring and combined RNA-seq, WES, and WGS data for NA12878. Variants are divided into those where population and read-backed phasing both assign the correct phase (correct), those where read-backed phasing corrected the population phase (changed to correct), and those where read-backed phasing made the phase incorrect (changed to incorrect), all compared with transmission phasing.
D) Percentage of phased variants that can be phased at greater than or equal to increasing genomic distances using WES, WGS, and paired-end 75 and 250 RNA-seq data in four GTEx samples. Solid lines represent the means, and dotted lines the standard error.
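The local haplotype step described in panel A (test every phase configuration, keep the one with the most support) can be illustrated with a small sketch. This is not phASER's code: the function names, the read representation (a dict mapping variant index to the observed allele, 0 for reference and 1 for alternate), and the definition of support as the number of reads consistent with a configuration are all assumptions made for illustration.

```python
from itertools import product


def read_supports(config, read):
    """Return True if the read matches one of the two haplotypes.

    config[i] is the allele carried by haplotype A at variant i;
    haplotype B carries the opposite allele (1 - config[i]).
    """
    on_hap_a = all(allele == config[i] for i, allele in read.items())
    on_hap_b = all(allele == 1 - config[i] for i, allele in read.items())
    return on_hap_a or on_hap_b


def best_phase_configuration(n_variants, reads):
    """Exhaustively score phase configurations (hypothetical scoring rule).

    Only reads covering two or more variants carry phase information.
    The first variant is fixed to allele 0 on haplotype A, because
    swapping the two haplotypes gives an equivalent configuration.
    """
    informative = [r for r in reads if len(r) >= 2]
    best_config, best_support = None, -1
    for tail in product((0, 1), repeat=n_variants - 1):
        config = (0,) + tail
        support = sum(read_supports(config, r) for r in informative)
        if support > best_support:
            best_config, best_support = config, support
    return best_config, best_support, len(informative)


# Toy data (invented): three heterozygous variants in one block.
example_reads = (
    [{0: 0, 1: 1}] * 3    # ref at variant 0 seen with alt at variant 1
    + [{1: 1, 2: 0}] * 2  # alt at variant 1 seen with ref at variant 2
    + [{0: 1, 1: 0}]      # the complementary haplotype
    + [{0: 0, 1: 0}]      # a conflicting read, e.g. a sequencing error
    + [{2: 1}]            # covers one variant only, so it is ignored
)

config, support, total = best_phase_configuration(3, example_reads)
print(config, support, total)  # (0, 1, 0) 6 7
```

With n variants this loop visits 2^(n-1) configurations, so exhaustive testing of this kind is only practical for blocks with a modest number of variants.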