Determining haplotypes is important for modelling the phenotypic consequences of genetic variation in diploid organisms, including cis-regulatory control and compound heterozygosity. Karolinska Institute researchers realized that single-cell RNA-seq (scRNA-seq) data is well suited to phasing genetic variants, since both transcriptional bursts and technical bottlenecks cause pronounced allelic fluctuations in individual cells.
Here the researchers present scphaser, an R package that phases alleles at heterozygous variants to reconstruct haplotypes within transcribed regions of the genome using scRNA-seq data. The devised method efficiently and accurately reconstructed the known haplotype for ≥93% of phasable genes in both human and mouse. It also enables phasing of rare and de novo variants and variants far apart within genes, which is hard to attain with population-based computational inference.
Concept and performance of scphaser
(A) Number of genes against observed allele-specific expression (ASE) in scRNA-seq data (two human datasets and one mouse dataset) and bulk RNA-seq data. The line indicates the mean and the band the inter-quartile range across cells. (B) Transcriptional bursts and technical drop-out cause frequent monoallelic or allele-biased observations in scRNA-seq data, which can reveal the phase of transcribed sequences. (C) Percentage of correctly phased SNVs in the human and mouse datasets. X-axis labels denote the input, method and weighting settings used for the phasing.
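The idea in panel (B) can be illustrated with a toy example. The Python sketch below is not the scphaser algorithm or its API. It is a minimal illustration under stated assumptions: each cell reports, per heterozygous variant, either the reference allele (0), the alternative allele (1), or no usable monoallelic call (None), and a cell that expresses only one haplotype should show a consistent allele pattern across the variants of a gene. The function name `phase_gene`, the input layout and the cost function are all hypothetical choices made for illustration.

```python
from itertools import product


def phase_gene(calls):
    """Toy phasing of the variants of one gene by exhaustive search.

    calls: list of cells, each a list with one entry per variant:
           0 (reference allele seen), 1 (alternative allele seen),
           or None (no monoallelic call in that cell).
    Returns (flip, cost). flip[j] == 1 means the alternative allele of
    variant j sits on the same haplotype as the reference allele of
    variant 0. cost counts calls that contradict the chosen phase.
    """
    n_variants = len(calls[0])
    best_flip, best_cost = None, None
    # Fix the first variant's orientation; try every orientation of the rest.
    for rest in product((0, 1), repeat=n_variants - 1):
        flip = (0,) + rest
        cost = 0
        for cell in calls:
            # After flipping, a cell expressing a single haplotype
            # should show all 0s or all 1s.
            observed = [c ^ f for c, f in zip(cell, flip) if c is not None]
            ones = sum(observed)
            cost += min(ones, len(observed) - ones)
        if best_cost is None or cost < best_cost:
            best_flip, best_cost = flip, cost
    return best_flip, best_cost


# Hypothetical data: six cells, three variants in one gene.
calls = [
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, None],
    [None, 0, 1],
    [1, 0, 1],
    [0, 1, 1],  # one noisy cell
]
flip, cost = phase_gene(calls)
print(flip, cost)
```

With this made-up input the lowest-cost assignment places the reference allele of variants 1 and 3 on the same haplotype as the alternative allele of variant 2, with a single contradicting call from the noisy cell. Exhaustive search is only practical for genes with few variants, since the number of candidate assignments doubles with each added variant.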
Availability – scphaser is implemented as an R package. Tutorial and code are available at https://github.com/edsgard/scphaser
Contact – [email protected]