RNA-sequencing (RNA-Seq) has become a popular tool for transcriptome profiling in mammals. However, accurately estimating allele-specific expression (ASE) from reads aligned to the reference genome is challenging, because the reference is a mosaic haploid sequence that contains only one allele at each locus. Even when diploid genome sequences are available, aligning reads to the correct allele is difficult because the two allelic sequences are highly similar.
Now, researchers at the University of California, San Diego have developed a Bayesian approach to estimate ASE from RNA-Seq data using diploid genome sequences. In their statistical framework, the haploid choice (whether a read originates from the paternal or the maternal haplotype) is modeled as a hidden variable and estimated jointly with isoform expression levels by variational Bayesian inference. Using simulated data, the researchers show that the approach identifies ASE more effectively than an existing approach, and that it quantifies isoform expression levels more accurately than the existing methods TIGAR2, RSEM and Cufflinks. In an analysis of real data from the human reference lymphoblastoid cell line GM12878, some autosomal genes were identified as ASE genes, and skewed paternal X-chromosome inactivation was detected.
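To make the idea of estimating haplotype choice jointly with isoform abundances more concrete, the sketch below runs mean-field variational Bayes on a simple mixture model in which each "target" is an (isoform, haplotype) pair. It is a minimal illustration under stated assumptions, not the ASE-TIGAR implementation: it assumes per-read likelihoods p(read | isoform, haplotype) have already been computed from alignments to the paternal and maternal cDNA sequences, uses a symmetric Dirichlet prior, and ignores transcript-length normalization and other details of the real model. The function names `vb_allele_specific` and `paternal_fraction` and the toy read counts are hypothetical.

```python
import math

def digamma(x: float) -> float:
    """Digamma function for x > 0: shift x upward with the recurrence,
    then apply the standard asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    result += (math.log(x) - 0.5 * inv
               - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252)))
    return result

def vb_allele_specific(likelihoods, n_targets, alpha0=1.0, n_iter=500, tol=1e-10):
    """Hypothetical mean-field VB for a mixture over (isoform, haplotype) targets.

    likelihoods: one dict per read, {target_index: p(read | target)}, with at
    least one positive entry per read (assumed precomputed from alignments).
    Returns posterior-mean target proportions under q(theta) = Dirichlet(alpha).
    """
    alpha = [alpha0] * n_targets
    for _ in range(n_iter):
        # E[log theta_t] under the current Dirichlet variational posterior
        total = digamma(sum(alpha))
        elog = [digamma(a) - total for a in alpha]
        counts = [0.0] * n_targets
        for lik in likelihoods:
            # q(z_r = t) is proportional to p(read | t) * exp(E[log theta_t])
            w = {t: p * math.exp(elog[t]) for t, p in lik.items()}
            norm = sum(w.values())
            for t, v in w.items():
                counts[t] += v / norm
        # Dirichlet update: prior pseudo-counts plus expected read counts
        new_alpha = [alpha0 + c for c in counts]
        delta = max(abs(a - b) for a, b in zip(new_alpha, alpha))
        alpha = new_alpha
        if delta < tol:
            break
    s = sum(alpha)
    return [a / s for a in alpha]

def paternal_fraction(theta, targets, isoforms):
    """Share of the given isoforms' expression assigned to the paternal haplotype."""
    pat = sum(th for th, (iso, hap) in zip(theta, targets)
              if iso in isoforms and hap == "pat")
    tot = sum(th for th, (iso, hap) in zip(theta, targets) if iso in isoforms)
    return pat / tot

# Toy example (hypothetical data): one isoform, two haplotypes.
targets = [("tx1", "pat"), ("tx1", "mat")]
reads = ([{0: 1e-2, 1: 1e-4}] * 30     # reads covering a paternal SNP allele
         + [{0: 1e-4, 1: 1e-2}] * 10   # reads covering a maternal SNP allele
         + [{0: 1e-2, 1: 1e-2}] * 60)  # reads that do not overlap any SNP
theta = vb_allele_specific(reads, len(targets))
frac = paternal_fraction(theta, targets, {"tx1"})
print(f"paternal fraction of tx1: {frac:.3f}")
```

In this toy case the reads that overlap no SNP are split between haplotypes in proportion to the evidence from the informative reads, so the estimated paternal fraction settles near 30/(30 + 10).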
ASE-TIGAR pipeline for estimating ASE. The inputs to ASE-TIGAR are RNA-Seq data and paternal and maternal cDNA sequences, shown as rectangles with double lines. Alternatively, whole-genome sequencing data can be used as input after pre-processing steps (shown as shaded rectangles and circles).
Availability – An implementation of our method is available at http://nagasakilab.csml.org/ase-tigar