A Novel Statistical Approach for Jointly Analyzing RNA-seq Data from F1 Reciprocal Crosses and Inbred Lines

RNA sequencing (RNA-seq) measures not only total gene expression but can also measure allele-specific gene expression in diploid individuals. RNA-seq data collected from F1 reciprocal crosses in mouse can be used to powerfully dissect strain and parent-of-origin effects on the allelic imbalance of gene expression.
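Reciprocal crosses can separate the two effects, and a small Python sketch (not the rxSeq implementation) shows how. It assumes a hypothetical logit-scale parameterization. `strain` shifts expression toward the allele from strain A regardless of which parent carried it. `poo` shifts expression toward the maternally inherited allele. Crosses are written mother first, as is conventional, so in A x B the A allele is maternal and in B x A it is paternal.

```python
import math


def logit(p: float) -> float:
    return math.log(p / (1.0 - p))


def expit(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def allele_a_fraction(strain: float, poo: float, mother_is_a: bool) -> float:
    """Expected fraction of allele-specific reads carrying the strain-A allele.

    Hypothetical logit-scale model (illustration only):
      strain - shift toward the A allele, whichever parent it came from
      poo    - shift toward the maternally inherited allele
    """
    # In cross A x B the A allele is maternal (+poo);
    # in the reciprocal cross B x A it is paternal (-poo).
    sign = 1.0 if mother_is_a else -1.0
    return expit(strain + sign * poo)


def decompose(p_ab: float, p_ba: float) -> tuple[float, float]:
    """Recover (strain, poo) from the allele-A fractions in A x B and B x A."""
    l_ab, l_ba = logit(p_ab), logit(p_ba)
    # The strain term has the same sign in both crosses, so averaging isolates it;
    # the parent-of-origin term flips sign, so half the difference isolates it.
    return (l_ab + l_ba) / 2.0, (l_ab - l_ba) / 2.0


# A pure strain effect gives the same fraction in both crosses,
# while a pure parent-of-origin effect mirrors the fraction around 0.5.
print(allele_a_fraction(1.0, 0.0, True), allele_a_fraction(1.0, 0.0, False))
print(allele_a_fraction(0.0, 1.0, True), allele_a_fraction(0.0, 1.0, False))
print(decompose(allele_a_fraction(0.3, 0.8, True), allele_a_fraction(0.3, 0.8, False)))
```

A single F1 cross confounds the two effects, because allele A is always maternal (or always paternal). Only with both reciprocal crosses do the average and the difference of the two logits identify them separately.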


In this paper, researchers from the University of North Carolina at Chapel Hill develop a novel statistical approach for analyzing RNA-seq data from F1 and inbred strains. The method was motivated by a mouse study of F1 reciprocal crosses derived from highly divergent mouse strains, and the researchers apply it to that study. It jointly models the total number of reads and the number of allele-specific reads for each gene, which significantly boosts power to detect strain effects and, in particular, parent-of-origin effects. The method handles the over-dispersion commonly observed in read counts and can flexibly adjust for covariates such as sex and read depth.

The X chromosome in mouse presents particular challenges. As in other mammals, X chromosome inactivation silences one of the two X chromosomes in each female cell, but the choice of which chromosome is silenced can be highly skewed by alleles at the X-linked X controlling element (Xce) and by stochastic effects. The model accounts for these chromosome-wide effects at the level of the individual, allowing proper analysis of X chromosome expression.

The researchers also propose a genomic control procedure to properly control the type I error rate in RNA-seq studies. Several of these methodological improvements also apply to RNA-seq data from other species and to other types of next-generation sequencing data. Finally, they show through simulations that increasing the number of samples is more beneficial than increasing the library size for mapping both strain and parent-of-origin effects. Unless recruiting samples is prohibitively expensive, they recommend sequencing more samples at lower coverage.
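The joint model described above can be pictured as a per-gene log-likelihood that adds a term for total reads to a term for allele-specific reads. The following minimal Python sketch assumes one common pairing that allows over-dispersion in both parts: a negative binomial for total counts and a beta-binomial for allele-specific counts. These distributional choices, the parameter names, and the dictionary keys are illustrative assumptions, not the rxSeq API. In the sketch, read depth enters as a multiplicative offset on the mean. The allelic proportion is driven by a logit-scale strain term plus a parent-of-origin term whose sign flips between reciprocal crosses. An optional per-sample logit offset stands in for an individual-level X-inactivation skew. The last function applies classical genomic control, which divides 1-df chi-square statistics by lambda, the ratio of their median to the null median (about 0.455). The procedure proposed in the paper may differ.

```python
import math
from statistics import median


def nb_logpmf(y: int, mu: float, phi: float) -> float:
    """Negative binomial log-pmf with mean mu and variance mu + phi * mu**2."""
    r = 1.0 / phi  # size parameter
    return (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
            + r * math.log(r / (r + mu)) + y * math.log(mu / (r + mu)))


def lbeta(a: float, b: float) -> float:
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def bb_logpmf(k: int, n: int, p: float, theta: float) -> float:
    """Beta-binomial log-pmf with mean n * p; larger theta means more over-dispersion."""
    alpha, beta = p / theta, (1.0 - p) / theta
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_choose + lbeta(k + alpha, n - k + beta) - lbeta(alpha, beta)


def joint_loglik(samples, log_mean, phi, strain, poo, theta):
    """Sum total-read (NB) and allele-specific (beta-binomial) terms for one gene.

    Each sample is a dict with hypothetical keys:
      total       - total read count for the gene
      depth       - library-size factor, used as a multiplicative offset on the mean
      k_a, n_as   - reads from allele A, and all allele-specific reads
      mother_is_a - True for cross A x B, False for B x A
      x_offset    - optional logit shift, e.g. an individual's X-inactivation skew
    """
    loglik = 0.0
    for s in samples:
        mu = s["depth"] * math.exp(log_mean)
        loglik += nb_logpmf(s["total"], mu, phi)
        sign = 1.0 if s["mother_is_a"] else -1.0
        eta = strain + sign * poo + s.get("x_offset", 0.0)
        p = 1.0 / (1.0 + math.exp(-eta))
        loglik += bb_logpmf(s["k_a"], s["n_as"], p, theta)
    return loglik


CHI2_1DF_MEDIAN = 0.4549364  # median of a chi-square distribution with 1 df


def genomic_control(stats):
    """Classical genomic control: deflate 1-df chi-square statistics by lambda >= 1."""
    lam = max(median(stats) / CHI2_1DF_MEDIAN, 1.0)
    return lam, [t / lam for t in stats]
```

In practice the parameters would be estimated by maximizing this likelihood for each gene. A likelihood ratio test, for example `poo` free versus `poo = 0`, then yields the 1-df statistics that genomic control rescales across all genes. Because the beta-binomial term has its own dispersion `theta`, extra-binomial variation in allelic counts does not inflate the parent-of-origin test the way a plain binomial model would.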

Availability – An R package that implements the proposed models can be found online at http://www.bios.unc.edu/~feizou/software/rxSeq/

Zou F, Sun W, Crowley JJ, Zhabotynsky V, Sullivan PF, Pardo-Manuel de Villena FF. (2014) A Novel Statistical Approach for Jointly Analyzing RNA-seq Data from F1 Reciprocal Crosses and Inbred Lines. Genetics [Epub ahead of print].