Modeling RNA degradation for RNA-Seq

Methods for estimating transcript abundance from RNA-Seq data have been studied intensively, and many of them assume that short reads are uniformly distributed along transcripts. In practice, however, short reads are nonuniformly distributed along transcripts, which can greatly reduce the accuracy of methods that rely on the uniform assumption. Several methods have been developed to correct the biases induced by this nonuniformity, using the empirical distribution of short reads within transcripts.
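To make the effect of the uniform assumption concrete, here is a minimal toy sketch in Python. It is not the model from the paper: the exponential fall-off of read density away from the 3' end, the `decay_rate` parameter, and all function names are assumptions chosen only for illustration. The sketch compares the fraction of reads that land in a region of a transcript with the fraction a uniform model would predict.

```python
import math


def coverage_weights(length, decay_rate):
    """Relative read density at each position of a transcript.

    Toy assumption: density decays exponentially with distance from the
    3' end. decay_rate = 0 reproduces the uniform model.
    """
    raw = [math.exp(-decay_rate * (length - 1 - pos)) for pos in range(length)]
    total = sum(raw)
    return [w / total for w in raw]


def region_fraction(weights, start, end):
    """Fraction of all reads expected to start in positions [start, end)."""
    return sum(weights[start:end])


def uniform_bias(length, decay_rate, start, end):
    """Ratio of the region's read fraction to the uniform-model prediction.

    A value below 1 means an abundance estimate based on this region and
    the uniform assumption would be too low; above 1, too high.
    """
    observed = region_fraction(coverage_weights(length, decay_rate), start, end)
    expected_uniform = (end - start) / length
    return observed / expected_uniform


if __name__ == "__main__":
    # Hypothetical 1000-nt transcript; compare its 5' and 3' ends.
    for rate in (0.0, 0.002):
        five_prime = uniform_bias(1000, rate, 0, 200)
        three_prime = uniform_bias(1000, rate, 800, 1000)
        print(f"decay_rate={rate}: 5' bias={five_prime:.2f}, 3' bias={three_prime:.2f}")
```

With `decay_rate` set to zero both ratios equal one. With any positive rate the 5' region is under-covered and the 3' region over-covered, so an estimate that leans on a region near one end (for example, an exon unique to one isoform) would be skewed under the uniform assumption.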

Researchers at USC and the Chinese Academy of Sciences have found that RNA degradation plays a major role in shaping the nonuniform distribution of short reads. They have developed a new approach that quantifies this nonuniform distribution by precisely modeling RNA degradation. Building on their model of RNA degradation, they further developed a statistical method to estimate the transcript expression level, as well as the RNA degradation rate, for individual genes and their isoforms.

They showed that their method can improve the accuracy of transcript isoform expression estimation. The estimated RNA degradation rates of individual transcripts are consistent across samples, experiments and platforms. In addition, the RNA degradation rate from this model is independent of RNA length, consistent with previous studies of RNA decay rates.

  • Wan L, Yan X, Chen T, Sun F. (2012) Modeling RNA degradation for RNA-Seq with applications. Biostatistics [Epub ahead of print].