The estimation of isoform abundances from RNA-Seq data requires a time-intensive step of mapping reads to either an assembled, or previously annotated transcriptome, followed by an optimization procedure for deconvolution of multi-mapping reads. These procedures are essential for downstream analysis such as differential expression. In cases where it is desirable to adjust the underlying annotation, for example upon the discovery of novel isoforms or errors in existing annotations, current pipelines must be rerun from scratch. This makes it difficult to update abundance estimates after re-annotation, or to explore the effect of changes in the transcriptome on analyses.
Researchers at UC Berkeley have developed a novel efficient algorithm for updating abundance estimates from RNA-Seq experiments upon re-annotation that does not require re-analysis of the entire dataset. Their approach is based on a fast partitioning algorithm for identifying transcripts whose abundances may depend on the added or deleted isoforms, and on a fast follow-up approach to re-estimating abundances for all transcripts. They demonstrate the effectiveness of our methods by showing how to synchronize RNA-Seq abundance estimates with the daily RefSeq incremental updates. Thus, they provide a practical approach to maintaining relevant databases of RNA-Seq derived abundance estimates even as annotations are being constantly revised.
Availability – ReXpress is freely available, together with source code, at http://bio.math.berkeley.edu/ReXpress/
- Roberts A, Schaeffer L, Pachter L. (2013) Updating RNA-Seq analyses after re-annotation. Bioinformatics [Epub ahead of print]. [abstract]