Rnnotator – A software pipeline for reference-genome-independent de novo transcriptome assembly in non-model organisms

RNA-Seq has emerged as a powerful tool for studying transcriptomes. It aims to provide a comprehensive list of all transcripts and their expression levels in a given cell or cell population under a particular condition. RNA-Seq data analysis typically involves aligning the short read sequences to a reference genome to reveal reads from exons, splice junctions, or polyA ends. This information is used to derive novel gene models or refine existing ones, including exon structure and untranslated regions (UTRs), and to determine gene expression levels from read count statistics.
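As a minimal sketch of how an expression level can be derived from read counts, the Python snippet below computes RPKM (reads per kilobase of transcript per million mapped reads), one commonly used normalization. The function name and the example numbers are illustrative assumptions; they are not taken from Rnnotator or from the paper.

```python
def rpkm(read_count, transcript_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads.

    Normalizes a raw read count by transcript length (in bp) and by
    sequencing depth so that values are comparable across transcripts
    and across samples.
    """
    if transcript_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and total mapped reads must be positive")
    # 1e9 = 1e3 (bp per kilobase) * 1e6 (reads per million)
    return read_count * 1e9 / (transcript_length_bp * total_mapped_reads)


# Hypothetical example: 500 reads aligned to a 2,000 bp transcript
# in a library with 10 million mapped reads.
example_value = rpkm(500, 2000, 10_000_000)
print(example_value)
```

Longer transcripts and deeper libraries both yield more reads for the same underlying abundance, which is why the raw count is divided by both quantities.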

A few software packages have been developed to perform these data analysis tasks, including TopHat/Cufflinks, ERANGE, and Scripture. This type of reference-based approach can be very successful if the reference genomes are of good quality. However, except for a few model organisms, genome assemblies are often incomplete or unavailable. Similarly, sequencing RNA from complex microbial communities, known as metatranscriptome sequencing, poses considerable challenges for data analysis because the genomes of most of the organisms are not known. Thus, in many cases, reference-based analysis of RNA-Seq data is not possible.

The authors describe Rnnotator, an automated software pipeline that generates transcript models by de novo assembly of RNA-Seq data without the need for a reference genome, and demonstrate that transcriptome assembly is complementary to reference-based analysis when reference genomes are incomplete.

Rnnotator enables RNA-Seq studies in any organism, simple or complex, and also provides an opportunity to discover new types of RNA not encoded in reference genomes.

Martin J, Bruno VM, Fang Z, Meng X, Blow M, Zhang T, Sherlock G, Snyder M, Wang Z. (2010) Rnnotator: an automated de novo transcriptome assembly pipeline from stranded RNA-Seq reads. BMC Genomics 11(1), 663.

Video: Jeff Martin of the DOE Joint Genome Institute discusses a de novo transcriptome assembly pipeline from short RNA-Seq reads at the “Sequencing, Finishing, Analysis in the Future” meeting in Santa Fe, NM, 2010.