Transcriptome sequencing has long been the favored method for quickly and inexpensively obtaining a large number of gene sequences from an organism with no reference genome. Owing to the rapid increase in throughput and decrease in cost of next-generation sequencing, RNA-Seq in particular has become the method of choice. However, the very short reads (e.g. 2 × 90 bp paired ends) produced by next-generation sequencing make de novo assembly of complete, full-length transcript sequences an algorithmic challenge.
Here, researchers from BGI Shenzhen present SOAPdenovo-Trans, a de novo transcriptome assembler designed specifically for RNA-Seq. They evaluated its performance on transcriptome datasets from rice and mouse. Using the known transcripts from these well-annotated genomes (sequenced a decade ago) as benchmarks, they assessed how SOAPdenovo-Trans and two other popular transcriptome assemblers handled practical issues such as alternative splicing and variable expression levels.
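To make the idea of benchmarking against known transcripts concrete, here is a minimal Python sketch that scores how much of a reference transcript is recovered by a set of assembled contigs. It is an illustration only, not the evaluation procedure the authors used: the function names `kmers` and `kmer_recovery`, the k-mer based measure, and the default `k=5` are all assumptions made for this example. A real evaluation would more likely align contigs to the annotated transcripts.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings of seq (empty if seq is shorter than k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def kmer_recovery(reference, contigs, k=5):
    """Fraction of the reference transcript's distinct k-mers found in any contig.

    Hypothetical proxy metric: 1.0 means every k-mer of the known transcript
    appears somewhere in the assembly, 0.0 means none do.
    """
    ref_kmers = kmers(reference, k)
    if not ref_kmers:
        return 0.0
    assembled = set()
    for contig in contigs:
        assembled |= kmers(contig, k)
    return len(ref_kmers & assembled) / len(ref_kmers)


if __name__ == "__main__":
    # Toy sequences, invented for illustration.
    known_transcript = "ACGTACGGTA"
    partial_assembly = ["ACGTACG"]
    print(kmer_recovery(known_transcript, partial_assembly))
```

Scoring each known isoform of a gene separately with a measure like this gives a rough sense of whether an assembler recovered all splice variants or only the dominant one.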
AVAILABILITY: Source code and user manual are available at http://sourceforge.net/projects/soapdenovotrans/