ASplice – a scalable and memory-efficient algorithm for de novo transcriptome assembly of non-model organisms

With the increasing availability of de novo assembly algorithms, it is feasible to study entire transcriptomes of non-model organisms. While algorithms are available that are specifically designed for transcriptome assembly from high-throughput sequencing data, they are very memory-intensive, which limits their application to small data sets with few libraries.

Texas A&M University researchers developed a transcriptome assembly algorithm that recovers alternatively spliced isoforms and expression levels while utilizing as many RNA-Seq libraries as possible, which together can contain hundreds of gigabases of data. New techniques were developed so that the computations can be performed on a computing cluster with a moderate amount of physical memory.

 Illustration of the iterative algorithm to enumerate k-mer frequencies


For the k-mer a_1...a_k, its two frequency slots with zero counts for nucleotides c and t are removed to obtain the (k+1)-mers a_1...a_k a and a_1...a_k g.
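The extension step described in the caption can be illustrated with a small sketch. This is not the ASplice implementation: it is a toy, in-memory Python version that assumes reads are lowercase strings over a, c, g and t, ignores reverse complements, and does not reproduce any of the memory-saving techniques of the actual software. The function names `extend_kmers` and `enumerate_kmer_frequencies` are hypothetical.

```python
from collections import Counter

NUCLEOTIDES = "acgt"


def extend_kmers(reads, kmers, k):
    """Extend each known k-mer by one nucleotide (hypothetical helper).

    Every k-mer gets four frequency slots, one per nucleotide that may
    follow it in a read. Slots that stay at zero are dropped, and each
    remaining slot becomes a (k+1)-mer with that frequency.
    """
    slots = {kmer: [0, 0, 0, 0] for kmer in kmers}
    for read in reads:
        # Positions where a k-mer is followed by at least one more base.
        for i in range(len(read) - k):
            kmer = read[i:i + k]
            j = NUCLEOTIDES.find(read[i + k])
            if kmer in slots and j >= 0:
                slots[kmer][j] += 1
    return {
        kmer + NUCLEOTIDES[j]: count
        for kmer, counts in slots.items()
        for j, count in enumerate(counts)
        if count > 0  # remove slots with zero counts
    }


def enumerate_kmer_frequencies(reads, k_target):
    """Grow k-mers one base at a time, starting from single nucleotides."""
    current = Counter(ch for read in reads for ch in read if ch in NUCLEOTIDES)
    current = dict(current)
    k = 1
    while k < k_target:
        current = extend_kmers(reads, current, k)
        k += 1
    return current


if __name__ == "__main__":
    reads = ["acgta", "acgtg"]
    # The 3-mer "cgt" is followed only by a and g, so the c and t slots are removed.
    print(extend_kmers(reads, {"cgt"}, 3))
    print(enumerate_kmer_frequencies(reads, 3))
```

In this sketch only k-mers that were actually observed are carried into the next round, so the candidate set at length k+1 is bounded by the nonzero slots at length k rather than by all 4^(k+1) possible strings.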

This strategy minimizes memory consumption while simultaneously obtaining comparable or improved accuracy over existing algorithms. It provides support for incremental updates of assemblies when new libraries become available.

Availability – A software program that implements the algorithm is available at: http://faculty.cse.tamu.edu/shsze/asplice.

Sze SH, Pimsler ML, Tomberlin JK, Jones CD, Tarone AM. (2017) A scalable and memory-efficient algorithm for de novo transcriptome assembly of non-model organisms. BMC Genomics 18(Suppl 4):387. [article]
