Mutual – heuristic pairwise alignment of de Bruijn graphs to facilitate simultaneous transcript discovery

The advance of high-throughput sequencing has made it possible to obtain new transcriptomes and study splicing mechanisms in non-model organisms. In these studies, there is often a need to investigate the transcriptomes of two related organisms at the same time in order to find the similarities and differences between them.

The traditional approach to address this problem is to perform de novo transcriptome assemblies to obtain predicted transcripts for these organisms independently and then employ similarity comparison algorithms to study them.

Instead of obtaining predicted transcripts for these organisms separately from the intermediate de Bruijn graph structures employed by de novo transcriptome assembly algorithms, researchers at Texas A&M University have developed an algorithm to allow direct comparisons between paths in two de Bruijn graphs by first enumerating short paths in both graphs, and iteratively extending paths in one graph that have high similarity to paths in the other graph to obtain longer corresponding paths between the two graphs. These paths represent predicted transcripts that are present in both organisms. The researchers show that their algorithm recovers significantly more shared transcripts than traditional approaches by applying it to simultaneously recover transcripts in mouse against rat and in mouse against human from publicly available RNA-Seq libraries.

 

rna-seq

Difference between traditional strategy and the new strategy.

This strategy utilizes sequence similarity information within the paths that is often more reliable than coverage information.

These researcher’s approach generalizes the pairwise sequence alignment problem to allow the input to be non-linear structures, and provides a heuristic to reliably recover similar paths from the two structures. Their algorithm allows detailed investigation of the similarities and differences in alternative splicing between the two organisms at both the sequence and structure levels, even in the absence of reference transcriptomes or a closely related model organism.

Availability – A software program that implements our algorithm (Mutual) is available at: http://faculty.cse.tamu.edu/shsze/mutual

Fu S, Tarone AM, Sze SH. (2015) Heuristic pairwise alignment of de Bruijn graphs to facilitate simultaneous transcript discovery in related organisms from RNA-Seq data. BMC Genomics 16:S5. [article]

Leave a Reply

Your email address will not be published. Required fields are marked *

*

Time limit is exhausted. Please reload CAPTCHA.