RNA-Seq experiments have revealed a multitude of novel ncRNAs. The gold standard for their analysis based on simultaneous alignment and folding (Sankoff, 1985) suffers from extreme time complexity of O(n6). Subsequently, numerous faster “Sankoff-style” approaches have been suggested. Commonly, the performance of such methods relies on sequence-based heuristics that restrict the search space to optimal or near-optimal sequence alignments; however, the accuracy of sequence-based methods breaks down for RNAs with sequence identities below 60% (Gardner et al., 2005). Alignment approaches like LocARNA (Will et al., 2007) that do not require sequence-based heuristics, have been limited to high complexity (≥ quartic time.)
Breaking this barrier, researchers from the University of Leipzig introduce the novel Sankoff-style algorithm “SPARSE”, which runs in quadratic time without sequence-based heuristics. To achieve this low complexity, on par with sequence alignment algorithms, SPARSE features strong sparsification based on structural properties of the RNA ensembles. Following PMcomp (Hofacker et al., 2004), SPARSE gains further speed-up from lightweight energy computation. While all existing lightweight Sankoff-style methods restrict Sankoff’s original model by disallowing loop deletions and insertions, SPARSE transfers the Sankoff algorithm to the lightweight energy model completely for the first time. Compared to LocARNA, SPARSE achieves similar alignment and better folding quality in significantly less time (speedup: 3.7). At similar run-time, it aligns low sequence identity instances substantially more accurate than RAF (Do et al., 2008), which uses sequence-based heuristics.
Availability: SPARSE is freely available at http://www.bioinf.uni-freiburg.de/Software/SPARSE