Several state-of-the-art methods for isoform identification and quantification are based on sparse probabilistic models, such as Lasso regression. However, explicitly listing the set of candidate transcripts, whose size can grow exponentially with the number of exons, is intractable for genes with many exons. For this reason, existing approaches based on sparse models are either restricted to genes with few exons or run the regression algorithm only on a small set of pre-selected isoforms.
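To make the scaling problem concrete, here is a minimal Python sketch of the enumerate-then-regress approach described above. It is not FlipFlop's method. It assumes a deliberately simplified model in which each exon's observed coverage equals the summed abundance of the candidate isoforms that contain it (exon lengths, read positions and sequencing biases are ignored), treats every non-empty ordered subset of exons as a candidate isoform (2**n - 1 of them for n exons), and fits a non-negative Lasso by proximal gradient descent (ISTA). The helper names `candidate_isoforms` and `nonneg_lasso` are hypothetical.

```python
from itertools import combinations


def candidate_isoforms(n_exons):
    """All non-empty exon subsets, kept in genomic order: 2**n_exons - 1 candidates."""
    cands = []
    for k in range(1, n_exons + 1):
        cands.extend(combinations(range(n_exons), k))
    return cands


def nonneg_lasso(X, y, lam, n_iter=5000):
    """ISTA for min_b 0.5*||y - X b||^2 + lam*sum(b) subject to b >= 0.

    X is a list of rows (one per exon); each column is one candidate isoform.
    """
    m, p = len(X), len(X[0])
    # The squared Frobenius norm upper-bounds the Lipschitz constant ||X||_2^2,
    # so 1/lip is a safe (if conservative) step size.
    lip = sum(v * v for row in X for v in row)
    step = 1.0 / lip
    b = [0.0] * p
    for _ in range(n_iter):
        resid = [sum(X[i][j] * b[j] for j in range(p)) - y[i] for i in range(m)]
        grad = [sum(X[i][j] * resid[i] for i in range(m)) for j in range(p)]
        # Gradient step followed by the prox of lam*sum(b) + indicator(b >= 0),
        # which is a one-sided soft threshold.
        b = [max(0.0, b[j] - step * (grad[j] + lam)) for j in range(p)]
    return b


# Toy gene with 3 exons. Simplifying assumption: exon coverage is the sum of
# the abundances of the candidate isoforms that contain that exon.
cands = candidate_isoforms(3)
X = [[1.0 if e in iso else 0.0 for iso in cands] for e in range(3)]
y = [15.0, 10.0, 15.0]  # built from isoform (0, 1, 2) at 10 plus (0, 2) at 5
b = nonneg_lasso(X, y, lam=1.0)

for iso, coef in zip(cands, b):
    if coef > 1e-6:
        print(iso, round(coef, 3))
print(len(candidate_isoforms(10)), "candidates for 10 exons")
```

With three exons there are only 7 candidate columns, and the L1 penalty selects exactly the two isoforms used to build the toy coverage; the second is shrunk from 5 to 4.5 by the penalty with `lam=1.0`. A gene with 20 exons would already need 2**20 - 1 = 1,048,575 columns under this naive enumeration, which is the blow-up that forces either a restriction to small genes or a pre-selection step.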
A team led by researchers at the Centre for Computational Biology in France has developed a new technique, called FlipFlop, that efficiently solves the sparse estimation problem on the full set of candidate isoforms by using network flow optimization. This removes the need for a pre-selection step, improving isoform identification while keeping the computational cost low. Experiments on synthetic and real RNA-Seq data confirm that the approach is more accurate than alternative methods and is among the fastest available.
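The reason a network-flow view can avoid explicit enumeration is that candidate isoforms can be represented as source-to-sink paths in a directed acyclic graph over exons, and a flow on the edges of that graph stands for a weighted combination of such paths. The Python sketch below illustrates only that correspondence; it is not FlipFlop's solver, which, per the summary above, performs the sparse estimation itself as a network flow optimization. It assumes a toy splicing graph in which any exon may be followed by any later exon, with hypothetical source and sink nodes named `"s"` and `"t"`; the functions `full_exon_dag`, `count_paths` and `decompose_flow` are hypothetical helpers. It counts paths by memoized recursion in time linear in the number of edges, and greedily decomposes an edge flow back into weighted paths.

```python
from collections import defaultdict


def full_exon_dag(n_exons):
    """Toy splicing graph: 's' -> any exon, exon i -> any later exon j, any exon -> 't'."""
    edges = [("s", i) for i in range(n_exons)]
    edges += [(i, j) for i in range(n_exons) for j in range(i + 1, n_exons)]
    edges += [(i, "t") for i in range(n_exons)]
    return edges


def count_paths(edges, source, sink):
    """Count source->sink paths in a DAG by memoized recursion, without listing them."""
    succ = defaultdict(list)
    for u, v in edges:
        succ[u].append(v)
    memo = {}

    def paths_from(u):
        if u == sink:
            return 1
        if u not in memo:
            memo[u] = sum(paths_from(v) for v in succ[u])
        return memo[u]

    return paths_from(source)


def decompose_flow(flow, source, sink, tol=1e-9):
    """Greedily split a conserved edge flow on a DAG into weighted source->sink paths."""
    flow = dict(flow)  # work on a copy so the caller's flow is untouched
    paths = []
    while True:
        path, u = [source], source
        # Follow positive-flow edges; with conserved flow on a DAG this reaches the sink.
        while u != sink:
            nxt = next((v for (a, v), f in flow.items() if a == u and f > tol), None)
            if nxt is None:
                break
            path.append(nxt)
            u = nxt
        if u != sink:
            break  # no flow left leaving the source
        edges = list(zip(path, path[1:]))
        weight = min(flow[e] for e in edges)
        for e in edges:
            flow[e] -= weight  # the bottleneck edge drops to zero
        paths.append((tuple(path), weight))
    return paths


print(count_paths(full_exon_dag(3), "s", "t"))   # 2**3 - 1 paths
print(count_paths(full_exon_dag(40), "s", "t"))  # 2**40 - 1, counted in O(edges)

# Edge flow built from isoform 0-1-2 at 10 and isoform 0-2 at 5.
flow = {("s", 0): 15.0, (0, 1): 10.0, (1, 2): 10.0, (0, 2): 5.0, (2, "t"): 15.0}
print(decompose_flow(flow, "s", "t"))
```

Flow decomposition is not unique in general. In this toy flow, exon 1 has a single outgoing edge, so the two isoforms used to build the flow are recovered exactly.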
Availability – Source code is freely available as an R package at http://cbio.mines-paristech.fr/flipflop.
- Bernard E, Jacob L, Mairal J, Vert JP. (2013) Efficient RNA Isoform Identification and Quantification from RNA-Seq Data with Network Flows. hal-00803134, version 2.