A team led by researchers at Georgia State University has proposed a new statistical, genome-guided method called “Transcriptome Reconstruction using Integer Programming” (TRIP), which incorporates the fragment length distribution into novel transcript reconstruction from paired-end RNA-Seq reads. To reconstruct novel transcripts, the authors build a splice graph from inferred exon boundaries and RNA-Seq reads. A splice graph is a directed acyclic graph (DAG) whose vertices represent exons and whose edges represent splicing events. They then enumerate all maximal paths in the splice graph using a depth-first search (DFS). These paths correspond to putative transcripts and serve as the input to the TRIP algorithm.
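The path enumeration step can be illustrated with a minimal Python sketch. It assumes the splice graph is given as an adjacency list and treats a maximal path as one that runs from a vertex with no incoming edges to a vertex with no outgoing edges. The exon names, the example graph, and the function name `enumerate_maximal_paths` are hypothetical and are not taken from the TRIP paper.

```python
from typing import Dict, List


def enumerate_maximal_paths(graph: Dict[str, List[str]]) -> List[List[str]]:
    """Return every source-to-sink path in a DAG given as an adjacency list."""
    # Collect all vertices, including ones that appear only as edge targets.
    nodes = set(graph)
    for targets in graph.values():
        nodes.update(targets)

    # Sources are vertices that no edge points to.
    has_incoming = {t for targets in graph.values() for t in targets}
    sources = sorted(nodes - has_incoming)

    paths: List[List[str]] = []

    def dfs(node: str, path: List[str]) -> None:
        path.append(node)
        successors = graph.get(node, [])
        if not successors:
            # A sink ends the path, so record a copy of it.
            paths.append(list(path))
        else:
            for nxt in successors:
                dfs(nxt, path)
        path.pop()  # backtrack before exploring sibling branches

    for source in sources:
        dfs(source, [])
    return paths


# Hypothetical splice graph: exons e1..e4, edges are observed splice junctions.
splice_graph = {
    "e1": ["e2", "e3"],
    "e2": ["e3", "e4"],
    "e3": ["e4"],
    "e4": [],
}

putative_transcripts = enumerate_maximal_paths(splice_graph)
for transcript in putative_transcripts:
    print(" -> ".join(transcript))
```

The number of maximal paths can grow exponentially with the number of exons, which is one reason a later selection step over the candidates is needed.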
Solving the transcriptome reconstruction problem means selecting the set of putative transcripts with the highest support from the RNA-Seq reads. The authors formulate this problem as an integer program. The objective is to select the smallest set of putative transcripts that yields a good statistical fit between the fragment length distribution determined empirically during library preparation and the fragment lengths implied by mapping read pairs to the selected transcripts.
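To give a feel for the selection step, here is a deliberately simplified Python sketch. It is not the integer program from the TRIP paper. It assumes a read pair counts as "well explained" by a transcript when the fragment length it implies on that transcript lies within `k` standard deviations of the library mean, and it swaps the integer programming solver for an exhaustive search over subsets, which is only feasible for a handful of candidates. All names and numbers below are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Set


def select_transcripts(
    implied_lengths: Dict[str, Dict[str, int]],
    transcripts: List[str],
    mean: float,
    std: float,
    k: float = 2.0,
) -> List[str]:
    """Smallest transcript subset that well-explains every explainable read pair.

    implied_lengths maps read-pair id -> {transcript id: implied fragment length}.
    """
    # For each transcript, the read pairs whose implied length fits the library.
    covers: Dict[str, Set[str]] = {
        t: {
            read
            for read, by_transcript in implied_lengths.items()
            if t in by_transcript and abs(by_transcript[t] - mean) <= k * std
        }
        for t in transcripts
    }
    # Read pairs that at least one candidate transcript can explain.
    coverable: Set[str] = set().union(*covers.values())

    # Try subsets in order of increasing size, so the first hit is a smallest one.
    for size in range(len(transcripts) + 1):
        for subset in combinations(transcripts, size):
            covered: Set[str] = set().union(*(covers[t] for t in subset))
            if covered == coverable:
                return list(subset)
    return list(transcripts)


# Hypothetical library: mean fragment length 200, standard deviation 20.
implied = {
    "r1": {"T1": 210, "T2": 450},
    "r2": {"T1": 190, "T3": 195},
    "r3": {"T2": 205, "T3": 400},
    "r4": {"T2": 220},
}

selected = select_transcripts(implied, ["T1", "T2", "T3"], mean=200, std=20)
print(selected)
```

In this toy input, T3 explains only a read pair that T1 already explains, so it is left out. A real formulation would encode the same kind of choice with binary variables per transcript and hand it to an integer programming solver rather than enumerate subsets.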
Preliminary experimental results on synthetic datasets, generated with various sequencing parameters and distribution assumptions, show that TRIP reconstructs transcriptomes more accurately than previous methods that ignore fragment length distribution information.
- Mangul S, Caciula A, Brinza D, Mandoiu II, Zelikovsky A. (2012) TRIP: a method for novel transcript reconstruction from paired-end RNA-seq reads. BMC Bioinformatics – part of the supplement: Highlights from the Eighth International Society for Computational Biology (ISCB) Student Council Symposium 2012. [abstract]