FreePSI – an alignment-free approach to estimating exon-inclusion ratios without a reference transcriptome

Alternative splicing plays an important role in many cellular processes of eukaryotic organisms. The exon-inclusion ratio, also known as percent spliced in (PSI), is often regarded as one of the most effective measures of alternative splicing events. Existing methods for estimating exon-inclusion ratios at the genome scale all require a reference transcriptome. In this paper, researchers from Tsinghua University and the University of California, Riverside propose an alignment-free method, FreePSI, that performs genome-wide estimation of exon-inclusion ratios from RNA-seq data without the guidance of a reference transcriptome. It uses a novel probabilistic generative model based on k-mer profiles to quantify exon-inclusion ratios at the genome scale, and solves the model with an efficient expectation-maximization (EM) algorithm built on a divide-and-conquer strategy and a fast conjugate gradient projection descent method. The researchers compare FreePSI with existing methods on simulated and real RNA-seq data in terms of both accuracy and efficiency, and show that it performs well even though no reference transcriptome is provided. These results suggest that FreePSI may have important applications in alternative splicing analysis for organisms that lack high-quality reference transcriptomes.
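To make the quantity being estimated concrete, here is a minimal Python sketch of the usual definition of an exon-inclusion ratio: the summed abundance of the isoforms that contain an exon, divided by the summed abundance of all isoforms of the gene. The function name `exon_psi`, the toy gene, and the abundance values are hypothetical and chosen only for illustration. FreePSI itself does not start from known isoform abundances; it infers PSI from k-mer profiles.

```python
def exon_psi(isoforms, abundances):
    """Compute per-exon inclusion ratios for one gene.

    isoforms:   dict mapping isoform name -> set of exon ids it contains
    abundances: dict mapping isoform name -> non-negative abundance
    """
    total = sum(abundances.values())
    if total <= 0:
        raise ValueError("total abundance must be positive")
    exons = set().union(*isoforms.values())
    return {
        exon: sum(abundances[name]
                  for name, exon_set in isoforms.items()
                  if exon in exon_set) / total
        for exon in sorted(exons)
    }


# Hypothetical toy gene: exon e2 is skipped in isoform iso_B.
isoforms = {"iso_A": {"e1", "e2", "e3"}, "iso_B": {"e1", "e3"}}
abundances = {"iso_A": 30.0, "iso_B": 10.0}
psi = exon_psi(isoforms, abundances)
print(psi)
```

In this toy case the constitutive exons e1 and e3 have a ratio of 1.0, while the skipped exon e2 has a ratio of 30 / 40 = 0.75.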

An overview of FreePSI

(A) The input of FreePSI includes a reference genome with exon boundary annotation and a set of RNA-seq reads. (B) The main component of FreePSI is a probabilistic generative model. The abundance flow graph represents all possible isoforms and their abundance levels. For each exon (or junction), the (theoretical) distribution of k-mers in the exon (or junction, respectively) is derived by assuming that the reads were uniformly sequenced. (C) An EM algorithm is employed to perform genome-wide inference for the model, and a divide-and-conquer strategy decomposes the key optimization problem in the M-step into independent subproblems, one for each gene. Each subproblem is solved using a conjugate gradient projection algorithm. (D) The output of FreePSI includes estimated PSI values for all exons.
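The k-mer profile that the model works from can be pictured with a short Python sketch. It simply counts every length-k substring across a set of reads, which is why no read alignment is needed. The function name `kmer_profile`, the two toy reads, and the choice k = 4 are assumptions made for illustration; this is not FreePSI's own counting code, and the k used in practice may differ.

```python
from collections import Counter


def kmer_profile(sequences, k):
    """Count all k-mers over an iterable of nucleotide sequences.

    k-mers containing characters other than A, C, G, T (e.g. N) are skipped.
    """
    valid = set("ACGT")
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if set(kmer) <= valid:
                counts[kmer] += 1
    return counts


# Hypothetical toy reads, for illustration only.
reads = ["ACGTAC", "CGTACG"]
profile = kmer_profile(reads, k=4)
print(profile)
```

The same function applied to an exon or junction sequence from the annotated genome would give the k-mers that exon can contribute, which is the kind of information the generative model relates to the observed read profile.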

Availability – FreePSI is implemented in C++ and freely available to the public on GitHub. https://github.com/JY-Zhou/FreePSI

Zhou J, Ma S, Wang D, Zeng J, Jiang T. (2017) FreePSI: an alignment-free approach to estimating exon-inclusion ratios without a reference transcriptome. Nucleic Acids Res [Epub ahead of print]. [article]
