Next-generation RNA sequencing (RNA-seq) has been widely used to investigate the regulation of alternative isoforms. Among alternative splicing (AS) events, alternative 3′ splice site (SS) and 5′ SS selection account for more than 30% of all events in higher eukaryotes. Recent studies have revealed that these events play important roles in building complex organisms and have a critical impact on biological functions, which can lead to disease. Quite a few analytical methods have been developed to facilitate alternative 3′ SS and 5′ SS studies using RNA-seq data. However, these methods have various limitations, and their performance can be further improved.
Researchers from the New Jersey Institute of Technology have developed an empirical Bayes change-point model to identify alternative 3′ SS and 5′ SS. Compared with previous methods, this approach has several unique merits. First, the model does not rely on annotation information. Instead, it provides a systematic framework for integrating various kinds of information when available, in particular the useful junction read information, to obtain better performance. Second, the method uses an empirical Bayes model to efficiently pool information across genes and improve detection efficiency. Third, the method provides a flexible testing framework in which the user can choose to address different levels of questions, namely, whether alternative 3′ SS or 5′ SS selection happens, and/or where it happens. Simulation studies and a real data application have demonstrated that this method is powerful and accurate.
Illustration and notation of the change-point model for the alternative 3′ SS and 5′ SS problem.
A) and B) show two AS events: alternative 3′ SS and 5′ SS selection, respectively. Blue rectangles represent constitutive exons (common regions) and purple rectangles represent alternatively spliced regions (extended regions). Solid lines and dashed lines indicate the introns and splicing options, respectively. C) and D) are examples of isoforms generated from alternative 3′ SS and 5′ SS selection, respectively. In C), isoform 1 has a higher expression level, while in D), isoform 2 has a higher expression level. E) and F) show the results of mapping short reads to the reference genome for the two cases, respectively. The reads from isoform 2 are marked in dark red, while reads from isoform 1 are marked in blue. G) and H) show the detailed results for the exons that contain the alternative 3′ SS and 5′ SS. Because of the alternative 3′ SS or 5′ SS, the common region shared by the two isoforms has a higher expression level than the extended region. Thus, the average number of short reads (read-count) mapped to the common region will be larger than that mapped to the extended region. This generates a change-point at the splice site, which partitions the whole region into two homogeneous segments with different average read-counts.
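The change-point idea in the caption can be illustrated with a small sketch. This is not the authors' empirical Bayes model: it is a plain maximum-likelihood search for a single change-point, assuming per-position read-counts are independent Poisson draws with one rate per segment. The function names `poisson_loglik` and `find_change_point` and the example counts are hypothetical, introduced only for illustration.

```python
import math


def poisson_loglik(total, n):
    """Maximized Poisson log-likelihood of a segment with n positions and
    `total` reads, dropping the log(x!) terms that do not depend on the split."""
    if n == 0 or total == 0:
        return 0.0
    rate = total / n  # maximum-likelihood estimate of the segment's mean read-count
    return total * math.log(rate) - n * rate


def find_change_point(counts):
    """Return (k, gain), where counts[:k] and counts[k:] are the two segments
    that maximize the log-likelihood gain over a single-rate (no change) model."""
    n = len(counts)
    if n < 2:
        raise ValueError("need at least two positions to place a change-point")
    total = sum(counts)
    null_ll = poisson_loglik(total, n)

    best_k, best_gain = None, float("-inf")
    left = 0
    for k in range(1, n):
        left += counts[k - 1]  # running sum of the left segment
        split_ll = poisson_loglik(left, k) + poisson_loglik(total - left, n - k)
        gain = split_ll - null_ll
        if gain > best_gain:
            best_k, best_gain = k, gain
    return best_k, best_gain


# Hypothetical read-counts: a common region (high coverage) followed by
# an extended region (low coverage).
example_counts = [20, 22, 19, 21, 20, 5, 6, 4, 5]
k, gain = find_change_point(example_counts)
print(f"change-point after position {k}, log-likelihood gain {gain:.2f}")
```

In this toy input the split falls between the fifth and sixth positions, where the average read-count drops. A real analysis would also need a significance test for whether any change-point exists, which is where the paper's testing framework and pooling of information across genes come in.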
Availability – The software is implemented in Java and can be freely downloaded from http://ebchangepoint.sourceforge.net/
Contact – [email protected]