In bioinformatics, exon skipping (ES) event prediction is an essential part of alternative splicing (AS) event analysis. Although many methods have been developed to predict ES events, no fully satisfactory solution has yet been found. In this study, given the limitations of machine learning algorithms that use RNA-Seq data or genome sequences alone, researchers from Anhui University constructed a new feature set, called RS (RNA-Seq and sequence) features. These features combine RNA-Seq features derived from RNA-Seq data with sequence features derived from genome sequences. The researchers propose a novel Rotation Forest classifier that predicts ES events from the RS features (RotaF-RSES). To validate the efficacy of RotaF-RSES, they used a dataset from two human tissues, on which RotaF-RSES achieved an accuracy of 98.4%, a specificity of 99.2%, a sensitivity of 94.1%, and an area under the curve (AUC) of 98.6%.
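For readers less familiar with the reported metrics, the sketch below shows how accuracy, sensitivity, and specificity are conventionally computed from a binary confusion matrix, treating ES exons as the positive class. The function name `binary_metrics` and the counts in the usage note are hypothetical illustrations, not values or code from the study.

```python
def binary_metrics(tp, fp, tn, fn):
    """Standard binary classification metrics from confusion-matrix counts.

    tp: ES exons predicted as ES          (true positives)
    fp: non-ES exons predicted as ES      (false positives)
    tn: non-ES exons predicted as non-ES  (true negatives)
    fn: ES exons predicted as non-ES      (false negatives)
    """
    total = tp + fp + tn + fn
    if total == 0 or (tp + fn) == 0 or (tn + fp) == 0:
        raise ValueError("each class needs at least one example")
    return {
        "accuracy": (tp + tn) / total,     # fraction of all exons classified correctly
        "sensitivity": tp / (tp + fn),     # fraction of true ES exons recovered
        "specificity": tn / (tn + fp),     # fraction of non-ES exons correctly rejected
    }
```

With made-up counts such as `binary_metrics(tp=8, fp=1, tn=9, fn=2)`, accuracy is 17/20, sensitivity is 8/10, and specificity is 9/10. A specificity higher than the sensitivity, as reported for RotaF-RSES, means the classifier rejects non-ES exons more reliably than it recovers ES exons.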
The framework of the Rotation Forest classifier for predicting ES events with RS features (RotaF-RSES), showing both the training and testing stages
RotaF-RSES involves two steps. Step 1: Known exons and their upstream and downstream introns are obtained, and RNA-Seq features and sequence features are extracted from their RNA-Seq data and sequence information. These two groups of features, together called RS features, are used to build a classification model based on the Rotation Forest algorithm (RotaF-RSES). Step 2: The RS features of an exon of unknown type are extracted, and the RotaF-RSES model is used to determine the type of that exon.
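The feature-construction part of Step 1 can be illustrated with a small sketch. The summary does not list the individual RS features, so everything here is an assumption chosen for illustration only: mean read coverage as the RNA-Seq features, exon length and GC content as the sequence features, and the function names themselves. The point is simply that both groups are computed per exon, together with its flanking introns, and concatenated into one vector that a classifier such as Rotation Forest could consume.

```python
from statistics import mean


def gc_content(seq):
    """Fraction of G/C bases in a sequence (0.0 for an empty sequence)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def sequence_features(exon_seq, up_intron_seq, down_intron_seq):
    # Hypothetical sequence features: exon length plus GC content of each region.
    return [
        len(exon_seq),
        gc_content(exon_seq),
        gc_content(up_intron_seq),
        gc_content(down_intron_seq),
    ]


def rnaseq_features(exon_cov, up_cov, down_cov):
    # Hypothetical RNA-Seq features: mean per-base read coverage of each region
    # and the exon coverage relative to the average flanking-intron coverage.
    e = mean(exon_cov) if exon_cov else 0.0
    u = mean(up_cov) if up_cov else 0.0
    d = mean(down_cov) if down_cov else 0.0
    flank = (u + d) / 2
    ratio = e / flank if flank > 0 else 0.0
    return [e, u, d, ratio]


def rs_features(exon_seq, up_seq, down_seq, exon_cov, up_cov, down_cov):
    """Concatenate RNA-Seq and sequence features into one RS feature vector."""
    return (rnaseq_features(exon_cov, up_cov, down_cov)
            + sequence_features(exon_seq, up_seq, down_seq))
```

In this sketch, the same `rs_features` function would be applied to known exons when training the model and to exons of unknown type at prediction time, so that both stages see features in the same order.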
The results indicate that, compared with other available methods, RotaF-RSES is efficient and can predict ES events with RS features.
Availability – The source code and data for this approach are available at http://ailab.ahu.edu.cn:8087/RotaF-RSES/index.html.