Pseudouridylation is the most prevalent type of posttranscriptional modification in various stable RNAs of all organisms, and it significantly affects many cellular processes that are regulated by RNA. Accurate identification of pseudouridine (Ψ) sites in RNA would therefore be of great benefit for understanding these processes. Because currently available experimental methods are inefficient and costly, computational methods that detect Ψ sites in RNA sequences accurately and efficiently are highly desirable. However, the predictive accuracy of existing computational methods is not yet satisfactory.
In this study, researchers from Anhui University developed a new model, PseUI, for identifying Ψ sites in three species: H. sapiens, S. cerevisiae, and M. musculus. First, five kinds of features were generated from RNA segments: nucleotide composition (NC), dinucleotide composition (DC), pseudo dinucleotide composition (pseDNC), position-specific nucleotide propensity (PSNP), and position-specific dinucleotide propensity (PSDP). Next, a sequential forward feature selection strategy was used to obtain a feature subset that is compact yet retains discriminative power. Using the selected feature subsets, the researchers built their model with a support vector machine (SVM).
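To make the feature step more concrete, here is a minimal Python sketch of two of the simpler encodings (NC and DC) together with a generic greedy forward selection loop. It is an illustration under assumptions, not the PseUI implementation: the function names, the `score` callback, and the stopping rule are hypothetical, and the summary above does not say whether the authors selected individual features or whole feature groups. In practice `score` would be something like the cross-validated accuracy of an SVM trained on the chosen features.

```python
from itertools import product

NUCLEOTIDES = "ACGU"
DINUCLEOTIDES = ["".join(p) for p in product(NUCLEOTIDES, repeat=2)]


def nucleotide_composition(seq):
    """NC: the 4 frequencies of A, C, G, U in an RNA segment."""
    return [seq.count(n) / len(seq) for n in NUCLEOTIDES]


def dinucleotide_composition(seq):
    """DC: the 16 frequencies of overlapping dinucleotides in a segment."""
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    return [pairs.count(d) / len(pairs) for d in DINUCLEOTIDES]


def forward_select(candidates, score, max_features=None):
    """Greedy sequential forward selection (hypothetical helper).

    `score` maps a list of feature indices to a number (higher is better).
    One feature is added per round; the search stops as soon as no
    remaining candidate improves the best score seen so far.
    """
    selected, best = [], float("-inf")
    remaining = list(candidates)
    while remaining and (max_features is None or len(selected) < max_features):
        trial_score, trial = max((score(selected + [f]), f) for f in remaining)
        if trial_score <= best:
            break
        selected.append(trial)
        remaining.remove(trial)
        best = trial_score
    return selected, best
```

A segment such as `"ACGU"` yields an NC vector of four equal frequencies, and concatenating NC and DC gives a 20-dimensional vector whose indices can be passed to `forward_select` as `candidates`.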
Finally, the generalization ability of the model was validated by both the jackknife test and independent validation tests on the benchmark datasets. The experimental results showed that this model is more accurate and stable than previously published models. PseUI is expected to become a useful tool for the accurate identification of RNA Ψ sites.
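The jackknife test is leave-one-out validation: each sample is held out once and predicted by a model built from all the others. For propensity features such as PSNP, which are derived from labeled data, the propensity table has to be rebuilt inside every round so that the held-out sample never influences its own encoding. The sketch below illustrates that loop under stated assumptions: PSNP is taken to be the per-position nucleotide frequency in Ψ segments minus that in non-Ψ segments, all segments have equal length and contain only A, C, G, U, and the classifier is a deliberately simple stand-in (the sign of the summed propensities), not the SVM used by the authors. All function names are hypothetical.

```python
NUCLEOTIDES = "ACGU"


def position_frequencies(seqs, length):
    """Per-position nucleotide frequencies for equal-length segments."""
    freqs = [{n: 0.0 for n in NUCLEOTIDES} for _ in range(length)]
    for s in seqs:
        for i, n in enumerate(s):
            freqs[i][n] += 1.0 / len(seqs)
    return freqs


def psnp_table(positives, negatives, length):
    """Assumed PSNP: frequency in positives minus frequency in negatives."""
    fp = position_frequencies(positives, length)
    fn = position_frequencies(negatives, length)
    return [{n: fp[i][n] - fn[i][n] for n in NUCLEOTIDES} for i in range(length)]


def encode(seq, table):
    """Replace each nucleotide with its propensity at that position."""
    return [table[i][n] for i, n in enumerate(seq)]


def jackknife_accuracy(samples):
    """samples: list of (segment, label), label 1 for Ψ and 0 for non-Ψ."""
    length = len(samples[0][0])
    correct = 0
    for k, (seq, label) in enumerate(samples):
        train = samples[:k] + samples[k + 1:]
        pos = [s for s, y in train if y == 1]
        neg = [s for s, y in train if y == 0]
        # Rebuild the table without the held-out sample to avoid leakage.
        table = psnp_table(pos, neg, length)
        # Stand-in classifier (NOT an SVM): sign of the summed propensities.
        predicted = 1 if sum(encode(seq, table)) > 0 else 0
        correct += predicted == label
    return correct / len(samples)
```

Swapping the stand-in rule for a trained SVM only changes the two lines that produce `predicted`; the per-round rebuilding of the table stays the same.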
Flow charts of the jackknife cross validation for features encoded by PSNP or PSDP
Availability – A user-friendly web server for the model is available at http://zhulab.ahu.edu.cn/PseUI