A new tool for plant long non-coding RNA identification

Long non-coding RNAs (lncRNAs) are ubiquitous transcripts with crucial regulatory roles in various biological processes, including chromatin remodeling, post-transcriptional regulation, and epigenetic modification. Although accumulating evidence has elucidated mechanisms by which plant lncRNAs modulate growth, root development, and seed dormancy, accurately identifying them remains challenging because plant-specific methods are lacking. The mainstream methods currently used for plant lncRNA identification were largely developed using human or animal datasets. Consequently, their accuracy and effectiveness in predicting plant lncRNAs have not been fully evaluated.

Recently, a group led by researchers at Beijing Forestry University and Umeå University collected extensive high-quality RNA-sequencing data from various plants and used these plant-specific data to retrain three mainstream lncRNA prediction tools: CPAT, LncFinder, and PLEK. The retrained models were then evaluated against other popular lncRNA prediction tools, including CPC2, CNCI, RNAplonc, and LncADeep. The results showed that retraining significantly improved prediction performance for plant lncRNAs. Two retrained models, LncFinder-plant and CPAT-plant, outperformed the other tools on multiple evaluation metrics, making them the most suitable tools for plant lncRNA identification.
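
The article does not list which evaluation metrics were used. As a hedged illustration only, assuming the comparison relies on standard binary-classification metrics with lncRNA as the positive class, the Python sketch below computes sensitivity, specificity, precision, accuracy, F1, and the Matthews correlation coefficient (MCC) from true and predicted labels. The function name `evaluate` and the toy labels are hypothetical and do not come from the study.

```python
import math


def evaluate(y_true, y_pred):
    """Compute common binary metrics, treating 1 (lncRNA) as the positive class
    and 0 (protein-coding) as the negative class. Hypothetical helper."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def ratio(num, den):
        # Return 0.0 instead of raising on degenerate inputs (empty classes).
        return num / den if den else 0.0

    sensitivity = ratio(tp, tp + fn)   # recall on lncRNAs
    specificity = ratio(tn, tn + fp)   # recall on coding transcripts
    precision = ratio(tp, tp + fp)
    accuracy = ratio(tp + tn, len(pairs))
    f1 = ratio(2 * precision * sensitivity, precision + sensitivity)
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = ratio(tp * tn - fp * fn, mcc_den)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "accuracy": accuracy,
        "f1": f1,
        "mcc": mcc,
    }


# Toy example with made-up labels: 1 = lncRNA, 0 = coding.
metrics = evaluate([1, 1, 1, 0, 0, 0], [1, 1, 0, 0, 0, 1])
print(metrics)
```

Reporting several metrics together matters because a tool can reach high sensitivity simply by calling most transcripts non-coding; specificity and MCC expose that trade-off.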

The researchers also developed a computational pipeline, named Plant-LncPipe, for identifying and analyzing plant lncRNAs. The pipeline integrates the two top-performing models, CPAT-plant and LncFinder-plant, and covers the full computational process: raw data preprocessing, transcript assembly, lncRNA identification, lncRNA classification, and analysis of lncRNA origins. It can be applied to a wide range of plant species.
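
The article describes Plant-LncPipe as an ensemble of CPAT-plant and LncFinder-plant but does not spell out the combination rule. The sketch below assumes a simple consensus rule: a transcript is kept as a candidate lncRNA only if it meets the conventional 200-nt length cutoff and both tools call it non-coding. The function `consensus_lncrnas`, the in-memory input formats, and the transcript IDs are hypothetical; check the Plant-LncPipe repository for the rule it actually applies.

```python
from typing import Dict, Set

# Conventional minimum length for lncRNAs, in nucleotides (assumed cutoff).
MIN_LENGTH = 200


def consensus_lncrnas(lengths: Dict[str, int],
                      cpat_noncoding: Set[str],
                      lncfinder_noncoding: Set[str],
                      min_length: int = MIN_LENGTH) -> Set[str]:
    """Return IDs of transcripts that are at least `min_length` nt long and
    are called non-coding by BOTH classifiers (intersection rule)."""
    long_enough = {tid for tid, n in lengths.items() if n >= min_length}
    return long_enough & cpat_noncoding & lncfinder_noncoding


# Hypothetical transcript lengths and per-tool non-coding calls.
lengths = {"t1": 850, "t2": 150, "t3": 1200, "t4": 600}
cpat_calls = {"t1", "t2", "t4"}
lncfinder_calls = {"t1", "t2", "t3"}
print(sorted(consensus_lncrnas(lengths, cpat_calls, lncfinder_calls)))
```

Taking the intersection favors precision: a transcript must convince both models, at the cost of missing some true lncRNAs that only one model detects. A union rule would make the opposite trade-off.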

Workflow of the present study and the pipeline for lncRNA identification and characterization

Workflow of the present study and the pipeline for lncRNA identification and characterization. The left panel illustrates our present workflow. The right panel depicts the process of a computational pipeline, Plant-LncPipe, which provides an ensemble method of lncRNA identification and key steps of lncRNA characterization.

The study demonstrates that retraining lncRNA prediction models on high-quality plant transcriptomic data allows them to capture plant lncRNA features more accurately, substantially improving prediction precision and reliability. It underscores the importance of species-specific retraining for model accuracy: retraining existing, mature models preserves the experience and methodology already built into them while further improving their applicability and accuracy.
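
To make "plant lncRNA features" concrete: CPAT's logistic regression model is built on four sequence features, namely ORF size, ORF coverage, the Fickett TESTCODE score, and hexamer usage bias. Retraining on plant data re-estimates the hexamer statistics and the weights given to these features. The simplified Python sketch below computes only the first two features. It scans the given strand in all three frames for ATG-initiated ORFs ending at a stop codon and ignores ORFs that run off the end without a stop; CPAT's own implementation may handle these cases differently. The function names are hypothetical.

```python
from typing import Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(seq: str, start_codon: str = "ATG") -> int:
    """Length in nt (stop codon included) of the longest start-to-stop ORF
    on the given strand, across all three reading frames."""
    seq = seq.upper().replace("U", "T")  # accept RNA or DNA alphabet
    best = 0
    for frame in range(3):
        start = None
        # Step codon by codon; the bound guarantees a full 3-nt codon.
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == start_codon:
                start = i
            elif start is not None and codon in STOP_CODONS:
                best = max(best, i + 3 - start)
                start = None  # look for the next ORF in this frame
    return best


def orf_features(seq: str) -> Tuple[int, float]:
    """Return (ORF size, ORF coverage = ORF size / transcript length)."""
    size = longest_orf(seq)
    coverage = size / len(seq) if seq else 0.0
    return size, coverage


# A short ORF (ATG AAA TAG) embedded in a 13-nt toy sequence.
print(orf_features("CCATGAAATAGCC"))
```

Long, high-coverage ORFs push a transcript toward the coding class. Because typical ORF lengths and codon usage differ between species, the decision boundary learned from human data need not transfer to plants, which is one reason species-specific retraining helps.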

Source: Nanjing Agricultural University

Availability – Plant-LncPipe is available at: https://github.com/xuechantian/Plant-LncRNA-pipline

Tian XC, Chen ZY, Nie S, Shi TL, Yan XM, Bao YT, Li ZC, Ma HY, Jia KH, Zhao W, Mao JF. (2024) Plant-LncPipe: a computational pipeline providing significant improvement in plant lncRNA identification. Hortic Res 11(4):uhae041. [article]
