miRPipe – a unified computational framework for a robust, reliable, and reproducible identification of novel miRNAs from RNA sequencing data

In eukaryotic cells, miRNAs regulate a plethora of cellular functionalities ranging from cellular metabolisms, and development to the regulation of biological networks and pathways, both under homeostatic and pathological states like cancer. Despite their immense importance as key regulators of cellular processes, accurate and reliable estimation of miRNAs using Next Generation Sequencing is challenging, largely due to the limited availability of robust computational tools/methods/pipelines.

Researchers at the Indraprastha Institute of Information Technology have developed miRPipe, an end-to-end computational framework for the identification, characterization, and expression estimation of small RNAs, including the known and novel miRNAs and previously annotated pi-RNAs from small-RNA sequencing profiles. This workflow detects unique novel miRNAs by incorporating the sequence information of seed and non-seed regions, concomitant with clustering analysis. This approach allows reliable and reproducible detection of unique novel miRNAs and functionally same miRNAs (paralogues). The researchers validated the performance of miRPipe with the available state-of-the-art pipelines using both synthetic datasets generated using the newly developed miRSim tool and three cancer datasets (Chronic Lymphocytic Leukemia, Lung cancer, and breast cancer). In the experiment over the synthetic dataset, miRPipe is observed to outperform the existing state-of-the-art pipelines (accuracy: 95.23% and F 1-score: 94.17%). Analysis on all the three cancer datasets shows that miRPipe is able to extract more number of known dysregulated miRNAs or piRNAs from the datasets as compared to the existing pipelines.

Infographic representation of miRPipe workflow

This pipeline identifies differentially expressed novel miRNAs, known miRNAs, and known piRNAs.The RNA-Seq data in standard FASTQ format is cleaned for adaptor contamination. Reads of specific lengths are aligned to the human reference genome for miRNA and piRNA identification. Further, the aligned reads are processed using novel seed-based clustering for their functional annotation. Lastly, their differential expression analysis is computed using DESeq2 to find the significantly dysregulated miRNAs and piRNAs.


Ruhela V, Gupta A, Sriram K, Ahuja G, Kaur G, Gupta R. (2022) A Unified Computational Framework for a Robust, Reliable, and Reproducible Identification of Novel miRNAs From the RNA Sequencing Data. Front Bioinform 2:842051. [article]

Leave a Reply

Your email address will not be published. Required fields are marked *


Time limit is exhausted. Please reload CAPTCHA.