Several methods exist for predicting precursor miRNAs (pre-miRNAs) in genomic or sRNA-seq (small RNA sequencing) data produced by NGS (Next Generation Sequencing). One key piece of information used for this task is the characteristic hairpin structure adopted by pre-miRNAs, which is generally identified using RNA folders whose time complexity is cubic in the size of the input.
The vast majority of pre-miRNA predictors then rely on further information learned from previously validated miRNAs from the same or a closely related genome for the final prediction of new miRNAs.
With this paper, researchers from the University of Lyon wished to address three main issues:
The first was methodological: to obtain a more time-efficient predictor. The second was to do so without losing accuracy. They aimed to better predict miRNAs at genome scale, but also from sRNA-seq data, where in some cases, notably in plants, current folding methods often infer the wrong structure.
The third issue was dependence on previously recorded examples of miRNAs: the researchers sought a method that relies on such records as little as possible.
To address the first and second issues, the researchers present a novel alternative to a classical folder based on a thermodynamic Nearest-Neighbour (NN) model for computing the free energy and predicting the classical hairpin structure of a pre-miRNA.
They show that the free energies thus computed correlate well with those of RNAfold. This novel method, called Mirinho, has quadratic instead of cubic complexity and is also much more efficient in practice.
When applied to sRNA-seq data from plants, it generally gives better results than classical folders. On the third issue, they show that Mirinho, whose only prior knowledge is the length of the loops and stem-arms and the free energy of the pre-miRNA hairpin, compares well with algorithms that require more information.
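To make the notions of loop and stem-arm length concrete, here is a minimal Python sketch of a hairpin scan. It is not the Mirinho algorithm: it ignores bulges, internal loops, and free energy, and simply extends a perfectly paired stem outward from each candidate loop. The function names and the default values of `min_loop`, `max_loop`, and `min_stem` are illustrative assumptions, not parameters taken from the paper.

```python
# Toy hairpin scan (illustrative only; NOT the Mirinho method).
# Assumption: only perfect stems are considered, with Watson-Crick
# and G-U wobble pairs; bulges and internal loops are ignored.

PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def stem_length(seq, i, j):
    """Count consecutive base pairs extending outward from i (5' side) and j (3' side)."""
    n = 0
    while i >= 0 and j < len(seq) and (seq[i], seq[j]) in PAIRS:
        n += 1
        i -= 1
        j += 1
    return n


def best_hairpin(seq, min_loop=3, max_loop=20, min_stem=4):
    """Return (start, end_exclusive, stem_len, loop_len) of the longest perfect stem, or None.

    The thresholds are hypothetical placeholders for illustration.
    """
    seq = seq.upper().replace("T", "U")  # accept DNA or RNA alphabets
    hits = []
    for loop_start in range(1, len(seq)):
        for loop_len in range(min_loop, max_loop + 1):
            loop_end = loop_start + loop_len  # index of the first base after the loop
            if loop_end >= len(seq):
                break
            stem = stem_length(seq, loop_start - 1, loop_end)
            if stem >= min_stem:
                hits.append((loop_start - stem, loop_end + stem, stem, loop_len))
    if not hits:
        return None
    # Longest stem wins; on ties, the first candidate found is kept.
    return max(hits, key=lambda h: h[2])
```

For example, `best_hairpin("GGGGCGAAAGCCCC")` finds a five-pair stem closed around the four-base loop `GAAA`. A realistic predictor would additionally score each candidate with a thermodynamic model and tolerate mismatches in the stem.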
The results, obtained with different datasets, are similar to those of the other approaches with which a comparison was possible, namely publicly available software that could be run on large inputs.
In some cases, Mirinho is even better in terms of sensitivity or precision.
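For readers unfamiliar with the two metrics, the short Python sketch below shows the standard definitions: sensitivity is the fraction of known pre-miRNAs that are recovered, and precision is the fraction of predictions that are correct. Matching predictions to known pre-miRNAs by identifier is a simplifying assumption made here for illustration; the function name `evaluate` and the identifiers in the usage note are hypothetical.

```python
def evaluate(predicted, known):
    """Return (sensitivity, precision) for two collections of pre-miRNA identifiers.

    Assumption: a prediction counts as correct when its identifier
    appears in the known set (real evaluations often match by
    genomic coordinates instead).
    """
    predicted, known = set(predicted), set(known)
    tp = len(predicted & known)   # true positives: predicted and known
    fp = len(predicted - known)   # false positives: predicted but not known
    fn = len(known - predicted)   # false negatives: known but missed
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return sensitivity, precision
```

With four predictions `{"a", "b", "c", "d"}` and three known entries `{"a", "b", "e"}`, two predictions are correct, so sensitivity is 2/3 and precision is 1/2.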
Predicted secondary structures (MIRINHO). From top to bottom: gold standard structure in MIRBASE (with miRNA coloured in red), and structures predicted by, respectively, MIRINHO, MIRNAFOLD, and RNAFOLD. Secondary structure of pre-miRNA MI0002409: the best prediction was by MIRINHO, whose stem length and numbers of bulges and internal loops were closest to those in MIRBASE.