Since small RNA sequencing (sRNA-seq) technology became available, it has enabled the discovery of thousands of new microRNAs (miRNAs) in humans and many other species, providing new data on these small RNAs (sRNAs) of high biological and translational relevance. miRNA discovery has not yet reached saturation, even in the most studied model organisms, and many researchers use sRNA-seq in studies with different aims in biomedicine, fundamental research and applied animal sciences.
Researchers review several miRNA discovery and characterization software tools that implement different strategies, providing a useful guide for selecting the programs best suited to a study's objectives and data. After a brief introduction to miRNA biogenesis, function and characteristics, which helps in understanding the biological background considered by the algorithms, they survey the current state of miRNA discovery bioinformatics, discussing 26 different sRNA-seq-based miRNA prediction software tools and toolkits released in the past 6 years, including 15 methods specific to miRNA prediction and 11 more general-purpose software suites for sRNA-seq data analysis. The researchers highlight the main features of mature miRNAs and miRNA precursors considered by the methods, categorizing the methods according to prediction strategy and implementation. In addition, they describe a typical miRNA prediction and analysis workflow by delineating the objectives, potential and main steps of sRNA-seq data analysis projects, from preparatory data processing to miRNA prediction, quantification and diverse downstream analyses. Finally, the researchers outline the caveats affecting sRNA-seq-based prediction tools and indicate the possibilities offered by data set pooling and by integration with other types of high-throughput sequencing data.
miRNA discovery from sRNA-seq experiments
(A) Number of known mature miRNAs (black) and pre-miRNAs (gray) present in miRBase by year, since 2002. (B) Examples of the three main cases of read signatures on putative precursors: the lack of reads corresponding to either of the two mature miRNAs is the read signature of non-miRNA hairpins; a hairpin with read signatures typical of a true precursor, with precise 3′ overhangs and reads mapping to defined regions of the hairpin (i.e. aligning to the miRNA, the miRNA* or the loop without overlaps between different products), indicates a high-confidence miRNA; an intermediate case is represented by bona fide pre-miRNAs with read signatures displaying high 5′ heterogeneity or the absence of miRNA* reads, which cannot be annotated with high confidence but can be confirmed using additional evidence, such as read depletion in miRNA biogenesis mutants or enrichment in Ago-IP experiments. (C) Drawing illustrating how data set pooling can increase the sensitivity of miRNA prediction analyses by allowing the generation of highly informative read signatures even for lowly expressed pre-miRNAs.
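The three read-signature cases in panel B and the pooling idea in panel C can be illustrated with a minimal Python sketch. It is not the method of any tool covered by the review. The `Hairpin` type, the helper names, the coordinates and the thresholds (`min_reads`, `max_5p_het`, `tol`) are all hypothetical, chosen only for illustration, and the 3′ overhang check described in the caption is deliberately left out to keep the example short.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Tuple

Span = Tuple[int, int]  # (start, end) of an alignment on the hairpin


@dataclass
class Hairpin:
    """Hypothetical candidate precursor with expected product positions."""
    mirna: Span  # expected mature miRNA span
    star: Span   # expected miRNA* span


def pool(*datasets: Counter) -> Counter:
    """Sum read counts per alignment across several sRNA-seq data sets."""
    pooled = Counter()
    for reads in datasets:
        pooled.update(reads)
    return pooled


def arm_stats(span: Span, reads: Counter, tol: int):
    """Return (read count, 5' heterogeneity) for reads matching one arm.

    5' heterogeneity is taken here as the fraction of arm reads that do
    not start at the most common 5' position (an assumed definition).
    """
    start, end = span
    starts = Counter()
    for (r_start, r_end), n in reads.items():
        if abs(r_start - start) <= tol and abs(r_end - end) <= tol:
            starts[r_start] += n
    total = sum(starts.values())
    if total == 0:
        return 0, 0.0
    dominant = starts.most_common(1)[0][1]
    return total, 1 - dominant / total


def classify(hairpin: Hairpin, reads: Counter,
             min_reads: int = 10, max_5p_het: float = 0.1, tol: int = 2) -> str:
    """Assign a hairpin to one of three illustrative signature classes."""
    mi_n, mi_het = arm_stats(hairpin.mirna, reads, tol)
    st_n, st_het = arm_stats(hairpin.star, reads, tol)
    if mi_n + st_n < min_reads:
        return "unsupported"  # too few mature-product reads
    if mi_n and st_n and max(mi_het, st_het) <= max_5p_het:
        return "high confidence"
    return "needs additional evidence"  # 5' heterogeneity or no miRNA* reads


# Toy data: a lowly expressed precursor seen in two separate samples.
hairpin = Hairpin(mirna=(10, 32), star=(50, 72))
sample_a = Counter({(10, 32): 4, (50, 72): 1})
sample_b = Counter({(10, 32): 5, (10, 31): 2, (50, 72): 1})
pooled = pool(sample_a, sample_b)

for name, reads in [("A", sample_a), ("B", sample_b), ("pooled", pooled)]:
    print(name, classify(hairpin, reads))
```

With these toy counts neither sample alone reaches the assumed read threshold, while the pooled counts do, which is the effect panel C describes. Real prediction tools weigh further features, such as precursor folding and overhang precision, which this sketch ignores.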