Genomic alignment of small RNA (smRNA) sequences such as microRNAs poses considerable challenges due to their short length (∼21 nucleotides [nt]) as well as the large size and complexity of plant and animal genomes. While several tools have been developed for high-throughput mapping of longer mRNA-seq reads (>30 nt), there are few that are specifically designed for mapping of smRNA reads including microRNAs. The accuracy of these mappers has not been systematically determined in the case of smRNA-seq. In addition, it is unknown whether these aligners accurately map smRNA reads containing sequence errors and polymorphisms.
Using simulated read sets, researchers from the Baker IDI Heart and Diabetes Institute determined the alignment sensitivity and accuracy of 16 short-read mappers and quantified their robustness to mismatches, indels, and nontemplated nucleotide additions. The mappers were evaluated against a plant genome (Oryza sativa, ∼500 Mbp) and a mammalian genome (Homo sapiens, ∼3.1 Gbp). Analysis of simulated and real smRNA-seq data demonstrates that mapper selection affects differential expression results and their interpretation. These results will inform best practice for smRNA mapping and enable more accurate smRNA detection and quantification of expression and RNA editing.
Alignment of short simulated Illumina-like mRNA-derived sequences to the genome
For panels A and B, green bars denote correctly mapped reads, yellow bars denote incorrect mapping to protein-coding locations, purple bars denote mapping to non-mRNA and non-hairpin loci, and grey bars denote reads that were unmapped or fell below the mapping quality threshold. Red bars denote incorrect mapping to hairpin loci; these are plotted on enlarged axes for visibility. Read mapping results at varying read lengths are shown for O. sativa (A) and H. sapiens (B).
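
The five categories in the caption can be illustrated with a small sketch that sorts simulated reads by comparing each reported alignment to the read's known origin. This is not the authors' pipeline. The names `Interval`, `Alignment`, `classify` and `tally`, the mapping quality cutoff of 10, and the choice to check hairpin loci before protein-coding loci are all assumptions made for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Interval:
    """Half-open genomic interval [start, end) on one chromosome."""
    chrom: str
    start: int
    end: int

    def overlaps(self, other: "Interval") -> bool:
        return (self.chrom == other.chrom
                and self.start < other.end
                and other.start < self.end)


@dataclass(frozen=True)
class Alignment:
    """Reported alignment of one read; locus is None if the read is unmapped."""
    locus: Optional[Interval]
    mapq: int


def classify(truth: Interval, aln: Alignment,
             mrna_loci: Sequence[Interval],
             hairpin_loci: Sequence[Interval],
             min_mapq: int = 10) -> str:
    """Assign one simulated read to a caption category.

    min_mapq=10 is an assumed cutoff, not a value taken from the study.
    """
    if aln.locus is None or aln.mapq < min_mapq:
        return "unmapped_or_low_mapq"          # grey
    if aln.locus.overlaps(truth):
        return "correct"                        # green
    # Assumed precedence: hairpin annotation is checked before mRNA annotation.
    if any(aln.locus.overlaps(h) for h in hairpin_loci):
        return "incorrect_hairpin"              # red
    if any(aln.locus.overlaps(m) for m in mrna_loci):
        return "incorrect_protein_coding"       # yellow
    return "other_locus"                        # purple


def tally(reads: Iterable[Tuple[Interval, Alignment]],
          mrna_loci: Sequence[Interval],
          hairpin_loci: Sequence[Interval],
          min_mapq: int = 10) -> Counter:
    """Count reads per category for one mapper and one read length."""
    return Counter(classify(truth, aln, mrna_loci, hairpin_loci, min_mapq)
                   for truth, aln in reads)


if __name__ == "__main__":
    # Toy annotation and reads, invented for the example.
    mrna = [Interval("chr1", 0, 1000), Interval("chr2", 0, 1000)]
    hairpins = [Interval("chr3", 50, 130)]
    origin = Interval("chr1", 100, 125)
    reads = [
        (origin, Alignment(Interval("chr1", 100, 125), 40)),
        (origin, Alignment(Interval("chr2", 200, 225), 40)),
        (origin, Alignment(Interval("chr3", 60, 85), 40)),
        (origin, Alignment(Interval("chr4", 10, 35), 40)),
        (origin, Alignment(None, 0)),
    ]
    for category, count in sorted(tally(reads, mrna, hairpins).items()):
        print(category, count)
```

Running `tally` once per mapper and read length would give the counts behind one stacked bar in each panel, under the assumptions stated above.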