Alignment-free RNA quantification tools have significantly increased the speed of RNA-seq analysis. However, it is unclear whether these state-of-the-art RNA-seq analysis pipelines can quantify small RNAs as accurately as they quantify long RNAs in the context of total RNA quantification.
University of Texas researchers comprehensively tested and compared four RNA-seq pipelines for accuracy of gene quantification and fold-change estimation. They used a novel total RNA benchmarking dataset in which small non-coding RNAs are highly represented along with other long RNAs. The four RNA-seq pipelines consisted of two commonly-used alignment-free pipelines and two variants of alignment-based pipelines. They found that all pipelines showed high accuracy for quantifying the expression of long and highly-abundant genes. However, alignment-free pipelines showed systematically poorer performance in quantifying lowly-abundant and small RNAs.
Analysis pipelines and experimental design
The researchers used two pipelines for each of the alignment-based and alignment-free approaches. The alignment-based pipelines were a HISAT2+featureCounts pipeline, which uses HISAT2 to align reads to the human genome and featureCounts to count reads per gene, and TGIRT-map, a customized pipeline for analyzing TGIRT-seq data. Two alignment-free tools, Kallisto and Salmon, were used for quantifying transcripts. For the alignment-free tools, gene-level abundances were summarized by Tximport. All tests for differentially expressed genes were done with DESeq2.
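To make the gene-level summarization step concrete, here is a minimal Python sketch of the underlying idea: transcript-level abundances from an alignment-free tool are summed for each gene. This is not Tximport's API or the researchers' code. The function name, the transcript-to-gene mapping, and the abundance values are all hypothetical, and Tximport itself does more than this simple sum (for example, it can account for transcript lengths).

```python
from collections import defaultdict


def summarize_to_gene(transcript_tpm, tx2gene):
    """Sum transcript-level TPM values into gene-level totals.

    transcript_tpm: dict mapping transcript ID -> TPM (hypothetical input)
    tx2gene: dict mapping transcript ID -> gene ID (hypothetical input)
    """
    gene_tpm = defaultdict(float)
    for tx_id, tpm in transcript_tpm.items():
        gene_id = tx2gene.get(tx_id)
        if gene_id is None:
            # Transcripts without a gene annotation are skipped in this sketch.
            continue
        gene_tpm[gene_id] += tpm
    return dict(gene_tpm)


# Toy values for illustration only; not taken from the study.
transcript_tpm = {"tx1": 10.0, "tx2": 5.0, "tx3": 2.5}
tx2gene = {"tx1": "geneA", "tx2": "geneA", "tx3": "geneB"}

gene_tpm = summarize_to_gene(transcript_tpm, tx2gene)
print(gene_tpm)
```

In this toy example, the two transcripts of geneA are combined into a single gene-level value, while geneB keeps the value of its only transcript.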
These researchers have shown that alignment-free and traditional alignment-based quantification methods perform similarly for common gene targets, such as protein-coding genes. However, they have identified a potential pitfall in analyzing and quantifying lowly-expressed genes and small RNAs with alignment-free pipelines, especially when these small RNAs contain biological variations.