Small RNA-Seq has emerged as a powerful tool in transcriptomics, gene expression profiling and biomarker discovery. Sequencing cell-free nucleic acids from liquid biopsies, particularly microRNA (miRNA), also opens promising possibilities for molecular diagnostics and might help establish disease-specific biomarker signatures. The complexity of the small RNA-Seq workflow, however, introduces challenges and biases that researchers need to be aware of in order to generate high-quality data. Rigorous standardization and extensive validation are required to ensure the reliability, reproducibility and comparability of research findings, because hypotheses based on flawed experimental conditions can be inconsistent and even misleading. Comparable to the well-established MIQE guidelines for qPCR experiments, this work aims to establish guidelines for experimental design and pre-analytical sample processing, to standardize library preparation and sequencing reactions, and to facilitate data analysis.
Researchers at the Technical University of Munich highlight bottlenecks in small RNA-Seq experiments, point out the importance of stringent quality control and validation, and provide a primer for differential expression analysis and biomarker discovery. Following these recommendations will encourage better sequencing practice, increase experimental transparency and lead to more reproducible small RNA-Seq results. This will ultimately enhance the validity of biomarker signatures and allow reliable, robust clinical predictions.
| Step | To consider | Recommended tools or algorithms |
|---|---|---|
| Data pre-processing | Trimming adapters | Btrim, FASTX-Toolkit |
| | Removing short reads | |
| Quality control | Library size and read distribution across samples | Btrim, FASTX-Toolkit, FaQCs |
| | Per-base and per-sequence Phred scores | |
| | Read length distribution | |
| | RNA degradation | |
| | Over-represented sequences | |
| Read alignment (filtering) | Reference database or genome | Bowtie, BWA, HTSeq, SAMtools, SOAP2 |
| | Annotation | |
| | Mismatch rate | |
| | Handling of multi-mapping reads (multi-reads) | |
| Normalization | Library sizes and sequencing depth | DESeq2, edgeR, svaseq |
| | Batch effects | |
| | Read distribution | |
| | Replication level | |
| Differential gene expression (DGE) analysis | Data distribution | DESeq2, edgeR, SAMseq, limma-voom |
| | Replication level | |
| | False discovery rate | |
| Target prediction of miRNAs/siRNAs | In silico prediction or experimental validation | miRanda, miRTarBase, TarBase |
| | Canonical and non-canonical target regulation | |
| Biomarker identification | Sensitivity, specificity, classification rate | DESeq2, SIMCA-Q, numerous R packages (e.g. base, pcaMethods, mixOmics) |
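The data pre-processing row of the table pairs adapter trimming with the removal of short reads. The sketch below illustrates both steps under stated assumptions: the adapter sequence and the 16-nt length cutoff are hypothetical placeholders, and only exact matches are trimmed, whereas dedicated tools such as Btrim or the FASTX-Toolkit offer more configurable matching. It is not the algorithm of any of the listed tools.

```python
# Hypothetical 3' adapter, used only for illustration; substitute the
# adapter sequence documented for your library preparation kit.
ADAPTER = "TGGAATTCTCGGGTGCCAAGG"


def trim_3prime_adapter(read: str, adapter: str = ADAPTER, min_overlap: int = 3) -> str:
    """Remove a 3' adapter using exact matching only (no mismatches)."""
    # Case 1: the complete adapter occurs inside the read (short insert).
    idx = read.find(adapter)
    if idx != -1:
        return read[:idx]
    # Case 2: the read ends in a prefix of the adapter (partial read-through).
    # Try the longest possible overlap first, down to min_overlap bases.
    for overlap in range(min(len(adapter), len(read)), min_overlap - 1, -1):
        if read.endswith(adapter[:overlap]):
            return read[: len(read) - overlap]
    return read


def preprocess(reads: list[str], min_length: int = 16) -> list[str]:
    """Trim adapters, then drop inserts shorter than min_length (hypothetical cutoff)."""
    kept = []
    for read in reads:
        insert = trim_3prime_adapter(read)
        if len(insert) >= min_length:
            kept.append(insert)
    return kept


reads = [
    "TAGCTTATCAGACTGATGTTGA" + "TGGAATTCTCGG",  # 22-nt insert + partial adapter
    "TGGAATTCTCGGGTGCCAAGGATCTCG",              # adapter dimer: empty insert
    "ACGTACGT" + ADAPTER,                        # 8-nt fragment: too short
]
print(preprocess(reads))  # ['TAGCTTATCAGACTGATGTTGA']
```

A small `min_overlap` also removes genuine 3' bases that happen to match the start of the adapter, so this value trades sensitivity for false trimming.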
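Several quality-control items in the table (per-base and per-sequence Phred scores, read length distribution, over-represented sequences) reduce to simple summaries of FASTQ records. The following sketch assumes Phred+33 quality encoding (Sanger/Illumina 1.8+) and uses the standard Phred definition Q = -10·log10(P_error); it only shows what tools such as FaQCs or the FASTX-Toolkit summarize and is not their implementation.

```python
from collections import Counter


def phred_scores(quality: str, offset: int = 33) -> list[int]:
    """Decode a FASTQ quality string; offset 33 assumes Phred+33 encoding."""
    return [ord(ch) - offset for ch in quality]


def error_probability(q: int) -> float:
    """Invert Q = -10 * log10(P_error) to get the base-call error probability."""
    return 10 ** (-q / 10)


def per_base_mean_quality(qualities: list[str]) -> list[float]:
    """Mean Phred score at each read position (reads may differ in length)."""
    sums: list[int] = []
    counts: list[int] = []
    for qual in qualities:
        for pos, q in enumerate(phred_scores(qual)):
            if pos == len(sums):  # first read long enough to reach this position
                sums.append(0)
                counts.append(0)
            sums[pos] += q
            counts[pos] += 1
    return [s / n for s, n in zip(sums, counts)]


def per_read_mean_quality(qualities: list[str]) -> list[float]:
    """Mean Phred score of each read (quality strings must be non-empty)."""
    return [sum(phred_scores(q)) / len(q) for q in qualities]


def length_distribution(reads: list[str]) -> Counter:
    """Histogram of (trimmed) read lengths."""
    return Counter(len(r) for r in reads)


def overrepresented(reads: list[str], top: int = 3) -> list[tuple[str, int]]:
    """Most frequent identical sequences, e.g. to spot adapter dimers."""
    return Counter(reads).most_common(top)


print(per_base_mean_quality(["II5", "5I"]))  # [30.0, 40.0, 20.0]
```

After adapter trimming, a miRNA-rich library would be expected to show a length peak around 21 to 23 nt; a distribution dominated by very short fragments can point to degraded input RNA.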
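The normalization row lists library size and sequencing depth first. One widely used approach is the median-of-ratios size factor, the idea behind DESeq2's default normalization: each sample is compared with a pseudo-reference built from per-feature geometric means. The sketch below is an illustrative re-implementation in plain Python, not DESeq2's code; the input layout (a dict of raw counts per small RNA) and the names are assumptions.

```python
import math
from statistics import median


def size_factors(counts: dict[str, list[int]]) -> list[float]:
    """Median-of-ratios size factors.

    counts maps each small RNA to its raw read counts, one per sample,
    with samples in the same order for every entry.
    """
    n_samples = len(next(iter(counts.values())))
    # Pseudo-reference: log geometric mean per small RNA, using only
    # small RNAs detected (count > 0) in every sample.
    log_geo_means = {
        name: sum(math.log(c) for c in row) / n_samples
        for name, row in counts.items()
        if all(c > 0 for c in row)
    }
    factors = []
    for j in range(n_samples):
        # Median over small RNAs of log(count / geometric mean) for sample j.
        log_ratios = [math.log(counts[name][j]) - lg for name, lg in log_geo_means.items()]
        factors.append(math.exp(median(log_ratios)))
    return factors


def normalize(counts: dict[str, list[int]], factors: list[float]) -> dict[str, list[float]]:
    """Divide each sample's counts by its size factor."""
    return {name: [c / f for c, f in zip(row, factors)] for name, row in counts.items()}


counts = {"miR-a": [10, 20], "miR-b": [100, 200], "miR-c": [5, 10], "miR-d": [0, 3]}
sf = size_factors(counts)          # approx. [0.707, 1.414]; miR-d is excluded
normalized = normalize(counts, sf)
```

If no small RNA is detected in every sample, `median` receives an empty list and raises `statistics.StatisticsError`; sparse small RNA data may therefore need a different reference. Library-size scaling alone also does not address the batch effects listed in the same table row.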
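The DGE analysis row names the false discovery rate as a key consideration, since hundreds of small RNAs are tested at once. A common way to control it is the Benjamini-Hochberg step-up procedure. The sketch below computes BH-adjusted p-values in plain Python; the p-values and the 0.05 threshold are made-up examples, not results.

```python
def benjamini_hochberg(pvalues: list[float]) -> list[float]:
    """Benjamini-Hochberg adjusted p-values (step-up FDR control)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity:
    # adj_(k) = min over j >= k of p_(j) * m / j, capped at 1.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


pvals = [0.01, 0.04, 0.03, 0.20]  # hypothetical per-miRNA p-values
padj = benjamini_hochberg(pvals)  # approx. [0.04, 0.0533, 0.0533, 0.2]
significant = [k for k, q in enumerate(padj) if q < 0.05]  # [0]
```

Note that two raw p-values below 0.05 no longer pass after adjustment, which is exactly the protection against false positives the correction is meant to provide.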
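For target prediction, the table distinguishes canonical from non-canonical regulation. Canonical sites are usually defined by Watson-Crick pairing of the 3' UTR with the miRNA seed (nucleotides 2 to 7), extended by a match to nucleotide 8 and/or an A opposite nucleotide 1, giving the site types 6mer, 7mer-A1, 7mer-m8 and 8mer. The sketch below only scans for these canonical seed matches; it is not the miRanda algorithm, ignores non-canonical sites, site context and conservation, and cannot replace experimental validation as collected in miRTarBase or TarBase. The UTR sequence is hypothetical; the miRNA is hsa-miR-21-5p.

```python
# RNA complement table (A-U, G-C Watson-Crick pairs)
COMPLEMENT = str.maketrans("AUGC", "UACG")


def reverse_complement(rna: str) -> str:
    return rna.translate(COMPLEMENT)[::-1]


def canonical_sites(mirna: str, utr: str) -> list[tuple[int, str]]:
    """Scan a 3' UTR (5'->3', RNA alphabet) for canonical seed-match sites.

    Returns (index of the 6-nt seed match in utr, site type) pairs.
    """
    core = reverse_complement(mirna[1:7])  # pairs with miRNA nt 2-7 (6mer)
    ext = reverse_complement(mirna[1:8])   # pairs with miRNA nt 2-8
    sites = []
    i = utr.find(core)
    while i != -1:
        # The UTR base just 5' of the core must pair with miRNA nt 8.
        m8 = i > 0 and utr[i - 1 : i + 6] == ext
        # An A just 3' of the core sits opposite miRNA nt 1.
        a1 = utr[i + 6 : i + 7] == "A"
        if m8 and a1:
            kind = "8mer"
        elif m8:
            kind = "7mer-m8"
        elif a1:
            kind = "7mer-A1"
        else:
            kind = "6mer"
        sites.append((i, kind))
        i = utr.find(core, i + 1)
    return sites


mir21 = "UAGCUUAUCAGACUGAUGUUGA"   # hsa-miR-21-5p
utr = "CCAUAAGCUAGGUAAGCUCGG"      # hypothetical 3' UTR fragment
print(canonical_sites(mir21, utr))  # [(3, '8mer'), (12, '6mer')]
```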
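The biomarker identification row evaluates a signature by sensitivity, specificity and classification rate. Given true class labels and the predictions of some classifier built on a small RNA panel, these follow directly from the confusion-matrix counts. In the sketch below, the labels are hypothetical and "classification rate" is taken to mean the overall fraction of correctly classified samples, which is an assumption. Estimates are only meaningful on samples that were not used to build the classifier.

```python
import math


def classification_metrics(y_true: list[str], y_pred: list[str], positive: str = "disease") -> dict[str, float]:
    """Sensitivity, specificity and classification rate (fraction correct)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1  # diseased, correctly flagged
            else:
                fn += 1  # diseased, missed
        elif pred == positive:
            fp += 1      # healthy, wrongly flagged
        else:
            tn += 1      # healthy, correctly cleared

    def ratio(num: int, den: int) -> float:
        return num / den if den else math.nan

    return {
        "sensitivity": ratio(tp, tp + fn),  # true positive rate
        "specificity": ratio(tn, tn + fp),  # true negative rate
        "classification_rate": ratio(tp + tn, len(y_true)),
    }


y_true = ["disease", "disease", "disease", "control", "control"]
y_pred = ["disease", "disease", "control", "control", "disease"]
metrics = classification_metrics(y_true, y_pred)
# sensitivity 2/3, specificity 0.5, classification rate 0.6
```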