RNA-seq technology has come into wider use over the past years, and the number of available analysis procedures has grown with it. Selecting an appropriate workflow has therefore become an important issue for researchers in the field.
In their study, researchers from Capital Medical University compared six popular analytical procedures (pipelines) using four RNA-seq datasets, one each from mouse, human, rat, and macaque. Gene expression values, fold changes of gene expression, and statistical significance were evaluated to compare the similarities and differences among the six procedures. qRT-PCR was performed to validate the differentially expressed genes (DEGs) from all six procedures.
Cufflinks–Cuffdiff demands the most computing resources and Kallisto–Sleuth the least. Gene expression values, fold changes, and the p and q values of differential expression (DE) analysis are highly correlated among the procedures that use HTseq for quantification. For genes with medium expression abundance, the expression values determined by the different procedures were similar. The major differences in expression values come from genes with particularly high or low expression levels. HISAT2–StringTie–Ballgown is more sensitive to genes with low expression levels, while Kallisto–Sleuth may only be useful for evaluating genes with medium to high abundance. When the same thresholds for fold change and p value are chosen in DE analysis, StringTie–Ballgown produces the fewest DEGs, while HTseq–DESeq2, HTseq–edgeR, and HTseq–limma generally produce more. The performance of Cufflinks–Cuffdiff and Kallisto–Sleuth varies across datasets. For DEGs with medium expression levels, the biological verification rates were similar among all procedures.
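To make "the same thresholds for fold change and p value" concrete, the following minimal Python sketch applies a single cutoff pair to one procedure's results. The names `GeneResult` and `call_degs`, the cutoffs (absolute log2 fold change of at least 1, p value below 0.05), and the example values are all illustrative assumptions; the summary above does not state which thresholds the study used.

```python
from typing import Iterable, NamedTuple, Set


class GeneResult(NamedTuple):
    """One gene's DE statistics as reported by a single procedure (hypothetical layout)."""
    gene: str
    log2_fold_change: float
    p_value: float


def call_degs(
    results: Iterable[GeneResult],
    min_abs_log2_fc: float = 1.0,  # assumed cutoff, not taken from the study
    max_p: float = 0.05,           # assumed cutoff, not taken from the study
) -> Set[str]:
    """Return the genes that pass both the fold-change and the p-value threshold."""
    return {
        r.gene
        for r in results
        if abs(r.log2_fold_change) >= min_abs_log2_fc and r.p_value < max_p
    }


# Made-up values for illustration only; these are not data from the study.
example = [
    GeneResult("geneA", 2.3, 0.001),   # passes both thresholds
    GeneResult("geneB", 0.4, 0.0005),  # significant, but fold change too small
    GeneResult("geneC", -1.8, 0.20),   # large fold change, but not significant
    GeneResult("geneD", -1.2, 0.01),   # passes both thresholds (down-regulated)
]
degs = call_degs(example)
```

Running the same function with the same cutoffs over each procedure's output is what makes the DEG counts comparable between procedures.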
Guidelines for researchers to decide the appropriate procedure for RNA-seq analysis
Results are highly correlated among the RNA-seq analysis procedures that use HTseq for quantification. Differences in gene expression values mainly come from genes with particularly high or low expression levels. Moreover, the biological validation rates of DEGs from all six procedures were similar for genes with medium expression levels. Investigators can choose an analytical procedure according to their available computing resources, or according to whether genes with high or low expression levels are of interest. If computing resources are abundant, one can run multiple procedures and take the intersection of their results to get the most reliable DEGs, or take the combination of their results to get a more comprehensive DE profile of the transcriptome.
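The final suggestion can be sketched with plain set operations. In this minimal Python example, the function names `consensus_degs` and `combined_degs` and the gene lists are hypothetical, and reading "combination" as a set union is an assumption rather than something the study specifies.

```python
from typing import Dict, Set


def consensus_degs(deg_sets: Dict[str, Set[str]]) -> Set[str]:
    """Genes called as DEGs by every procedure (intersection)."""
    sets = list(deg_sets.values())
    if not sets:
        return set()
    return set.intersection(*sets)


def combined_degs(deg_sets: Dict[str, Set[str]]) -> Set[str]:
    """Genes called as DEGs by at least one procedure (union)."""
    return set().union(*deg_sets.values())


# Made-up DEG lists for illustration only; these are not results from the study.
deg_sets = {
    "HTseq-DESeq2": {"geneA", "geneB", "geneC"},
    "StringTie-Ballgown": {"geneA"},
    "Kallisto-Sleuth": {"geneA", "geneC", "geneD"},
}

reliable = consensus_degs(deg_sets)      # strictest list: agreed on by all procedures
comprehensive = combined_degs(deg_sets)  # broadest list: reported by any procedure
```

The intersection trades sensitivity for confidence and can be no larger than the shortest input list, while the union does the opposite, so which one to report depends on whether false positives or missed genes are the bigger concern.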