Comparison of software packages for detecting differential expression in RNA-seq studies

RNA-sequencing (RNA-seq) has rapidly become a popular tool for characterizing transcriptomes. A fundamental research problem in many RNA-seq studies is the identification of reliable molecular markers that show differential expression between distinct sample groups. Alongside the growing popularity of RNA-seq, a number of data analysis methods and pipelines have been developed for this task. However, there is as yet no clear consensus on best practices, which makes choosing an appropriate method a daunting task, especially for a user without a strong statistical or computational background.

To help with this choice, researchers from the University of Turku, Finland, here perform a systematic comparison of eight widely used software packages and pipelines for detecting differential expression between sample groups in a practical research setting, and provide general guidelines for choosing a robust pipeline. In general, the results demonstrate that the choice of data analysis tool can markedly affect the outcome of the analysis, which highlights the importance of making this choice carefully.

Table 1. Software packages for detecting differential expression

| Method | Version | Reference | Normalization (a) | Read count distribution assumption | Differential expression test |
|---|---|---|---|---|---|
| edgeR | 3.0.8 | [4] | TMM / upper quartile / RLE (DESeq-like) / none (all scaling factors set to one) | Negative binomial distribution | Exact test |
| DESeq | 1.10.1 | [5] | DESeq sizeFactors | Negative binomial distribution | Exact test |
| baySeq | 1.12.0 | [6] | Scaling factors (quantile / TMM / total) | Negative binomial distribution | Assesses the posterior probabilities of models for differentially and non-differentially expressed genes via empirical Bayesian methods, then compares these posterior likelihoods |
| NOIseq | 1.1.4 | [7] | RPKM / TMM / upper quartile | Nonparametric method | Contrasts fold changes and absolute differences within a condition to determine the null distribution, then compares the observed differences to this null |
| SAMseq (samr) | 2.0 | [8] | SAMseq-specific method based on the mean read count over the null features of the data set | Nonparametric method | Wilcoxon rank statistic and a resampling strategy |
| Limma | 3.14.4 | [9] | TMM | voom transformation of counts | Empirical Bayes method |
| Cuffdiff 2 (Cufflinks) | 2.0.2-beta | [10] | Geometric (DESeq-like) / quartile / classic-fpkm | Beta negative binomial distribution | t-test |
| EBSeq | 1.1.7 | [11] | DESeq median normalization | Negative binomial distribution | Evaluates the posterior probability of differentially and non-differentially expressed entities (genes or isoforms) via empirical Bayesian methods |
(a) Where several normalization methods are available, the default one is underlined.
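
Several entries in the normalization column are variations on two ideas: DESeq's size factors (also described as RLE, "geometric" or "median normalization" in the table) and upper-quartile scaling. The following is a minimal NumPy sketch of both ideas under simplifying assumptions; the function names are hypothetical and this is not the packages' own code, so gene filtering and scaling details in edgeR, NOIseq or Cuffdiff may differ.

```python
import numpy as np


def median_of_ratios_size_factors(counts: np.ndarray) -> np.ndarray:
    """DESeq-style size factors for a genes x samples count matrix.

    Each sample's factor is the median, over genes with no zero counts, of the
    ratio of its count to that gene's geometric mean across samples.
    Assumes at least one gene is non-zero in every sample.
    """
    counts = np.asarray(counts, dtype=float)
    # A zero anywhere makes the gene's geometric mean zero, so drop such genes.
    keep = np.all(counts > 0, axis=1)
    log_counts = np.log(counts[keep])
    log_geo_means = log_counts.mean(axis=1, keepdims=True)
    # Median log-ratio per sample (column), back-transformed to a ratio.
    return np.exp(np.median(log_counts - log_geo_means, axis=0))


def upper_quartile_factors(counts: np.ndarray) -> np.ndarray:
    """Upper-quartile scaling factors for a genes x samples count matrix.

    Uses the 75th percentile of each sample's counts over genes that are
    non-zero in at least one sample, rescaled to a geometric mean of one.
    Assumes every sample's upper quartile is positive.
    """
    counts = np.asarray(counts, dtype=float)
    expressed = counts[counts.sum(axis=1) > 0]
    uq = np.percentile(expressed, 75, axis=0)
    # Rescale so the factors multiply to one across samples.
    return uq / np.exp(np.mean(np.log(uq)))


if __name__ == "__main__":
    # Toy data: sample 2 has exactly twice the counts of sample 1 for every gene.
    toy = np.array([[10, 20], [5, 10], [100, 200]])
    print(median_of_ratios_size_factors(toy))  # ~[0.707 1.414]
    print(upper_quartile_factors(toy))  # ~[0.707 1.414]
```

Dividing each sample's counts by its factor puts the samples on a common scale. On this toy matrix both schemes agree, but on real data they generally give different factors, and packages also differ in how they report them (edgeR, for instance, expresses its normalization factors relative to library size).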
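
Four of the packages (edgeR, DESeq, baySeq and EBSeq) assume that read counts follow a negative binomial distribution. A common way to write this assumption is the mean-dispersion form below, where the symbols are our own notation: Y_gj is the count for gene g in sample j, mu_gj its mean and phi_g a gene-wise dispersion. The packages differ in how they parametrize and estimate the dispersion, so this is an illustration rather than any one package's model.

```latex
Y_{gj} \sim \mathrm{NB}(\mu_{gj}, \phi_g), \qquad
\mathrm{E}[Y_{gj}] = \mu_{gj}, \qquad
\mathrm{Var}[Y_{gj}] = \mu_{gj} + \phi_g \, \mu_{gj}^{2}

\mu = 100,\ \phi = 0.1 \;\Rightarrow\; \mathrm{Var} = 100 + 0.1 \times 100^{2} = 1100
```

Setting phi_g = 0 recovers the Poisson model, whose variance would be only 100 in the worked example; the extra phi_g mu_gj^2 term is what lets the model absorb the additional variability seen between biological replicates.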

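SAMseq's test is built on the Wilcoxon rank statistic. The following sketch computes only that statistic for a single gene in a two-group comparison, on counts assumed to be already normalized; the resampling strategy listed in the table is not shown, and the function names are hypothetical.

```python
from collections.abc import Sequence


def average_ranks(values: Sequence[float]) -> list[float]:
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j correspond to ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def rank_sum_statistic(group_a: Sequence[float], group_b: Sequence[float]) -> float:
    """Wilcoxon rank sum of group_a, centred on its expectation under no difference.

    Positive values mean group_a tends to have larger (normalized) counts.
    """
    pooled = list(group_a) + list(group_b)
    ranks = average_ranks(pooled)
    n_a, n = len(group_a), len(pooled)
    observed = sum(ranks[:n_a])
    expected = n_a * (n + 1) / 2  # mean rank sum of group_a under the null
    return observed - expected


if __name__ == "__main__":
    # Hypothetical normalized counts for one gene in two groups of three samples.
    print(rank_sum_statistic([10, 12, 15], [1, 2, 3]))  # 4.5
```

Because only ranks enter the statistic, a single extreme count cannot dominate it, which is the appeal of nonparametric methods such as SAMseq and NOIseq when a parametric distribution like the negative binomial may not fit the data well.
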
Seyednasrollah F, Laiho A, Elo LL. (2013) Comparison of software packages for detecting differential expression in RNA-seq studies. Brief Bioinform [Epub ahead of print].