RNA-seq is now the technology of choice for genome-wide differential gene expression experiments, but it is not clear how many biological replicates are needed to ensure valid biological interpretation of the results or which statistical tools are best for analyzing the data.
Researchers from the University of Dundee performed an RNA-seq experiment with 48 biological replicates in each of two conditions to answer these questions and provide guidelines for experimental design. With three biological replicates, eight of the 11 tools evaluated found only 20%–40% of the significantly differentially expressed (SDE) genes identified with the full set of 42 clean replicates. This rises to >85% for the subset of SDE genes changing in expression by more than fourfold.
To achieve >85% for all SDE genes regardless of fold change requires more than 20 biological replicates.
The same eight tools successfully control their false discovery rate at ≲5% for all numbers of replicates, while the remaining three tools fail to control their FDR adequately, particularly for low numbers of replicates.
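The two quantities discussed above, the fraction of reference SDE genes a tool recovers and the fraction of its calls that are false discoveries, can be illustrated with a minimal Python sketch. It assumes the calls made with the full clean data set are treated as the reference; the function name `recovery_and_fdr` and the gene IDs are hypothetical, and this simplification is not necessarily how the study computed its rates.

```python
def recovery_and_fdr(called, reference):
    """Compare SDE calls from a small-replicate run with a reference set.

    called:    gene IDs called SDE using a subset of replicates
    reference: gene IDs called SDE using the full clean data set
    Returns (fraction of reference genes recovered, fraction of calls
    that are not in the reference set).
    """
    called, reference = set(called), set(reference)
    true_pos = called & reference    # calls confirmed by the reference
    false_pos = called - reference   # calls absent from the reference
    recovered = len(true_pos) / len(reference) if reference else 0.0
    fdr = len(false_pos) / len(called) if called else 0.0
    return recovered, fdr


# Hypothetical gene IDs, for illustration only.
reference_sde = {f"g{i}" for i in range(1, 11)}   # g1 .. g10
called_3_reps = {"g1", "g2", "g3", "g11"}

recovered, fdr = recovery_and_fdr(called_3_reps, reference_sde)
print(f"recovered = {recovered:.2f}, false discovery fraction = {fdr:.2f}")
```

In this toy case the tool recovers 3 of 10 reference genes, and 1 of its 4 calls is a false discovery.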
Hierarchical clustering of eleven RNA-seq DGE tools and five standard statistical tests using the full clean data set comprising 42 WT and 44 Δsnf2 replicates. For each tool or test, a 7126-element vector of 1s and 0s was constructed, representing whether or not each gene in the annotation was called as SDE (adjusted P-value or FDR threshold ≤0.05). The vectors for each tool and test were then ordered by gene ID and hierarchically clustered by Euclidean distance with complete linkage using the R package pvclust. Approximately unbiased P-value percentages (AU%, bracketed values) calculated for each branch in the clustering represent the support in the data for the observed sub-tree clustering. Branches with AU% > 95% are strongly supported by the data. For clarity, AU% values are not shown for branch points where AU% = 100. The outlier clustering of baySeq, DEGSeq, edgeR (GLM), and NOISeq suggests that these tools are clearly distinct from the other tools. Combined with the tool performance data shown in Figure 2, this suggests that, given a large number of replicates, the tools and tests in Cluster 1 are reliably and reproducibly converging on a similar answer, and are likely to be correctly capturing the SDE signal in the data.
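The clustering step described in this caption can be sketched in a few lines of Python. This is a simplified stand-in, not the pvclust analysis itself: it performs only the Euclidean, complete-linkage agglomeration and omits the multiscale bootstrap resampling that pvclust uses to compute AU values. The tool names and the six-gene call vectors are hypothetical.

```python
from itertools import combinations
from math import dist  # Euclidean distance, Python 3.8+


def complete_linkage(vectors):
    """Agglomerative clustering with Euclidean distance and complete linkage.

    vectors: dict mapping a tool name to its 0/1 SDE call vector
             (all vectors in the same gene order).
    Returns the merges in order as (cluster_a, cluster_b, distance).
    """
    def linkage(pair):
        # Complete linkage: distance between the two farthest members.
        a, b = pair
        return max(dist(vectors[x], vectors[y]) for x in a for y in b)

    clusters = [(name,) for name in vectors]
    merges = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=linkage)
        merges.append((a, b, linkage((a, b))))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges


# Hypothetical call vectors: 1 = gene called SDE, 0 = not called.
calls = {
    "toolA": [1, 1, 0, 0, 1, 0],
    "toolB": [1, 1, 0, 0, 1, 1],
    "toolC": [1, 0, 0, 0, 1, 0],
    "toolD": [0, 0, 1, 1, 0, 1],
}

merges = complete_linkage(calls)
for a, b, d in merges:
    print(a, "+", b, f"at distance {d:.3f}")
```

Here toolD joins the tree last and at the largest distance, which is the kind of outlier behavior the caption describes for baySeq, DEGSeq, edgeR (GLM), and NOISeq.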
For future RNA-seq experiments, these results suggest that more than six biological replicates should be used, rising to more than 12 when it is important to identify SDE genes for all fold changes. If fewer than 12 replicates are used, a superior combination of true positive and false positive performances makes edgeR the leading tool. For higher replicate numbers, minimizing false positives is more important and DESeq marginally outperforms the other tools.