RNA-seq experiments are usually carried out with three or fewer replicates. To work well with so few samples, Differential Gene Expression (DGE) tools typically assume a form for the underlying distribution of gene expression. A recent highly replicated study revealed that RNA-seq gene expression measurements in yeast are best represented as being drawn from an underlying negative binomial distribution. In this paper, researchers from the University of Dundee show that the statistical properties of gene expression in the higher eukaryote Arabidopsis thaliana are essentially identical to those from yeast, despite the large increase in the size and complexity of the transcriptome. Gene expression measurements from this model plant species are consistent with being drawn from an underlying negative binomial or log-normal distribution, and the false positive rate performance of nine widely used DGE tools is not strongly affected by the additional size and complexity of the A. thaliana transcriptome. For RNA-seq data, the authors therefore recommend the use of DGE tools that are based on the negative binomial distribution.
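To make the negative binomial assumption concrete, here is a minimal sketch, not taken from the paper, of counts whose variance follows mean + dispersion × mean². The function names, the simulated gene, and the parameter values (mean 100, dispersion 0.1) are illustrative assumptions; only NumPy is used.

```python
import numpy as np


def simulate_nb_counts(mean, dispersion, n_replicates, rng):
    """Draw read counts for one hypothetical gene from a negative binomial.

    Uses the mean/dispersion parameterisation common in DGE tools,
    where variance = mean + dispersion * mean**2.
    """
    n = 1.0 / dispersion      # NumPy's "number of successes" parameter
    p = n / (n + mean)        # NumPy's success probability
    return rng.negative_binomial(n, p, size=n_replicates)


def moment_dispersion(counts):
    """Method-of-moments dispersion estimate: (variance - mean) / mean**2."""
    m = counts.mean()
    v = counts.var(ddof=1)
    return max((v - m) / m**2, 0.0)


# Illustrative values only; a real experiment has far fewer replicates.
rng = np.random.default_rng(0)
counts = simulate_nb_counts(mean=100.0, dispersion=0.1, n_replicates=5000, rng=rng)
estimated = moment_dispersion(counts)
```

With only three replicates per condition, a per-gene estimate like this becomes very noisy, which is one reason DGE tools lean on a distributional assumption rather than estimating everything from the data.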
[Left]: Pairwise Pearson correlation of gene expression for all 17 replicates. Apart from replicate 11, all replicates correlate very well. [Right]: Same as left, but with replicate 11 filtered out, so that the patterns of correlation among the remaining 16 replicates are easier to see.
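The replicate check described in the caption can be sketched roughly as follows. This is an illustration on synthetic data, not the authors' pipeline: the expression matrix, the noise levels, the log2 transform, and the helper names are all assumptions made for the example.

```python
import numpy as np


def replicate_correlations(expr):
    """Pairwise Pearson correlation between columns (replicates) of a
    genes x replicates matrix."""
    return np.corrcoef(expr, rowvar=False)


def median_correlation_per_replicate(corr):
    """Median correlation of each replicate with all the others,
    ignoring the diagonal of ones."""
    off_diagonal = corr.copy()
    np.fill_diagonal(off_diagonal, np.nan)
    return np.nanmedian(off_diagonal, axis=1)


# Synthetic data: 17 replicates sharing one expression profile,
# with replicate 11 (column index 10) made deliberately noisy.
rng = np.random.default_rng(1)
n_genes, n_reps = 2000, 17
base = rng.lognormal(mean=3.0, sigma=1.5, size=n_genes)
expr = base[:, None] * rng.lognormal(0.0, 0.1, size=(n_genes, n_reps))
expr[:, 10] = base * rng.lognormal(0.0, 1.0, size=n_genes)

# Assumption: correlate on log2(x + 1) so a few highly expressed genes
# do not dominate the Pearson coefficient.
corr = replicate_correlations(np.log2(expr + 1.0))
medians = median_correlation_per_replicate(corr)

worst = int(np.argmin(medians))            # least-correlated replicate
filtered = np.delete(expr, worst, axis=1)  # matrix with that replicate removed
```

Plotting `corr` as a heatmap before and after dropping the flagged column would give a pair of panels analogous to the left and right views described above.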