Gene expression is a stochastic process, so biological replicates in the same treatment group do not share identical expression levels. This biological variation leads to the “over-dispersion” problem: the variance of the read counts exceeds what a Poisson model, whose variance equals its mean, would predict. The authors evaluate several large public RNA-seq datasets and find that the dispersion estimates from existing methods do not adequately capture the heterogeneity of biological variance among samples.
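Over-dispersion is easier to see with numbers. RNA-seq counts are commonly modeled as negative binomial, with variance μ + φμ² for mean μ and dispersion φ. The Poisson case is φ = 0. The sketch below is illustrative only: the replicate counts are made up, and the helper names `nb_variance` and `mom_dispersion` are hypothetical. It recovers φ for one gene by the method of moments. This is a simple textbook estimator, not the one used by DSS or other packages.

```python
from statistics import mean, variance

def nb_variance(mu, phi):
    """Variance of a negative binomial count with mean mu and dispersion phi."""
    return mu + phi * mu ** 2

def mom_dispersion(counts):
    """Method-of-moments dispersion estimate for one gene's replicate counts.

    Solves s^2 = m + phi * m^2 for phi and returns 0.0 when the sample
    variance shows no excess over the Poisson expectation.
    """
    m = mean(counts)
    s2 = variance(counts)  # sample variance (n - 1 denominator)
    if m == 0:
        return 0.0
    return max((s2 - m) / m ** 2, 0.0)

# Hypothetical counts for one gene across four biological replicates.
reps = [80, 120, 95, 145]
print(mean(reps), variance(reps))  # mean 110, sample variance about 817
print(nb_variance(mean(reps), 0.0))  # Poisson prediction: variance 110
print(mom_dispersion(reps))  # estimated phi, about 0.058
```

Here the observed variance is roughly seven times the Poisson prediction. A positive φ absorbs that extra, biological variability.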
They present Dispersion Shrinkage for Sequencing (DSS), a new empirical Bayes shrinkage estimator of the dispersion parameters that better accounts for over-dispersion and, according to the authors, improves differential expression detection.
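Empirical Bayes shrinkage pools information across genes. Each gene's noisy dispersion estimate is pulled toward a prior learned from all genes. The toy sketch below shows the general idea, assuming a normal prior on log-dispersions and a known, common sampling variance for each log raw estimate. These assumptions, the function name `shrink_dispersions`, and the input values are all hypothetical. This is a simplified normal-normal shrinkage, not the estimator implemented in the DSS package.

```python
import math
from statistics import mean, variance

def shrink_dispersions(raw, sampling_var):
    """Toy empirical Bayes shrinkage of per-gene dispersions.

    Assumes (hypothetically) that log-dispersions follow a normal prior
    N(mu0, tau2) across genes and that each log raw estimate has a known,
    common sampling variance `sampling_var`. Needs at least two genes.
    """
    logs = [math.log(d) for d in raw]  # raw estimates must be > 0
    mu0 = mean(logs)  # prior mean, estimated from all genes
    # Method-of-moments prior variance: total spread minus sampling noise.
    tau2 = max(variance(logs) - sampling_var, 0.0)
    denom = tau2 + sampling_var
    # Weight on each gene's own estimate; 0 means "use the prior mean".
    w = tau2 / denom if denom > 0 else 0.0
    # Posterior mean on the log scale, mapped back to the dispersion scale.
    return [math.exp(mu0 + w * (x - mu0)) for x in logs]

# Hypothetical raw dispersion estimates for four genes.
raw = [0.05, 0.1, 0.2, 0.4]
print([round(d, 3) for d in shrink_dispersions(raw, sampling_var=0.2)])
```

Suppose a gene's raw estimate is noisy relative to the between-gene spread. That gene is then pulled harder toward the common value. With few replicates, this stabilizes the per-gene dispersions that feed into a differential expression test.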
The method is implemented in an R package available from Bioconductor: http://www.bioconductor.org/packages/devel/bioc/html/DSS.html
- Wu H, Wang C, Wu Z. (2012) A new shrinkage estimator for dispersion improves differential expression detection in RNA-seq data. Biostatistics. [Epub ahead of print].