RNA-seq has become a routine technique in differential expression (DE) identification. Scientists face a number of experimental design decisions including the sample size. The power for detecting differential expression is affected by several factors including the fraction of DE genes, distribution of the magnitude of DE, distribution of gene expression level, sequencing coverage and the choice of type I error control. The complexity and flexibility of RNA-seq experiments, the high-throughput nature of transcriptome-wide expression measurements and the unique characteristics of RNA-seq data make the power assessment particularly challenging.
Researchers at Emory University propose prospective power assessment instead of a direct sample size calculation by making assumptions on all of these factors. Their power assessment tool includes two components: (1) a semi-parametric simulation that generates data based on actual RNA-seq experiments, with flexible choices of baseline expression, biological variation and patterns of DE; and (2) a power assessment component that provides a comprehensive view of power. The researchers introduce the concepts of stratified power and false discovery cost, and demonstrate the usefulness of their method in experimental design (such as choosing sample size and sequencing depth) as well as in the analysis plan (such as gene filtering).
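The two summary measures named above can be illustrated with a minimal Python sketch. This is not the package's implementation: the function names and toy data are hypothetical, stratified power is taken here as the fraction of truly DE genes detected within each stratum (for example, an expression-level bin), and false discovery cost is assumed to mean the number of false discoveries per true discovery.

```python
from collections import defaultdict


def stratified_power(is_de, called, stratum):
    """Per-stratum power: fraction of truly DE genes that were called DE.

    Hypothetical helper for illustration; not part of the R package.
    """
    true_de = defaultdict(int)
    detected = defaultdict(int)
    for de, hit, s in zip(is_de, called, stratum):
        if de:
            true_de[s] += 1
            if hit:
                detected[s] += 1
    # Only strata that contain at least one truly DE gene are reported.
    return {s: detected[s] / true_de[s] for s in true_de}


def false_discovery_cost(is_de, called):
    """False discoveries per true discovery (assumed definition)."""
    td = sum(1 for de, hit in zip(is_de, called) if de and hit)
    fd = sum(1 for de, hit in zip(is_de, called) if not de and hit)
    return fd / td if td else float("nan")


# Toy example: eight simulated genes with known DE status (made-up values).
is_de = [True, True, True, True, False, False, False, False]
called = [True, False, True, True, True, False, False, False]
stratum = ["low", "low", "high", "high", "low", "low", "high", "high"]

power = stratified_power(is_de, called, stratum)
fdc = false_discovery_cost(is_de, called)
print(power)  # {'low': 0.5, 'high': 1.0}
print(fdc)    # one false discovery against three true discoveries
```

In a simulation-based assessment, the true DE status is known by construction, so measures like these can be averaged over many simulated data sets for each candidate sample size or sequencing depth.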
AVAILABILITY: The proposed method is implemented in a freely available R software package, PROPER: http://web1.sph.emory.edu/users/hwu30/PROPER.html