Reference is regularly made to the power of new genomic sequencing approaches. Using powerful technology, however, is not the same as having the statistical power to address a research question robustly. In the rush to adopt new and improved genomic research methods, the limitations of the technology and of the experimental design may initially be neglected.
Scientists at the University of Otago discuss these issues with regard to RNA sequencing (RNA-seq). RNA-seq adds large-scale transcriptomics to the toolkit of ecological and evolutionary biologists, enabling differential gene expression (DE) studies in non-model species without the need for prior genomic resources. High biological variance is typical of field-based gene expression studies, so larger sample sizes are often needed to reach the same statistical power as clinical studies that use cell lines or inbred animal models.
Sequencing costs have plummeted, yet RNA-seq studies still underutilise biological replication. Finite research budgets force a trade-off between sequencing effort and replication in RNA-seq experimental design. However, clear guidelines for negotiating this trade-off while accounting for the study-specific factors that affect power are currently lacking. Study designs that prioritise sequencing depth over replication fail to capitalise on the power of RNA-seq technology for DE inference. Considerable recent research effort has gone into developing statistical frameworks and software tools for power analysis and sample size calculation in RNA-seq DE analysis. The authors synthesise progress in this area and derive an accessible rule-of-thumb guide for designing powerful RNA-seq experiments in eco-evolutionary and clinical settings alike.
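The depth-versus-replication trade-off can be made concrete with a minimal Python sketch. It assumes the normal approximation of Hart et al. (2013), in which the variance of a gene's log count is roughly 1/depth (counting noise) plus CV² (biological variance), and it assumes a hypothetical fixed budget of reads for one gene per group that is split evenly across replicate libraries. The function names `de_power` and `fixed_budget_power` are illustrative only and are not part of any package.

```python
import math
from statistics import NormalDist

_Z = NormalDist()  # standard normal distribution (mean 0, sd 1)


def de_power(n, depth, cv, fold_change, alpha=0.05):
    """Approximate two-sided power for one gene with n replicates per group.

    Assumption (after Hart et al. 2013): variance of a log count is
    1/depth (counting noise) plus cv**2 (biological variance).
    """
    se = math.sqrt(2 * (1 / depth + cv ** 2) / n)  # SE of the log fold-change
    return _Z.cdf(abs(math.log(fold_change)) / se - _Z.inv_cdf(1 - alpha / 2))


def fixed_budget_power(reads_per_group, cv, fold_change, replicate_options):
    """Power for each design that splits a fixed read budget across replicates.

    `reads_per_group` is a hypothetical total count for the gene in one group;
    each of the n libraries receives reads_per_group / n of it.
    Per-library preparation costs are NOT modelled.
    """
    return {n: de_power(n, reads_per_group / n, cv, fold_change)
            for n in replicate_options}


if __name__ == "__main__":
    # Hypothetical: 60 reads per gene per group, wild-population CV of 0.6.
    for n, p in fixed_budget_power(60, 0.6, 2.0, [2, 3, 6, 12]).items():
        print(f"{n:2d} replicates at depth {60 / n:4.1f}: power {p:.2f}")
```

Under these assumptions, power rises as replicates are added even though each library receives fewer reads: extra depth only shrinks the 1/depth term, while the CV² term sets a floor that only replication averages down. A real budget also pays a fixed cost per library, which this sketch ignores.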
Simple guide to power and replication in RNA-seq
Power in RNA-seq differential expression analysis depends on replication, biological variance and effect size. Expected statistical power is plotted for detecting different effect sizes (expressed as fold-change) at different sample sizes, in hypothetical cases of low biological variance (e.g., an inbred zebrafish line, CV = 0.2) and high biological variance (e.g., a wild reef fish population, CV = 0.6). Calculations were performed with the RNASeqPower package for R (Hart et al. 2013), assuming an average sequencing depth of 10 reads per gene and a 5% false positive rate. CV: coefficient of variation. A fold-change of 2 is equivalent to a log2 fold-change of 1.
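The kind of calculation behind the figure can be approximated in a few lines of Python. This is a sketch of the normal-approximation sample size formula described by Hart et al. (2013), n = 2 (z_{1-α/2} + z_{power})² (1/depth + CV²) / (ln FC)² per group, also rearranged to give power for a fixed n. It is not a port of RNASeqPower, so its numbers may differ slightly from the package's output; the two-sided test, the 80% power target and the function names `de_power` and `replicates_needed` are assumptions made for illustration.

```python
import math
from statistics import NormalDist

_Z = NormalDist()  # standard normal distribution (mean 0, sd 1)


def de_power(n, depth, cv, fold_change, alpha=0.05):
    """Approximate power to detect `fold_change` with n replicates per group.

    Assumes (after Hart et al. 2013) that the variance of a log count is
    1/depth + cv**2, and uses a two-sided normal test at level alpha.
    """
    var = 1 / depth + cv ** 2
    z = math.sqrt(n / 2) * abs(math.log(fold_change)) / math.sqrt(var)
    return _Z.cdf(z - _Z.inv_cdf(1 - alpha / 2))


def replicates_needed(depth, cv, fold_change, power=0.8, alpha=0.05):
    """Smallest n per group that reaches the target power under the same model."""
    var = 1 / depth + cv ** 2
    z_sum = _Z.inv_cdf(1 - alpha / 2) + _Z.inv_cdf(power)
    return math.ceil(2 * z_sum ** 2 * var / math.log(fold_change) ** 2)


if __name__ == "__main__":
    depth = 10  # average reads per gene, as in the caption
    for label, cv in [("inbred line", 0.2), ("wild population", 0.6)]:
        for fc in (1.5, 2.0, 3.0):
            row = " ".join(f"{de_power(n, depth, cv, fc):.2f}"
                           for n in (3, 5, 10, 20))
            print(f"{label:16s} CV={cv} FC={fc}: power at n=3,5,10,20 -> {row}")
        print(f"  replicates for 80% power at FC=2: "
              f"{replicates_needed(depth, cv, 2.0)}")
```

With a depth of 10 and a fold-change of 2, this approximation calls for 5 replicates per group at CV = 0.2 but 16 at CV = 0.6, which illustrates why high-variance field studies need more biological replication.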