As RNA-seq rapidly develops and costs continually decrease, the quantity and frequency of samples being sequenced will grow exponentially. With proteomic investigations becoming more multivariate and quantitative, determining a study’s optimal sample size is now a vital step in experimental design. Current methods for calculating a study’s required sample size are mostly based on the hypothesis testing framework, which assumes each gene count can be modeled through Poisson or negative binomial distributions; however, these methods are limited when it comes to accommodating covariates. To address this limitation, researchers from Vanderbilt University propose an estimating procedure based on the generalized linear model. This easy-to-use method constructs a representative exemplary dataset and estimates the conditional power, all without requiring complicated mathematical approximations or formulas. Even more attractive, the downstream analysis can be performed with current R/Bioconductor packages. To demonstrate the practicability and efficiency of this method, the researchers apply it to three real-world studies, and introduce their on-line calculator developed to determine the optimal sample size for a RNA-seq study.
Availability -RnaSeqSampleSize is avialable at: https://cqs.mc.vanderbilt.edu/shiny/RnaSeqSampleSize/