Evaluating statistical analysis models for RNA sequencing experiments

Validating statistical analysis methods for RNA sequencing (RNA-seq) experiments is a complex task. Researchers often have to decide between competing models or assess the reliability of results obtained with a particular analysis program. Computer simulation has been the most frequently used procedure for verifying the adequacy of a model. However, simulated datasets depend on the parameterization and assumptions of the model used to generate them. Moreover, such datasets may represent reality only partially, because the complexity of RNA-seq data is hard to mimic.

Researchers at Michigan State University present the use of plasmode datasets to complement the evaluation of statistical models for RNA-seq data. A plasmode is a dataset obtained from experimental data but for which some truth is known. Using a set of simulated scenarios of technical and biological replicates, together with publicly available datasets, they illustrate how to design algorithms to construct plasmodes under different experimental conditions.
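To make the idea of a plasmode concrete, here is a minimal Python sketch of one common construction, the "null" plasmode: samples that all come from a single experimental condition are randomly split into two pseudo-groups, so the known truth is that no gene is differentially expressed between them. This is an illustration under stated assumptions, not necessarily the algorithm used in the paper; the function name `build_null_plasmode`, the data layout, and the toy counts are all hypothetical.

```python
import random


def build_null_plasmode(counts, sample_ids, group_size, seed=None):
    """Split samples from ONE condition into two random pseudo-groups.

    counts     : dict mapping gene id -> list of read counts, one per
                 sample, in the same order as sample_ids (assumed layout).
    sample_ids : list of sample names, all from the same condition.
    group_size : number of samples to place in each pseudo-group.

    Returns (group_a_ids, group_b_ids, plasmode), where plasmode maps
    gene id -> {"A": [...], "B": [...]}.
    """
    if 2 * group_size > len(sample_ids):
        raise ValueError("not enough samples for two groups of this size")

    rng = random.Random(seed)
    # Draw 2 * group_size distinct sample indices without replacement.
    chosen = rng.sample(range(len(sample_ids)), 2 * group_size)
    idx_a, idx_b = chosen[:group_size], chosen[group_size:]

    # The real counts are kept untouched; only the group labels are artificial.
    plasmode = {
        gene: {
            "A": [row[i] for i in idx_a],
            "B": [row[i] for i in idx_b],
        }
        for gene, row in counts.items()
    }
    group_a_ids = [sample_ids[i] for i in idx_a]
    group_b_ids = [sample_ids[i] for i in idx_b]
    return group_a_ids, group_b_ids, plasmode


# Toy counts, invented purely for illustration.
toy_counts = {
    "gene1": [10, 12, 9, 11, 13, 10],
    "gene2": [200, 180, 210, 190, 205, 195],
    "gene3": [0, 1, 0, 2, 0, 1],
}
toy_samples = ["s1", "s2", "s3", "s4", "s5", "s6"]

group_a, group_b, plasmode = build_null_plasmode(
    toy_counts, toy_samples, group_size=3, seed=1
)
```

Because the group labels carry no biological meaning, any gene that an analysis method declares differentially expressed in such a plasmode is a false positive. Repeating the random split many times gives an empirical estimate of a method's false positive rate on real count data.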

They contrast results from two types of methods for RNA-seq:

  1. models based on the negative binomial distribution (edgeR and DESeq), and
  2. Gaussian models applied after transformation of data (MAANOVA).



The results emphasize that the choice of method may be experiment-specific, because the distributions of expression levels are unknown. Plasmodes may help in choosing which method to apply by using a similar pre-existing dataset. The promising results obtained with this approach underline the need to promote and improve systematic data sharing across the research community to facilitate plasmode building.

  • Reeb PD, Steibel JP. (2013) Evaluating statistical analysis models for RNA sequencing experiments. Front Genet 4, 178. [article]