As gene expression measurement technology shifts from microarrays to sequencing, the statistical tools used for analysis must be adapted, since RNA-seq data are measured as counts. It has been proposed to model RNA-seq counts as continuous variables using nonparametric regression to account for their inherent heteroscedasticity.
In this vein, researchers from Harvard Medical School propose tcgsaseq, a principled, model-free, and efficient method for detecting longitudinal changes in RNA-seq gene sets defined a priori. The method identifies those gene sets whose expression varies over time, based on an original variance component score test that accounts for both covariates and heteroscedasticity without assuming any specific parametric distribution for the (transformed) counts. The researchers demonstrate that, despite the presence of a nonparametric component, the test statistic has a simple form and limiting distribution, and both can be computed quickly. A permutation version of the test is additionally proposed for very small sample sizes. Applied to both simulated data and two real datasets, tcgsaseq is shown to exhibit very good statistical properties, with an increase in stability and power when compared to state-of-the-art methods ROAST (rotation gene set testing), edgeR, and DESeq2, which can fail to control the type I error under certain realistic settings.
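To illustrate the general idea behind the permutation version of such a test, the sketch below implements a generic permutation test for association between a gene set's expression and time. Note that the statistic used here (sum of squared gene-wise covariances with time) is a simplified stand-in, not tcgsaseq's actual variance component score statistic, and the function name and interface are hypothetical.

```python
import numpy as np

def permutation_score_test(expr, time, n_perm=1000, seed=0):
    """Generic permutation test: does gene-set expression vary with time?

    expr   -- (n_samples, n_genes) matrix of transformed counts for one gene set
    time   -- (n_samples,) vector of measurement times

    Illustrative statistic only: the sum over the gene set of squared
    covariances between each gene and (centered) time. This is NOT the
    tcgsaseq variance component score statistic.
    """
    rng = np.random.default_rng(seed)
    tc = time - time.mean()  # center time so cross-products are covariances

    def stat(t):
        # squared covariance of each gene with time, summed over the set
        return np.sum((expr.T @ t) ** 2)

    observed = stat(tc)
    # permute time labels across samples to build a null distribution,
    # breaking any real time/expression association
    null = np.array([stat(rng.permutation(tc)) for _ in range(n_perm)])
    # p-value with the standard +1 correction to avoid exact zeros
    return (1 + np.sum(null >= observed)) / (1 + n_perm)
```

With a strong linear time trend in the expression matrix, the permuted statistics rarely reach the observed value and the p-value is small; under the null the p-value is approximately uniform on (0, 1].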
Figure: Power evaluation on synthetic data comparing tcgsaseq, ROAST, edgeR-ROAST, and the DESeq2-min test, based on 1,000 simulations.
Availability – The method is available to the community in the R package tcgsaseq: https://cran.r-project.org/web/packages/tcgsaseq/index.html