Testing for association between RNA-Seq and high-dimensional data

Genetic and epigenetic factors contribute to the regulation of gene expression. From a statistical perspective, it makes sense to represent the expression of one gene as a response variable that changes when some covariates are altered. As a starting point, we assume that all covariates come from a single genetic or epigenetic molecular profile. Typically, more covariates are of interest than there are samples.

A plethora of methods for the analysis of gene expression and covariates has emerged in recent years. Many of these methods test each covariate individually, and subsequently correct for multiple testing or rank the covariates by significance. An alternative approach is the global test, which tests the joint rather than the individual significance of the covariates. It allows for high dimensionality, reduces the multiple testing burden, and successfully detects small effects that encompass many covariates. Due to these desirable properties, the global test has become a widely used tool in genomics.
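
The first strategy, testing covariates one at a time and then correcting for multiplicity, can be illustrated with a minimal sketch. The text does not name a correction method; the Benjamini-Hochberg procedure is assumed here purely for illustration, and the function name and p-values are made up.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values, one per covariate (illustrative numbers only).
p_individual = [0.01, 0.04, 0.03, 0.005]
p_adjusted = benjamini_hochberg(p_individual)
```

With thousands of covariates the adjustment grows correspondingly harsher, which is the multiple testing burden that a single joint test avoids.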

Currently, gene expression microarrays are being supplanted by high-throughput sequencing. The negative binomial distribution seems to be a sensible choice for modelling RNA sequencing data. One of its parameters describes the dispersion of the variable. If this parameter is unknown, the negative binomial distribution is not in the exponential family. As the global test is limited in its current form to the exponential family of distributions, a new test is needed for RNA-Seq data.
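
To make the role of the dispersion parameter concrete, here is a small sketch. It assumes the common mean-dispersion parametrisation, in which a count with mean mu and dispersion phi has variance mu + phi * mu^2; the source does not spell out its parametrisation, and the function names are hypothetical.

```python
import math


def nb_log_pmf(y, mu, phi):
    """Log-probability of count y under a negative binomial with mean mu, dispersion phi > 0."""
    size = 1.0 / phi  # "size" parameter of the usual parametrisation
    return (
        math.lgamma(y + size) - math.lgamma(size) - math.lgamma(y + 1)
        + size * math.log(size / (size + mu))
        + y * math.log(mu / (size + mu))
    )


def nb_moments(mu, phi, y_max=5000):
    """Mean and variance obtained by summing the pmf over 0..y_max."""
    probs = [math.exp(nb_log_pmf(y, mu, phi)) for y in range(y_max + 1)]
    mean = sum(y * p for y, p in enumerate(probs))
    var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
    return mean, var


# Assumed example values: mean 10, dispersion 0.5.
mean, var = nb_moments(mu=10.0, phi=0.5)
# The variance should be close to mu + phi * mu**2, well above the Poisson value mu.
```

As phi approaches zero the variance approaches the mean, which recovers the Poisson case.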

Using the negative binomial distribution and a random-effects model, researchers from VU University Medical Center, The Netherlands, have developed an omnibus test that overcomes both difficulties: an overdispersed response and more covariates than samples. It may be conceptualised as a test of overall significance in regression analysis, where the response variable is overdispersed and the number of explanatory variables exceeds the sample size.
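
The idea of a joint test with more covariates than samples can be sketched as follows. This is not the statistic implemented in globalSeq. It is a simplified quadratic-form statistic built from negative binomial score residuals, with a permutation p-value, under an intercept-only null model. All function names, the toy data and the dispersion value are assumptions made for illustration.

```python
import math
import random


def global_statistic(residuals, kernel):
    """Quadratic form: sum over i, j of kernel[i][j] * r_i * r_j."""
    n = len(residuals)
    return sum(
        kernel[i][j] * residuals[i] * residuals[j]
        for i in range(n) for j in range(n)
    )


def permutation_global_test(y, X, phi, n_perm=999, seed=1):
    """Permutation p-value for joint association between counts y and rows of X."""
    n, p = len(y), len(X[0])
    rng = random.Random(seed)
    # Centre each covariate so that the intercept itself is not tested.
    means = [sum(row[j] for row in X) / n for j in range(p)]
    Xc = [[row[j] - means[j] for j in range(p)] for row in X]
    # n x n similarity matrix between samples; its size does not grow with p.
    kernel = [
        [sum(a * b for a, b in zip(Xc[i], Xc[k])) / p for k in range(n)]
        for i in range(n)
    ]
    # Null model: intercept only, so every sample has the same fitted mean.
    mu = sum(y) / n
    # Score residuals of a negative binomial model with log link.
    residuals = [(yi - mu) / (1.0 + phi * mu) for yi in y]
    observed = global_statistic(residuals, kernel)
    exceed = 0
    for _ in range(n_perm):
        shuffled = residuals[:]
        rng.shuffle(shuffled)
        if global_statistic(shuffled, kernel) >= observed:
            exceed += 1
    # Add-one correction keeps the p-value strictly positive.
    return (exceed + 1) / (n_perm + 1)


# Toy data (hypothetical): 30 samples, 60 covariates sharing one latent signal.
toy_rng = random.Random(0)
n, p = 30, 60
latent = [-1.0 + 2.0 * i / (n - 1) for i in range(n)]
X = [[z + 0.1 * toy_rng.gauss(0.0, 1.0) for _ in range(p)] for z in latent]
y = [round(math.exp(3.0 + z)) for z in latent]
p_value = permutation_global_test(y, X, phi=0.2)
```

All covariates enter through the sample-by-sample similarity matrix, so the cost of the statistic depends on the number of samples rather than the number of covariates. In this simplified setting the factor (1 + phi * mu) is the same for every sample and only rescales the statistic; it is kept to show where overdispersion would enter once fitted means differ between samples.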

The proposed test can detect genetic and epigenetic alterations that affect gene expression, and can be used to examine complex regulatory mechanisms of gene expression.


Empirical cumulative distribution functions and scatterplots of p-values. We test for associations between RNA-Seq on the one hand, and either copy number, methylation or both on the other. The corresponding Spearman correlation coefficients are 0.04 (top right), 0.55 (bottom left) and 0.72 (bottom right).

Availability – The R package globalSeq is available from Bioconductor. http://bioconductor.org/packages/globalSeq/

Rauschenberger A, Jonker MA, van de Wiel MA, Menezes RX. (2016) Testing for association between RNA-Seq and high-dimensional data. BMC Bioinformatics 17(1):118.
