Detecting Multivariate Gene Interactions in RNA-Seq Data Using Optimal Bayesian Classification

RNA-Seq is a high-throughput technique for measuring the gene expression profile of a target tissue or even of single cells. Because it is more accurate and flexible than microarray technologies, it is widely applied in biology to uncover the transcriptional mechanisms at play in a given physiology or phenotype.

Typically, this analysis involves mapping the RNA-Seq reads to a reference genome, quantifying transcript expression, and then testing for differential gene expression to determine which genes are expressed at significantly different levels in the phenotypes being compared. Tools such as Cufflinks, edgeR, and DESeq2 provide these tests using well-characterized univariate statistical models of gene expression.

However, one is often interested in phenotypes that can only be discriminated by the state of several genes simultaneously. These multivariate relationships cannot be detected using univariate testing procedures alone. Instead, it is necessary to consider the joint expression patterns of multiple genes and how well this joint expression discriminates the phenotypes of interest. Many biological phenomena induce strong correlations among genes, or give rise to phenotypes that alter these correlations, including canalizing genes, genetic mutations in cancer, and nonlinear saturation effects of gene expression.
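As a toy illustration of why univariate tests can miss such relationships, consider the following minimal sketch. It is not the optimal Bayesian classifier from the paper. It assumes two hypothetical genes whose log-scale expression is either "low" or "high", and a phenotype that occurs exactly when one gene (but not both) is high. All names, means, and thresholds are invented for illustration.

```python
import math
import random
import statistics

LOW_MEAN, HIGH_MEAN, NOISE_SD = 4.0, 8.0, 1.0  # assumed log-scale expression levels
THRESHOLD = (LOW_MEAN + HIGH_MEAN) / 2         # midpoint cut between low and high


def simulate(n_per_combo=100, seed=0):
    """Return (gene1, gene2, label) tuples; label is 1 iff exactly one gene is high."""
    rng = random.Random(seed)
    data = []
    for g1_high in (False, True):
        for g2_high in (False, True):
            for _ in range(n_per_combo):
                g1 = rng.gauss(HIGH_MEAN if g1_high else LOW_MEAN, NOISE_SD)
                g2 = rng.gauss(HIGH_MEAN if g2_high else LOW_MEAN, NOISE_SD)
                data.append((g1, g2, int(g1_high != g2_high)))
    return data


def welch_t(data, gene_idx):
    """Welch t-statistic comparing one gene's expression between the two phenotypes."""
    a = [row[gene_idx] for row in data if row[2] == 0]
    b = [row[gene_idx] for row in data if row[2] == 1]
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    return (statistics.mean(a) - statistics.mean(b)) / se


def single_gene_accuracy(data, gene_idx):
    """Accuracy of predicting the phenotype from one thresholded gene."""
    hits = sum(int(row[gene_idx] > THRESHOLD) == row[2] for row in data)
    return hits / len(data)


def joint_accuracy(data):
    """Accuracy of a rule that uses both thresholded genes together."""
    hits = sum(int((g1 > THRESHOLD) != (g2 > THRESHOLD)) == y for g1, g2, y in data)
    return hits / len(data)


data = simulate()
print("t-statistics per gene:", welch_t(data, 0), welch_t(data, 1))
print("single-gene accuracy:", single_gene_accuracy(data, 0), single_gene_accuracy(data, 1))
print("joint two-gene accuracy:", joint_accuracy(data))
```

In this simulated setup each gene has the same average expression in both phenotypes, so the per-gene statistics should stay small and single-gene rules should perform near chance, while the rule that looks at both genes jointly should separate the phenotypes well.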

To approach this problem, researchers at Texas A&M University utilize the theory of statistical classification for two primary reasons (read more…)

Knight J, Ivanov I, Triff K, Chapkin R, Dougherty E. (2015) Detecting Multivariate Gene Interactions in RNA-Seq Data Using Optimal Bayesian Classification. IEEE/ACM Trans Comput Biol Bioinform [Epub ahead of print].
