In current statistical methods for calling differentially expressed genes in RNA-Seq experiments, the assumption is that an adjusted observed gene count represents an unknown true gene count. This adjustment usually consists of a normalization step to account for heterogeneous sample library sizes, and then the resulting normalized gene counts are used as input for parametric or non-parametric differential gene expression tests. A distribution of true gene counts, each with a different probability, can result in the same observed gene count. Importantly, sequencing coverage information is currently not explicitly incorporated into any of the statistical models used for RNA-Seq analysis.
Researchers at the University of Malaya have developed a fast Bayesian method which uses the sequencing coverage information determined from the concentration of an RNA sample to estimate the posterior distribution of a true gene count. This method has better or comparable performance compared to NOISeq and GFOLD, according to the results from simulations and experiments with real unreplicated data. The researchers incorporated a previously unused sequencing coverage parameter into a procedure for differential gene expression analysis with RNA-Seq data.
Illustration of how DEG calls are made in CORNAS. By default, the fold-change (ϕ) is 1.5. a A DEG, b Not a DEG, c Not a DEG
Availability – The method is implemented in CORNAS (Coverage-dependent RNA-Seq), and is available at https://github.com/joel-lzb/CORNAS .