Model based normalization improves differential expression calling in low-depth RNA-seq

RNA-seq is a powerful tool for gene expression profiling and differential expression analysis. Its power depends on sequencing depth, which limits its high-throughput potential; 10-15 million reads are considered the optimal balance between the quality of differential expression calling and the cost per sample. Researchers at Washington University in St. Louis observed, however, that some statistical features of the data, such as the gene count distribution, are preserved well below 10-15M reads. They found that differential expression analysis at low sequencing depths improves when the distribution statistics are estimated by pooling individual samples into a combined higher-depth library. Using a novel gene-by-gene scaling technique, based on the observation that gene counts follow a Pareto-like distribution, the researchers re-normalized samples towards a greater sequencing depth and showed that this leads to a significant improvement in differential expression calling, with only a marginal increase in false positive calls. This makes differential expression calling from 3-4M reads comparable to that from 10-15M reads, increasing the throughput of RNA sequencing 3-4 fold.
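The pooling idea can be illustrated with a minimal sketch. This is not the authors' method: the summary above does not spell out the Pareto-based gene-by-gene scaling, so the sketch substitutes plain rank-based quantile mapping onto the pooled library as a stand-in. The function names `pool_counts` and `rescale_to_pool` and the toy counts are hypothetical.

```python
def pool_counts(samples):
    """Sum per-gene counts across samples to form one higher-depth pooled library."""
    n_genes = len(samples[0])
    return [sum(sample[g] for sample in samples) for g in range(n_genes)]


def rescale_to_pool(sample, pooled):
    """Give each gene the pooled count that holds the same rank.

    Assumption: simple quantile mapping stands in for the model-based
    (Pareto-like) scaling described in the text.
    """
    # Gene indices ordered from lowest to highest count in this sample.
    order = sorted(range(len(sample)), key=lambda g: sample[g])
    pooled_sorted = sorted(pooled)
    rescaled = [0] * len(sample)
    for rank, gene in enumerate(order):
        rescaled[gene] = pooled_sorted[rank]
    return rescaled


# Toy data: three low-depth samples, four genes each (made-up counts).
samples = [
    [0, 3, 10, 250],
    [1, 12, 2, 240],
    [0, 4, 9, 260],
]

pooled = pool_counts(samples)
normalized = [rescale_to_pool(sample, pooled) for sample in samples]
```

Each rescaled sample keeps its own gene ranking but takes its count distribution from the deeper pooled library, which is the general intuition behind re-normalizing low-depth samples towards a greater depth.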


Venn diagrams of the intersections between the verification gene set and the DE genes obtained from 2 vs 2 replicate comparisons of non-normalized samples and of samples normalized with the pooling approach.

Zakharov P, Sergushichev A, Predeus A, Artyomov M. (2015) Model based normalization improves differential expression calling in low-depth RNA-seq. bioRxiv [Epub ahead of print].
