Next-generation sequencing technologies are powerful tools for investigating a wide range of biological and medical questions. Statistical and computational methods are key to analyzing these massive and complex sequencing data. To derive gene expression measures and compare them across samples or libraries, read counts must first be normalized to adjust for differences in sequencing depth and other potential technical effects.
In this paper, researchers from the University of Illinois at Urbana-Champaign develop a normalization method based on the iterating median of M-values (IMM) for detecting differentially expressed (DE) genes. Compared with the existing TMM (trimmed mean of M-values) approach, IMM improves the accuracy of DE detection. Simulation studies show that IMM outperforms other sample normalization methods. The authors also analyze real data and find that the genes detected by IMM but not by TMM are much more reliable than the genes detected by TMM but not by IMM. Among other findings, they report that the gene UNC5C is strongly associated with kidney cancer.
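The abstract does not spell out the IMM algorithm, so the following is only a minimal Python sketch of the general idea under stated assumptions. M-values are computed as in TMM, as the per-gene log2 ratio of library-size-normalized counts between a sample and a reference library. The iteration is a hypothetical scheme: take the median M-value, discard genes whose M-value lies more than an assumed `cutoff` from it (treating them as likely DE), and repeat until the retained set stops changing. The names `m_values` and `iterated_median_factor` and the trimming rule are illustrative assumptions, not the authors' published method.

```python
import math
from statistics import median


def m_values(sample, reference):
    """Per-gene M-values: log2 ratio of library-size-normalized counts.

    Genes with a zero count in either library are skipped, since their
    log ratio is undefined.
    """
    n_s, n_r = sum(sample), sum(reference)
    return [
        math.log2((ys / n_s) / (yr / n_r))
        for ys, yr in zip(sample, reference)
        if ys > 0 and yr > 0
    ]


def iterated_median_factor(sample, reference, cutoff=1.0, max_iter=20):
    """Hypothetical iterated-median normalization factor (not the paper's IMM).

    Repeatedly recenters on the median M-value, keeping only genes within
    `cutoff` of the current median, until the kept set no longer changes.
    Returns 2 ** median, a factor for scaling the sample's library size.
    """
    ms = m_values(sample, reference)
    if not ms:
        raise ValueError("no genes with positive counts in both libraries")
    kept = ms
    center = median(kept)
    for _ in range(max_iter):
        new_kept = [m for m in ms if abs(m - center) <= cutoff]
        # Stop if trimming would remove everything or nothing changed.
        if not new_kept or new_kept == kept:
            break
        kept = new_kept
        center = median(kept)
    return 2 ** center
```

For example, with sample counts `[100, 200, 300, 400, 5000]` and reference counts `[50, 100, 150, 200, 500]`, the fifth gene's M-value lies far from the others, so it is dropped and the factor settles at 1/3. Multiplying the sample's library size (6000) by this factor gives an effective size of 2000, consistent with the 2:1 count ratio of the four unchanged genes against a reference size of 1000. Using a median rather than a mean makes the center robust to the DE genes that remain before trimming.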