A new normalization strategy for RNA-Seq datasets

Recent studies have demonstrated that the normalization step for RNA-seq data is critical for accurate downstream analysis of differential gene expression. A more robust normalization method is therefore desirable for identifying true differences in tag count data.

The key concept of this new strategy for normalizing tag count data is to remove genes flagged as potential differentially expressed genes (DEGs) before calculating the normalization factor, so that the factor is estimated only from genes presumed to be unchanged. Several R packages for identifying DEGs are currently available, and each package uses its own normalization method and gene ranking algorithm.
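The idea of eliminating potential DEGs before normalizing can be illustrated with a minimal Python sketch. This is not the method from the paper or from any of the R packages. It assumes a simple total-count scaling factor and an absolute log2 fold-change ranking as stand-ins for a package's own normalization and gene ranking. The function names (`size_factors`, `deg_elimination_factors`) and the `frac_deg` parameter are hypothetical.

```python
import math


def size_factors(counts, keep=None):
    """Per-sample scaling factors from total counts over the kept genes.

    counts: list of rows (genes), each a list of per-sample counts.
    keep:   optional list of gene indices to use (default: all genes).
    """
    genes = range(len(counts)) if keep is None else keep
    n_samples = len(counts[0])
    totals = [sum(counts[g][s] for g in genes) for s in range(n_samples)]
    mean_total = sum(totals) / n_samples
    return [t / mean_total for t in totals]


def deg_elimination_factors(counts, group, frac_deg=0.2):
    """Three-step sketch: normalize, flag potential DEGs, re-normalize.

    group:    list of 0/1 labels, one per sample (two-group comparison).
    frac_deg: assumed fraction of genes to treat as potential DEGs.
    """
    # Step 1: provisional factors computed from all genes.
    f0 = size_factors(counts)

    # Step 2: rank genes by absolute log2 fold change of normalized means
    # (a simple stand-in for a real DEG ranking algorithm).
    def abs_lfc(row):
        norm = [c / f for c, f in zip(row, f0)]
        a = [x for x, g in zip(norm, group) if g == 0]
        b = [x for x, g in zip(norm, group) if g == 1]
        mean_a = sum(a) / len(a)
        mean_b = sum(b) / len(b)
        return abs(math.log2((mean_a + 1) / (mean_b + 1)))

    scores = [abs_lfc(row) for row in counts]
    order = sorted(range(len(counts)), key=lambda g: scores[g], reverse=True)
    n_drop = int(len(counts) * frac_deg)
    keep = sorted(order[n_drop:])  # genes presumed not differentially expressed

    # Step 3: final factors computed only from the remaining genes.
    return size_factors(counts, keep), keep
```

In a toy dataset where most genes have identical counts across samples and a few are strongly up-regulated in one group, the provisional factors in step 1 are skewed by the up-regulated genes, while the factors from step 3 are not, because those genes are excluded before the totals are recomputed.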

The new normalization strategy was compared with the default normalization settings of four R packages (edgeR, DESeq, baySeq, and NBPSeq). Many synthetic datasets under various scenarios were evaluated on the basis of the area under the curve (AUC), a single measure that reflects both sensitivity and specificity. Results showed that the elimination of potential DEGs is essential for more accurate normalization of RNA-seq data. The concept of this normalization strategy can be widely applied to other types of tag count data and to microarray data.
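For readers unfamiliar with the AUC, the following is a minimal Python sketch of how it can be computed when the true DEGs are known, as is the case for synthetic data. It uses the rank-based (Mann-Whitney) formulation: the AUC is the fraction of (true DEG, non-DEG) pairs in which the true DEG receives the higher score. The function name `auc` and its arguments are hypothetical, and the sketch does not reproduce the evaluation code used in the paper.

```python
def auc(scores, is_deg):
    """Area under the ROC curve for a gene ranking.

    scores: per-gene scores, higher meaning stronger evidence of being a DEG.
    is_deg: per-gene truth labels (truthy for true DEGs).
    """
    pos = [s for s, d in zip(scores, is_deg) if d]
    neg = [s for s, d in zip(scores, is_deg) if not d]
    # Count pairs where the true DEG outranks the non-DEG; ties count half.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

A ranking that places every true DEG above every non-DEG gives an AUC of 1.0, while a ranking no better than chance gives a value near 0.5.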

  • Kadota K, Nishiyama T, Shimizu K. (2012) A normalization strategy for comparing tag count data. Algorithms Mol Biol 7(1), 5.