Over the past five years, RNA-Seq approaches, based on high-throughput sequencing technologies, have become an essential tool in transcriptomics studies. It is now commonly accepted that a normalization preprocessing step can significantly improve the quality of the analysis, particularly for differential gene expression analysis. Nevertheless, a gold-standard normalization method has not yet been found.
Université de Toulouse researchers compared two widely used normalization methods and a third, related method. The first is the “Trimmed Mean of M-values” normalization (TMM), implemented in the edgeR package. The second is the “Relative Log Expression” normalization (RLE), implemented in the DESeq2 package. The third is the “Median Ratio Normalization” (MRN). It has been shown that TMM and RLE give similar results with both real and simulated data sets. These two methods, like MRN, deal efficiently with the intrinsic bias resulting from the relative size of the studied transcriptomes. It has also been shown that the MRN method performs slightly better on some simulated data sets.
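To make the idea behind RLE more concrete, the following is a minimal sketch of a median-of-ratios calculation of the kind RLE is based on: each count is compared with its gene's geometric mean across samples, and the median of those ratios gives one factor per sample. The function name `rle_size_factors` and the toy counts are illustrative assumptions, not the DESeq2 code or the study's data.

```python
import math
from statistics import median


def rle_size_factors(counts):
    """Median-of-ratios size factors (illustrative sketch, not the DESeq2 code).

    counts: list of rows (genes), each row a list of per-sample read counts.
    Returns one size factor per sample.
    """
    n_samples = len(counts[0])
    # Keep only genes with a positive count in every sample, so logs are defined.
    log_rows = [[math.log(c) for c in row]
                for row in counts if all(c > 0 for c in row)]
    # Log ratio of each count to the gene's geometric mean across samples.
    log_ratios = [[x - sum(row) / n_samples for x in row] for row in log_rows]
    # A sample's size factor is the median ratio over the retained genes.
    return [math.exp(median(r[j] for r in log_ratios))
            for j in range(n_samples)]


# Made-up toy matrix: 4 genes x 2 samples; sample 2 is sequenced twice as deep.
# The last gene has a zero count and is therefore ignored.
toy_counts = [[10, 20], [100, 200], [5, 10], [0, 3]]
factors = rle_size_factors(toy_counts)
```

In this toy case the second factor is twice the first, reflecting the doubled sequencing depth. Dividing each column of counts by its factor would put the two samples on a common scale.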
All theoretical results are illustrated by in silico calculations carried out on a real data set from a study of tomato fruit set. In short, this data set consists of a matrix of counts with 34675 rows (genes) and 9 columns (samples: 3 stages with 3 biological replicates per stage). The three methods (with default settings) clearly do not give the same results. Indeed, TMM normalization factors do not take library sizes into account. In contrast, RLE and MRN factors are closer to each other and are positively correlated with library size. Estimating the parameters of the regression lines of the normalization factors against library size shows that the TMM slope is not statistically significant, whereas both the RLE and MRN slopes are.
Figure: Default normalization factors for the fruit set RNA-Seq data
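The slope comparison described above can be sketched as an ordinary least-squares fit of normalization factors against library sizes, followed by a t-statistic for the slope. The sketch below uses only the Python standard library. The function name `slope_t_statistic` and all numbers are made up for illustration; they are not the values from the tomato data set.

```python
import math


def slope_t_statistic(x, y):
    """Least-squares slope of y on x and its t-statistic (illustrative sketch)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual sum of squares around the fitted line.
    rss = sum((b - intercept - slope * a) ** 2 for a, b in zip(x, y))
    # Standard error of the slope, with n - 2 residual degrees of freedom.
    std_err = math.sqrt(rss / (n - 2) / sxx)
    return slope, slope / std_err


# Hypothetical library sizes (millions of reads) for 9 samples.
library_sizes = [10, 12, 15, 18, 20, 22, 25, 28, 30]
# Hypothetical factors that track library size (RLE/MRN-like behaviour).
tracking = [0.52, 0.60, 0.77, 0.88, 1.02, 1.09, 1.27, 1.38, 1.52]
# Hypothetical factors unrelated to library size (TMM-like behaviour).
flat = [1.02, 0.97, 1.01, 0.99, 1.03, 0.98, 1.00, 1.02, 0.98]

slope_tracking, t_tracking = slope_t_statistic(library_sizes, tracking)
slope_flat, t_flat = slope_t_statistic(library_sizes, flat)

# With 9 samples there are 7 degrees of freedom; the two-sided 5% critical
# value of Student's t is about 2.365.
T_CRITICAL = 2.365
tracking_significant = abs(t_tracking) > T_CRITICAL
flat_significant = abs(t_flat) > T_CRITICAL
```

With these made-up values, the slope of the factors that track library size exceeds the critical value, while the flat factors do not, which mirrors the qualitative pattern reported for RLE/MRN versus TMM.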
For a very simple experimental design, i.e., two conditions and no replicates, users can apply any of the three normalization methods studied with no impact on the results. For a more complex experimental design, however, the MRN method could be adopted.