RNA sequencing (RNA-seq) has attracted much attention in recent years as a tool for characterizing and understanding biological systems. However, gaining insight from a large number of RNA-seq experiments collectively remains a major challenge because of the normalization problem: expression levels measured in different experiments must be brought onto a common scale before they can be compared. Current normalization methods rest on assumptions that fail to hold as RNA-seq profiles become more abundant and heterogeneous. University of Utah researchers present a normalization procedure that relies neither on these assumptions nor on prior knowledge of the reference transcripts in those conditions. The algorithm builds a graph from the intrinsic correlations among RNA-seq transcripts and identifies a set of densely connected vertices to serve as references. Applied to benchmark data, the algorithm recovered the reference transcripts with high precision, resulting in high-quality normalization. A demonstration on a real data set showed that the algorithm performs well and is efficient enough to be applied to real-life data.
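The idea of picking references from a correlation graph can be illustrated with a minimal sketch. This is not the researchers' implementation: the summary does not say how edges are defined or how the dense vertex set is found, so the sketch assumes an edge wherever the correlation of log expression exceeds a hypothetical `corr_threshold`, and substitutes a standard greedy peeling heuristic for the densest-subgraph problem. The function names, the threshold, and the simulated data are all illustrative assumptions.

```python
import numpy as np


def find_reference_transcripts(expr, corr_threshold=0.9):
    """Return indices of a densely connected transcript set (assumed approach).

    expr: array of shape (n_transcripts, n_samples) with raw expression values.
    """
    log_expr = np.log1p(expr)
    # Pairwise correlation between transcripts across samples.
    corr = np.nan_to_num(np.corrcoef(log_expr))
    adj = corr > corr_threshold          # assumed edge rule, not from the source
    np.fill_diagonal(adj, False)

    n = adj.shape[0]
    alive = np.ones(n, dtype=bool)
    degree = adj.sum(axis=1).astype(float)
    n_edges = degree.sum() / 2.0
    best_density, best_set = -1.0, alive.copy()

    # Greedy peeling: repeatedly drop the lowest-degree vertex and remember
    # the intermediate vertex set with the highest edges-per-vertex density.
    for _ in range(n):
        density = n_edges / alive.sum()
        if density > best_density:
            best_density, best_set = density, alive.copy()
        v = np.argmin(np.where(alive, degree, np.inf))
        alive[v] = False
        n_edges -= degree[v]
        degree[adj[v] & alive] -= 1
    return np.flatnonzero(best_set)


def size_factors_from_references(expr, ref_idx):
    """Per-sample scaling factors from the summed reference expression."""
    totals = expr[ref_idx].sum(axis=0)
    return totals / totals.mean()


# Simulated data (purely illustrative): the first 20 transcripts keep a
# constant true level, so their observed values track the sample scale.
rng = np.random.default_rng(0)
n_ref, n_other, n_samples = 20, 80, 30
true_scale = rng.lognormal(0.0, 0.5, n_samples)
ref_level = rng.uniform(50, 500, size=(n_ref, 1))
ref_expr = ref_level * true_scale * rng.lognormal(0.0, 0.05, (n_ref, n_samples))
other_expr = 100 * rng.lognormal(0.0, 1.0, (n_other, n_samples)) * true_scale
expr = np.vstack([ref_expr, other_expr])

refs = find_reference_transcripts(expr)
size_factors = size_factors_from_references(expr, refs)
normalized = expr / size_factors         # divide each sample by its factor
```

In this toy setting the reference transcripts are mutually correlated because they all follow the per-sample scale, while the remaining transcripts vary independently, which is why a dense subgraph is a plausible signature of a reference set. Real data would call for a more careful edge definition and threshold choice.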
Comparison of graph-based normalization and TMM (trimmed mean of M-values), a state-of-the-art method, on benchmark data
Left: Discrepancies of the TMM-normalized and graph-based normalized expression levels from the ground-truth normalization, measured in srms. Right: Distribution of the srms deviation from the ground-truth normalization.