Although transcriptomics technologies have evolved rapidly over the past decades, integrative analysis of mixed microarray and RNA-seq data remains challenging because the two technologies differ inherently in variability. Researchers from Tongji University have developed Rank-In to correct the nonbiological effects across the two technologies, so that data from both can be freely blended for consolidated analysis. Rank-In was rigorously validated on public cell and tissue samples profiled by both technologies. On the two reference samples of the SEQC project, Rank-In not only perfectly classified the 44 profiles but also achieved the best accuracy, 0.9, in predicting TaqMan-validated differentially expressed genes (DEGs). More importantly, on 327 glioblastoma (GBM) profiles and 248, 523 heterogeneous colon cancer profiles, respectively, only Rank-In successfully discriminated every cancer profile from normal controls, whereas the other methods could not. Furthermore, on mixed seq-array GBM profiles of different sizes, Rank-In robustly reproduced median DEG overlaps ranging from 0.74 to 0.83 among top genes, whereas the other methods never exceeded 0.72. As the first effective method for cross-technology analysis of mixed data, Rank-In accommodates hybrid array and seq profiles for integrative study of large or small, paired or unpaired, and balanced or imbalanced samples, opening the possibility of reducing the sampling space of clinical cancer patients.
The workflow of Rank-In
(A) The Rank-In workflow. The consolidated expression profiles from microarray and RNA-seq are transformed into internal rankings, weighted by increasing intensity, and then processed by singular value decomposition (SVD) to produce the adjusted ranking matrix. (B) The difference in data distribution between raw microarray and RNA-seq profiles. (C) The sorted ranking according to expression. (D) The data distributions of microarray and RNA-seq profiles after Rank-In.
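The general idea of the workflow in panel (A) can be illustrated with a minimal Python sketch. It is not the Rank-In implementation: the function names `internal_rank` and `remove_platform_component`, the synthetic data, and the choice to drop the single SVD component most aligned with the platform label are all assumptions made for illustration. The intensity weighting step is omitted because its exact form is not described here.

```python
import numpy as np


def internal_rank(profiles):
    """Replace each profile (column) by within-profile gene ranks scaled to [0, 1]."""
    # Applying argsort twice converts values to ordinal ranks (0 = lowest expression).
    order = np.argsort(profiles, axis=0, kind="stable")
    ranks = np.argsort(order, axis=0, kind="stable")
    return ranks / (profiles.shape[0] - 1)


def remove_platform_component(ranked, is_seq):
    """Hypothetical SVD step: drop the singular component that tracks the platform label."""
    centered = ranked - ranked.mean(axis=1, keepdims=True)  # center each gene
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    label = is_seq - is_seq.mean()  # centered platform indicator
    k = int(np.argmax(np.abs(vt @ label)))  # component most aligned with platform
    return ranked - s[k] * np.outer(u[:, k], vt[k])


def platform_gap(matrix, is_seq):
    """Norm of the per-gene mean difference between seq and array profiles."""
    seq_mean = matrix[:, is_seq == 1].mean(axis=1)
    array_mean = matrix[:, is_seq == 0].mean(axis=1)
    return np.linalg.norm(seq_mean - array_mean)


# Synthetic example (assumed data): 200 genes, 6 "array" and 6 "seq" profiles
# measuring the same underlying expression on very different scales.
rng = np.random.default_rng(0)
truth = rng.gamma(shape=2.0, scale=50.0, size=(200, 1))
gene_bias = rng.lognormal(sigma=0.5, size=(200, 1))  # gene-specific platform bias
array = np.log2(truth * rng.lognormal(sigma=0.1, size=(200, 6)) + 1.0)
seq = truth * gene_bias * rng.lognormal(sigma=0.1, size=(200, 6))
mixed = np.hstack([array, seq])
is_seq = np.array([0.0] * 6 + [1.0] * 6)

ranked = internal_rank(mixed)
adjusted = remove_platform_component(ranked, is_seq)

print(f"platform gap after ranking:  {platform_gap(ranked, is_seq):.3f}")
print(f"platform gap after SVD step: {platform_gap(adjusted, is_seq):.3f}")
```

Ranking within each profile removes the scale difference between log-intensity and count data, since ranks are unchanged by any monotone transformation. In this sketch the SVD step then targets the gene-specific platform bias that ranking alone cannot remove.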
Availability – Rank-In can be accessed at http://www.badd-cao.net/rank-in/index.html.