Recent comprehensive assessments of RNA-seq technology support its utility for quantifying gene expression across diverse samples. The next step, rigorously quantifying expression differences between sample groups, still lacks well-defined best practices. Although a number of advanced statistical methods have been developed, several studies have shown that their performance depends strongly on the data under analysis, which compromises their practical utility in real biomedical studies.
As a solution, researchers from the University of Turku propose ROTS (Reproducibility-Optimized Test Statistic), a data-adaptive procedure that selects, from a family of test statistics, the one that maximizes the reproducibility of detections. After demonstrating its improved sensitivity and specificity in a controlled spike-in study, the authors confirm its utility in a real biomedical study by identifying prognostic markers for clear cell renal cell carcinoma (ccRCC).
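To make the idea of "selecting the statistic that maximizes reproducibility" concrete, here is a minimal sketch in Python. It assumes the general form used by ROTS: a family of per-gene statistics |mean1 - mean2| / (a1 + a2 * SE) indexed by a1 > 0 and a2 in {0, 1}, whose parameters and top-list size k are chosen to maximize a reproducibility Z-score. The score compares how well the top-k gene lists agree between bootstrap resamples of the real data against the same agreement after shuffling group labels. All function names, the parameter grids and the simplified bootstrap scheme are hypothetical illustrations, not the ROTS package's API or its exact algorithm.

```python
import numpy as np


def rots_statistic(x, y, a1, a2):
    """Per-gene statistic |mean(x) - mean(y)| / (a1 + a2 * SE).

    x, y: genes-by-samples arrays for the two groups.
    a2 = 0 gives a scaled mean difference; a2 = 1 gives a t-like statistic
    whose denominator is stabilized by the constant a1.
    """
    diff = np.abs(x.mean(axis=1) - y.mean(axis=1))
    se = np.sqrt(x.var(axis=1, ddof=1) / x.shape[1]
                 + y.var(axis=1, ddof=1) / y.shape[1])
    return diff / (a1 + a2 * se)


def topk_overlap(s1, s2, k):
    """Fraction of shared genes between the top-k lists ranked by s1 and s2."""
    top1 = set(np.argsort(-s1)[:k].tolist())
    top2 = set(np.argsort(-s2)[:k].tolist())
    return len(top1 & top2) / k


def resample(x, y, rng):
    """Bootstrap: draw samples with replacement within each group."""
    xi = rng.integers(0, x.shape[1], size=x.shape[1])
    yi = rng.integers(0, y.shape[1], size=y.shape[1])
    return x[:, xi], y[:, yi]


def pair_overlap(x, y, a1, a2, k, rng):
    """Top-k overlap between two independent bootstrap resamples of (x, y)."""
    x1, y1 = resample(x, y, rng)
    x2, y2 = resample(x, y, rng)
    return topk_overlap(rots_statistic(x1, y1, a1, a2),
                        rots_statistic(x2, y2, a1, a2), k)


def reproducibility_z(x, y, a1, a2, k, B, rng):
    """(mean observed overlap - mean null overlap) / SD of observed overlap."""
    pooled = np.hstack([x, y])
    n1 = x.shape[1]
    obs, null = [], []
    for _ in range(B):
        obs.append(pair_overlap(x, y, a1, a2, k, rng))
        # Null reference: shuffle group labels, then repeat the bootstrap.
        perm = pooled[:, rng.permutation(pooled.shape[1])]
        null.append(pair_overlap(perm[:, :n1], perm[:, n1:], a1, a2, k, rng))
    obs, null = np.array(obs), np.array(null)
    return (obs.mean() - null.mean()) / max(obs.std(ddof=1), 1e-8)


def select_parameters(x, y, a1_grid, k_grid, B=50, seed=0):
    """Return (z, a1, a2, k) maximizing the reproducibility Z-score."""
    rng = np.random.default_rng(seed)
    # a2 = 0 ranks genes by mean difference alone, so a single a1 suffices.
    candidates = [(1.0, 0)] + [(a1, 1) for a1 in a1_grid]
    best = None
    for a1, a2 in candidates:
        for k in k_grid:
            z = reproducibility_z(x, y, a1, a2, k, B, rng)
            if best is None or z > best[0]:
                best = (z, a1, a2, k)
    return best
```

The grid of a1 values is restricted to positive numbers here so the denominator never vanishes on a degenerate bootstrap resample. In practice, a call such as select_parameters(x, y, a1_grid=[0.1, 0.5, 1.0], k_grid=[50, 100]) on log-scale expression matrices would illustrate the selection step; for real analyses, the ROTS R package itself should be used rather than this sketch.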
Biological insights from ccRCC prognostic markers. (A) Functional groups of the differentially expressed genes detected by ROTS. (B) Examples of the detected differentially expressed genes. The boxes show the median and the interquartile range (IQR) of the expression levels of the poor- and better-prognosis patients in the TCGA and validation data; the whiskers indicate the range of the remaining observations, and the points correspond to extreme observations lying more than 1.5 times the IQR beyond the box. Boxplots for all the differentially expressed genes are shown in Supplementary Figures S2 and S3. (C) Venn diagram summarizing the overlap between the prognostic genes reported by previous studies (18,36) and those detected by ROTS (see Supplementary Table S2 for detailed information).
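The 1.5 × IQR rule used to draw boxplots like those in panel (B) can be sketched in a few lines of Python. This is an illustrative assumption about the standard convention, not code from the study: the function name boxplot_summary is hypothetical, and quartiles here use NumPy's default linear interpolation, whereas R's boxplot(), for example, computes hinges from Tukey's five-number summary, so box edges can differ slightly on small samples.

```python
import numpy as np


def boxplot_summary(values):
    """Median, quartiles, whisker ends and outliers under the 1.5 x IQR rule."""
    v = np.asarray(values, dtype=float)
    q1, median, q3 = np.percentile(v, [25, 50, 75])
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = v[(v >= low_fence) & (v <= high_fence)]
    return {
        "median": median,
        "iqr": (q1, q3),
        # Whiskers reach the most extreme observations still inside the fences.
        "whiskers": (inside.min(), inside.max()),
        # Points drawn individually: more than 1.5 x IQR beyond the box.
        "outliers": v[(v < low_fence) | (v > high_fence)],
    }
```

For the values 1 to 8 plus a single 100, the box spans 3 to 7 (IQR 4), the fences sit at -3 and 13, the whiskers end at 1 and 8, and only 100 is plotted as an extreme point.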
Availability: An R package implementing ROTS is available at http://www.btk.fi/research/research-groups/elo/software/rots/.