Determining differentially expressed genes (DEGs) between biological samples is key to understanding how genotype gives rise to phenotype. RNA-seq and microarrays are the two main technologies for profiling gene expression levels. However, considerable discrepancies have been found between the DEGs detected using the two technologies. Integrating data across these two platforms has the potential to improve the power and reliability of DEG detection.
Researchers at Penn State University have developed a rank-based semi-parametric model that determines DEGs using information from different sources, and have applied it to the integration of RNA-seq and microarray data. By incorporating both the significance of differential expression and the consistency across platforms, the method effectively detects DEGs with moderate but consistent signals. The researchers demonstrate the effectiveness of their method using simulation studies, MAQC/SEQC data, and a synthetic microRNA dataset.
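To make the idea of rewarding "moderate but consistent" signals concrete, here is a minimal sketch of a much simpler rank-based combination. It is not the researchers' semi-parametric model: it swaps in a plain rank-product-style score (the geometric mean of per-platform ranks), and the function names and example scores are hypothetical.

```python
def ranks(scores):
    """Return 1-based ranks (rank 1 = largest score); ties share the average rank."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    result = [0.0] * len(scores)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next score ties with the current one.
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[pos]]:
            end += 1
        average_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            result[order[k]] = average_rank
        pos = end + 1
    return result


def combined_rank_score(platform_a, platform_b):
    """Geometric mean of the two per-platform ranks; smaller means stronger evidence.

    Illustrative stand-in only, NOT the semi-parametric model described above.
    """
    rank_a, rank_b = ranks(platform_a), ranks(platform_b)
    return [(a * b) ** 0.5 for a, b in zip(rank_a, rank_b)]


# Hypothetical per-gene significance scores (higher = more significant).
rnaseq = [5.0, 2.0, 2.5, 0.1]
microarray = [0.2, 2.2, 2.4, 0.3]
scores = combined_rank_score(rnaseq, microarray)
best_gene = min(range(len(scores)), key=scores.__getitem__)
```

In this toy input, the gene at index 2 has a moderate score on both platforms and ends up with the best combined score, ahead of the gene at index 0, which is strong on one platform only.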
Comparison of discriminative power in the real data-based simulation study
Figures show the percentage of correct and incorrect calls at various thresholds for six simulation settings.
a. 20% of genes are up-regulated and 20% are down-regulated. Both platforms have high data quality.
b. 20% of genes are up-regulated and 20% are down-regulated. One platform has high data quality and the other has low data quality.
c. 20% of genes are up-regulated and 20% are down-regulated. Both platforms have low data quality.
d. 10% of genes are up-regulated and 30% are down-regulated. Both platforms have high data quality.
e. 10% of genes are up-regulated and 30% are down-regulated. One platform has high data quality and the other has low data quality.
f. 10% of genes are up-regulated and 30% are down-regulated. Both platforms have low data quality.
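As a rough illustration of how such curves can be computed in a simulation where the truth is known, the sketch below counts correct and incorrect calls at a given threshold. The exact definitions used for the figure are not stated here, so this assumes that a "call" is a gene whose score meets the threshold, that a call is correct when the gene is truly differentially expressed, and that percentages are taken over all genes. The function name and the example values are hypothetical.

```python
def call_rates(scores, truth, threshold):
    """Return (percent correct calls, percent incorrect calls) at one threshold.

    scores: per-gene evidence of differential expression (higher = stronger).
    truth:  per-gene booleans, True if the gene is truly differentially expressed.
    Assumption: percentages are relative to the total number of genes.
    """
    called = [s >= threshold for s in scores]
    correct = sum(1 for c, t in zip(called, truth) if c and t)
    incorrect = sum(1 for c, t in zip(called, truth) if c and not t)
    n = len(scores)
    return 100.0 * correct / n, 100.0 * incorrect / n


# Hypothetical scores and ground truth for five simulated genes.
scores = [0.9, 0.8, 0.4, 0.2, 0.1]
truth = [True, False, True, False, False]

# Sweep thresholds to trace out one curve of the kind shown in each panel.
curve = [call_rates(scores, truth, t) for t in (0.5, 0.3)]
```

Sweeping the threshold from strict to lenient and plotting the correct-call percentage against the incorrect-call percentage gives one curve per method.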
This integration method is not only robust to noise and heterogeneity in the data, but also adaptive to the structure of the data. In simulations and real-data studies, the approach shows higher discriminative power and identifies more biologically relevant DEGs than eBayes, DESeq, and some commonly used meta-analysis methods.