Co-expression of mRNAs under multiple conditions is commonly used to infer co-functionality of their gene products, despite well-known limitations of this “guilt-by-association” (GBA) approach. Recent advancements in mass spectrometry-based proteomic technologies have enabled global expression profiling at the protein level; however, whether proteome profiling data can outperform transcriptome profiling data for co-expression-based gene function prediction has not been systematically investigated.
A team led by scientists at the Vanderbilt University School of Medicine has now addressed this question by constructing and analyzing mRNA and protein co-expression networks for three cancer types with matched mRNA and protein profiling data from The Cancer Genome Atlas (TCGA) and the Clinical Proteomic Tumor Analysis Consortium (CPTAC). Their analyses revealed a marked difference in wiring between the mRNA and protein co-expression networks. Whereas protein co-expression was driven primarily by functional similarity between co-expressed genes, mRNA co-expression was driven by both co-function and chromosomal co-localization of the genes. Functionally coherent mRNA modules were more likely to have their edges preserved in the corresponding protein networks than functionally incoherent mRNA modules. Proteomic data strengthened the link between gene expression and function for at least 75% of Gene Ontology (GO) biological processes and 90% of KEGG pathways. Gene2Net (http://cptac.gene2net.org), a web application built on the three protein co-expression networks, revealed novel gene-function relationships, such as linking ERBB2 (HER2) to the lipid biosynthetic process in breast cancer, identifying PLG as a new gene involved in complement activation, and identifying AEBP1 as a new epithelial-mesenchymal transition (EMT) marker. These results demonstrate that proteome profiling outperforms transcriptome profiling for co-expression-based gene function prediction. Proteomics should be integrated, if not preferred, in gene function and human disease studies.
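The network construction and edge comparison described above can be illustrated with a minimal sketch. The text does not state which correlation measure or cutoff was used, so the Pearson correlation, the 0.8 threshold, the function names, and the gene labels A to D with their toy profiles are all assumptions made for illustration and not the authors' pipeline.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def coexpression_edges(profiles, threshold=0.8):
    """Return the set of gene pairs whose correlation meets the threshold.

    `profiles` maps a gene name to its expression values across samples.
    The threshold is an assumed, illustrative cutoff.
    """
    edges = set()
    for g1, g2 in combinations(sorted(profiles), 2):
        if pearson(profiles[g1], profiles[g2]) >= threshold:
            edges.add(frozenset((g1, g2)))
    return edges


def edge_overlap(mrna_edges, protein_edges):
    """Split edges into mRNA-specific, common, and protein-specific sets."""
    return (mrna_edges - protein_edges,
            mrna_edges & protein_edges,
            protein_edges - mrna_edges)


# Hypothetical toy profiles for four genes measured in the same four samples.
mrna = {"A": [1, 2, 3, 4], "B": [2, 4, 6, 8], "C": [4, 1, 3, 2], "D": [8, 2, 6, 4]}
protein = {"A": [1, 2, 3, 4], "B": [2, 4, 6, 8], "C": [3, 6, 9, 12], "D": [4, 1, 3, 2]}

mrna_only, common, protein_only = edge_overlap(
    coexpression_edges(mrna), coexpression_edges(protein)
)
print(len(mrna_only), len(common), len(protein_only))
```

Storing each edge as a `frozenset` makes the pair unordered, so the same gene pair compares equal across the two networks regardless of the order in which the genes were listed.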
Edge-level comparison between mRNA and protein co-expression networks of the three cancer types
a, Edge overlap between the mRNA co-expression network (blue) and the protein co-expression network (red). b, The likelihood ratios (LRs) calculated for individual networks with gold-standard reference data sets derived from GO biological process (BP), cellular component (CC) and molecular function (MF) annotations, respectively. Blue, light blue, red, light red and green bars represent the mRNA co-expression network, mRNA random network, protein co-expression network, protein random network, and protein-protein interaction (PPI) network, respectively. c, The LRs of mRNA-specific edges (blue), protein-specific edges (red), and common edges (magenta).
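The caption does not give the LR formula. The sketch below assumes one common formulation: gene pairs that share at least one annotation term are gold-standard positives, annotated pairs that share none are negatives, and the LR is the fraction of positives recovered as edges divided by the fraction of negatives recovered as edges. The function names, gene labels, and term labels are hypothetical.

```python
from itertools import combinations


def gold_standard(annotations):
    """Build positive and negative gene pairs from gene -> set-of-terms annotations.

    Assumption: a pair is positive if the two genes share any term,
    negative otherwise. Only annotated genes should be passed in.
    """
    positives, negatives = set(), set()
    for g1, g2 in combinations(sorted(annotations), 2):
        pair = frozenset((g1, g2))
        if annotations[g1] & annotations[g2]:
            positives.add(pair)
        else:
            negatives.add(pair)
    return positives, negatives


def likelihood_ratio(edges, positives, negatives):
    """LR = P(edge | positive pair) / P(edge | negative pair).

    Returns None when the ratio is undefined (empty reference set or
    no negative pair among the edges).
    """
    tp = len(edges & positives)
    fp = len(edges & negatives)
    if not positives or not negatives or fp == 0:
        return None
    return (tp / len(positives)) / (fp / len(negatives))


# Hypothetical annotations: A and B share term t1, C and D share term t2.
annotations = {"A": {"t1"}, "B": {"t1"}, "C": {"t2"}, "D": {"t2"}}
positives, negatives = gold_standard(annotations)

# Hypothetical network with two co-functional edges and one non-functional edge.
network = {frozenset(("A", "B")), frozenset(("C", "D")), frozenset(("A", "C"))}
lr = likelihood_ratio(network, positives, negatives)
print(lr)  # 4.0 for this toy input: (2/2) / (1/4)
```

Under this definition, an LR above 1 means that edges are enriched for co-annotated gene pairs relative to chance, which is the sense in which a random network would be expected to score near 1.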