The amounts and types of available multimodal tumor data are rapidly increasing, and their integration is critical for fully understanding the underlying cancer biology and personalizing treatment. However, the development of methods for integrating multimodal data in a principled manner is lagging behind our ability to generate the data. Researchers from the NYU School of Medicine introduce an extension to a multiview nonnegative matrix factorization (NNMF) algorithm for dimensionality reduction and integration of heterogeneous data types, and compare the method's predictive modeling performance on unimodal and multimodal data. They also present a comparative evaluation of their novel multiview approach against current data integration methods. The work offers an efficient way to extend an existing dimensionality reduction method. The researchers report a rigorous evaluation of the method on large-scale quantitative protein and phosphoprotein tumor data from the Clinical Proteomic Tumor Analysis Consortium (CPTAC), acquired using state-of-the-art liquid chromatography mass spectrometry. Exome sequencing and RNA-Seq data were also available from The Cancer Genome Atlas for the same tumors. For unimodal data in breast cancer, transcript levels were most predictive of estrogen and progesterone receptor status, and copy number variation was most predictive of human epidermal growth factor receptor 2 status. For ovarian and colon cancers, phosphoprotein and protein levels were most predictive of tumor grade and stage and residual tumor, respectively.
When multiview NNMF was applied to multimodal data to predict outcomes, the improvement in performance over unimodal data was not statistically significant overall. This suggests that proteomics data may contain more predictive information regarding tumor phenotypes than transcript levels, probably because proteins are the functional gene products and therefore a more direct measurement of the functional state of the tumor.
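To make the idea of integrating several data types through a shared low-dimensional representation more concrete, the following is a minimal sketch of a joint (multiview) nonnegative factorization. It is not the authors' implementation and does not reproduce their adaptive extension: it assumes a simple weighted squared-error objective with one shared sample factor `W` and one basis matrix per view, fitted with standard multiplicative updates. The function name `multiview_nmf`, the `weights` parameter, and the toy matrices are hypothetical and chosen only for illustration.

```python
import numpy as np


def multiview_nmf(views, k, weights=None, n_iter=200, eps=1e-9, seed=0):
    """Sketch of joint NMF: each view X_v (samples x features_v) ~ W @ H_v.

    W is shared across views (samples x k); H_v is view specific.
    `weights` scales each view's contribution (assumed parameter; equal
    weights treat all views uniformly).
    """
    rng = np.random.default_rng(seed)
    n_samples = views[0].shape[0]
    if weights is None:
        weights = np.ones(len(views))
    W = rng.random((n_samples, k)) + eps
    Hs = [rng.random((k, X.shape[1])) + eps for X in views]
    losses = []
    for _ in range(n_iter):
        # Update each view-specific basis with W held fixed.
        for v, X in enumerate(views):
            Hs[v] *= (W.T @ X) / (W.T @ W @ Hs[v] + eps)
        # Update the shared factor using all views at once.
        num = sum(w * (X @ H.T) for w, X, H in zip(weights, views, Hs))
        den = W @ sum(w * (H @ H.T) for w, H in zip(weights, Hs)) + eps
        W *= num / den
        # Track the weighted squared reconstruction error.
        losses.append(sum(w * np.linalg.norm(X - W @ H) ** 2
                          for w, X, H in zip(weights, views, Hs)))
    return W, Hs, losses


if __name__ == "__main__":
    # Toy nonnegative data: two "modalities" measured on the same 30 samples.
    rng = np.random.default_rng(1)
    shared = rng.random((30, 3))
    views = [shared @ rng.random((3, 20)), shared @ rng.random((3, 12))]
    W, Hs, losses = multiview_nmf(views, k=3)
    print(W.shape, losses[0], losses[-1])
```

The rows of `W` could then serve as reduced features for a downstream classifier. How the views are weighted is exactly where an adaptive scheme would differ from the equal weighting assumed here.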
Comparison of the best-performing unimodal modality with (A) uniform integration and (B) Adaptive Multiview NNMF for the different tasks. Predictivity is measured by the area under the receiver operating characteristic curve (AUC). The results in both panels are obtained by nominal comparison of AUC differences in individual data sets and tasks, using uniform integration in (A) and Adaptive Multiview NNMF in (B). NNMF indicates nonnegative matrix factorization.
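The AUC comparison described in the caption can be illustrated with a short sketch. It computes AUC in its rank-based form (the probability that a randomly chosen positive case scores higher than a randomly chosen negative case, with ties counted as one half) and then takes the nominal difference between two models on the same task. The function names `auc` and `auc_difference` and the example scores are hypothetical; no significance test is performed here.

```python
def auc(labels, scores):
    """Rank-based AUC for binary labels (1 = positive, 0 = negative)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    # Count positive/negative pairs ranked correctly; ties count as 0.5.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def auc_difference(labels, baseline_scores, candidate_scores):
    """Nominal AUC difference: candidate minus baseline on the same task."""
    return auc(labels, candidate_scores) - auc(labels, baseline_scores)


if __name__ == "__main__":
    # Made-up scores for one task: a unimodal model vs. an integrated model.
    labels = [0, 0, 1, 1]
    unimodal = [0.1, 0.4, 0.35, 0.8]
    integrated = [0.1, 0.2, 0.7, 0.9]
    print(auc(labels, unimodal), auc(labels, integrated))
    print(auc_difference(labels, unimodal, integrated))
```

A positive difference means the integrated model ranks cases better on that task; deciding whether such a difference is statistically meaningful requires a separate test across data sets.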