Investigators at Rutgers Cancer Institute of New Jersey have developed a computational method that uncovers clinically relevant gene expression patterns in large cohorts of breast cancer patients. This method, which is applicable to the analysis of all cancers, can robustly describe molecular processes that are associated with tumor subtypes and can identify predictive markers of response to treatment or disease recurrence.
Rutgers Cancer Institute of New Jersey research member Hossein Khiabanian, PhD, an assistant professor of pathology and laboratory medicine at Rutgers Robert Wood Johnson Medical School, is the senior author of the work. Rutgers Cancer Institute associate research member Gyan Bhanot, PhD, professor of molecular biology and biochemistry and professor of physics in the School of Arts and Sciences at Rutgers University, and lead author Amartya Singh, MS, a Rutgers Physics Department PhD candidate, share more about the work, published in the June 18 online edition of GigaScience.
Q: Why is it important to study this topic?
Changes in patterns of gene expression in tumor cells play a key role in cancer development, its progression, and therefore its treatment. Gene expression is the process of transcribing genomic information that is stored in the DNA to messenger RNA, which in turn is used for synthesizing a functioning protein. Investigating changes in gene expression may help us identify molecular biomarkers that are predictive of disease subtype and stage. High-throughput sequencing technologies, such as RNA sequencing, have enabled precise and unbiased quantification of transcription levels for thousands of genes in large cohorts of patients that include hundreds of samples. Clustering approaches that group together both genes and samples simultaneously in an unsupervised manner, known as biclustering, can not only discover genes that are co-expressed aberrantly, but also allow us to uncover associations between tumor samples with similar changes in their gene expression and clinical attributes such as survival and therapeutic response.
(A) Flow chart of the pipeline for TuBA. (B) Schematic representation of the graph-based approach to discover biclusters.
Q: Tell us about the study and its findings.
We developed a novel biclustering method called the Tunable Biclustering Algorithm (TuBA) that can be used to analyze large gene expression datasets independent of the platform used to generate the data. TuBA is based on the hypothesis that if a cellular mechanism is affected in a set of tumors, genes relevant to the mechanism should co-exhibit similar up- or down-regulation in a significant fraction of the tumors. Therefore, without assuming underlying expression distributions or relying explicitly on the actual gene expression values, TuBA can generate graphs in which the nodes correspond to the genes, and the edges correspond to the shared samples.
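The graph construction described above can be illustrated with a minimal Python sketch. It is not the published TuBA implementation. The function names, the `fraction` and `min_overlap` parameters, and the toy data are all hypothetical, and a plain count of shared samples stands in for whatever overlap criterion the actual method applies. For each gene, the sketch keeps only the samples in which that gene ranks highest, so ranks rather than raw expression values drive the result, and it links two genes when their top-sample sets overlap sufficiently.

```python
from itertools import combinations


def extremal_sample_sets(expr, fraction=0.25):
    """Map each gene to the set of sample indices where it ranks highest.

    expr: dict of gene name -> list of expression values, one per sample.
    fraction: share of samples kept per gene (hypothetical tuning knob).
    """
    top = {}
    for gene, values in expr.items():
        k = max(1, int(len(values) * fraction))
        # Rank samples by expression, highest first; only the ranks matter.
        ranked = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
        top[gene] = set(ranked[:k])
    return top


def build_gene_graph(expr, fraction=0.25, min_overlap=2):
    """Return edges as a dict: (gene1, gene2) -> set of shared sample indices.

    Nodes are genes; an edge is drawn when two genes share at least
    min_overlap samples among their top-ranked samples (assumed criterion).
    """
    top = extremal_sample_sets(expr, fraction)
    edges = {}
    for g1, g2 in combinations(sorted(top), 2):
        shared = top[g1] & top[g2]
        if len(shared) >= min_overlap:
            edges[(g1, g2)] = shared
    return edges


# Toy data (invented for illustration): 3 genes measured in 8 samples.
expr = {
    "GENE_A": [9.0, 8.5, 1.0, 1.2, 0.9, 1.1, 1.0, 0.8],
    "GENE_B": [7.5, 8.0, 0.5, 0.7, 0.6, 0.4, 0.5, 0.6],
    "GENE_C": [0.3, 0.2, 0.4, 0.3, 0.2, 0.3, 6.0, 7.0],
}

edges = build_gene_graph(expr, fraction=0.25, min_overlap=2)
print(edges)
```

In this toy case, GENE_A and GENE_B are both highest in the first two samples, so they are joined by an edge labeled with those samples, while GENE_C stays unconnected. Groups of connected genes, together with the samples on their edges, would then be the candidate biclusters.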
We applied TuBA to three large gene expression datasets encompassing a total of 3,940 breast invasive carcinoma patients. We demonstrated that there was significant agreement between the results obtained for each dataset, and discovered that about 50 percent of the altered co-expression signatures were associated with a subtype of the disease that is defined by low expression levels of the estrogen receptor gene ESR1 (ER) and the ERBB2 (HER2) gene (ER-negative/HER2-negative). Since only 15 percent of all breast invasive carcinomas (BRCAs) are estimated to belong to this subtype, our algorithm was able to highlight the tremendous heterogeneity in alterations in these tumors. Interestingly, more than 50 percent of these signatures were associated with alterations in the DNA that result in amplification (or deletion) of gene copies, which subsequently result in higher (or lower) levels of gene expression. In other words, TuBA was especially effective in identifying transcriptionally active copy number variations in tumor samples. Finally, TuBA identified biclusters that were associated with the non-tumor component of the tumor microenvironment, such as infiltrating immune and stromal cells, and could highlight their role in modulating tumor progression.
Q: What are the implications of this work for breast cancer treatment and/or future research?
Molecular classification of cancers, in particular for breast tumors, has greatly improved treatment outcomes. However, traditional approaches based on clinical and pathological criteria are not refined enough to characterize the tremendous diversity within and across individual tumors. For example, the heterogeneity in altered gene expression within the ER-negative/HER2-negative subtype of breast cancers, as revealed by TuBA, far exceeds the heterogeneity present in other subtypes. Exploring the diversity of aberrant signatures would enable the identification of potential biomarkers of clinical relevance that can further improve treatment outcomes for breast as well as other cancers.
Q: What are ‘next steps’ involving this work?
We are currently applying TuBA to gene expression datasets of 24 other cancer types. Moreover, the simple and straightforward assumptions that underlie TuBA enable it to be adapted for analysis of large datasets that examine other types of alterations in tumors such as DNA methylation. Integration of these results would further enhance our understanding of the key changes in tumor cells that may be susceptible to therapeutic interventions.