Breast cancer (BC) is rising in incidence worldwide, and resistance to treatment is growing with it. Limited therapeutic options and poor survival outcomes persist in some BC subtypes because of the disease's molecular heterogeneity and its resistance to standard endocrine therapy. Recently, high-throughput RNA sequencing (RNA-seq) has been used to identify biomarkers of disease progression and signaling pathways that could be amenable to subtype-specific therapies. However, there is no single generally accepted pipeline for analyzing RNA-seq data in biomarker discovery, in part because a pipeline must satisfy the constraints of sensitivity and specificity simultaneously.
University of Louisville researchers proposed a combined approach: gene-wise normalization with UQ-pgQ2, followed by a Wald test from DESeq2. When applied to publicly available RNA-seq BC datasets, the approach improved specificity, based on within-group comparisons. To identify differentially expressed genes (DEGs), they combined an optimized log2 fold change cutoff with a nominal false discovery rate of 0.05 to further minimize false positives. Using this method to analyze two GEO BC datasets, the researchers identified 797 DEGs uniquely expressed in triple-negative BC (TNBC) and significantly associated with T cell and immune-related signaling, which may contribute to immunotherapeutic efficacy in TNBC patients. In contrast, they identified 1,403 DEGs uniquely expressed in estrogen receptor-positive, HER2-negative BC (ER+HER2-BC) and significantly associated with eicosanoid, Notch, and FAK signaling, while a common set of genes was associated with cellular growth and proliferation. Thus, this approach to controlling false positives identified two distinct gene expression profiles for these two BC subtypes, which are distinguishable by their molecular and functional attributes.
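The DEG selection rule described above (a log2 fold change cutoff combined with a false discovery rate threshold, then a split into subtype-specific and shared genes) can be illustrated with a minimal Python sketch. It is not the researchers' pipeline: the function names, the toy gene table, and the cutoff of 1.0 are assumptions made for illustration, since the optimized cutoff is not given here. The sketch computes Benjamini-Hochberg adjusted p-values itself only to stay self-contained; DESeq2 reports adjusted p-values of this kind by default.

```python
from typing import Dict, List, Set, Tuple

# gene -> (log2 fold change, raw p-value); hypothetical input layout
Results = Dict[str, Tuple[float, float]]


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def select_degs(results: Results, lfc_cutoff: float = 1.0,
                fdr: float = 0.05) -> Set[str]:
    """Keep genes passing both the FDR threshold and the |log2FC| cutoff.

    lfc_cutoff=1.0 is a placeholder, not the optimized value from the study.
    """
    genes = list(results)
    padj = benjamini_hochberg([results[g][1] for g in genes])
    return {
        g for g, q in zip(genes, padj)
        if q < fdr and abs(results[g][0]) >= lfc_cutoff
    }


def partition_degs(a: Set[str], b: Set[str]) -> Tuple[Set[str], Set[str], Set[str]]:
    """Split two DEG sets into (unique to a, unique to b, common)."""
    return a - b, b - a, a & b


# Toy values invented for illustration; they are not data from the study.
tnbc: Results = {"G1": (2.5, 0.001), "G2": (-1.8, 0.004),
                 "G3": (0.3, 0.0005), "G4": (1.2, 0.60)}
er_pos: Results = {"G1": (1.5, 0.002), "G5": (-2.0, 0.003),
                   "G2": (0.2, 0.9), "G6": (3.0, 0.2)}

tnbc_only, er_only, common = partition_degs(select_degs(tnbc),
                                            select_degs(er_pos))
print(sorted(tnbc_only), sorted(er_only), sorted(common))
```

In this toy table, G3 is highly significant but is dropped because its fold change is small, which shows how the fold change cutoff removes calls that the significance threshold alone would keep.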
Hierarchical clustering heatmaps of BC based on the DESeq-normalized gene expression levels
Genes with similar expression patterns are clustered together. Up-regulated genes are shown in red and down-regulated genes in green. (A) A heatmap based on gene expression levels of 1,693 DEGs uniquely identified in TNBC data. (B) A heatmap based on gene expression levels of 2,299 DEGs uniquely identified in ER+HER2-BC data.