Breast cancer prognosis is challenging due to the heterogeneity of the disease. Various computational methods using bulk RNA-seq data have been proposed for breast cancer prognosis. However, these methods suffer from limited performances or ambiguous biological relevance, as a result of the neglect of intra-tumor heterogeneity. Recently, single cell RNA-sequencing (scRNA-seq) has emerged for studying tumor heterogeneity at cellular levels.
University of South Australia researchers propose a novel method, scPrognosis, to improve breast cancer prognosis with scRNA-seq data. scPrognosis uses the scRNA-seq data of the biological process Epithelial-to-Mesenchymal Transition (EMT). It firstly infers the EMT pseudotime and a dynamic gene co-expression network, then uses an integrative model to select genes important in EMT based on their expression variation and differentiation in different stages of EMT, and their roles in the dynamic gene co-expression network. To validate and apply the selected signatures to breast cancer prognosis, the researchers use them as the features to build a prediction model with bulk RNA-seq data. The experimental results show that scPrognosis outperforms other benchmark breast cancer prognosis methods that use bulk RNA-seq data. Moreover, the dynamic changes in the expression of the selected signature genes in EMT may provide clues to the link between EMT and clinical outcomes of breast cancer. scPrognosis will also be useful when applied to scRNA-seq datasets of different biological processes other than EMT.
Workflow of the proposed scPrognosis framework
There are five main steps in scPrognosis, including: ① Pre-processing scRNA-seq data; ② Inferring EMT pseudotime, pseudotime series gene expression data, and dynamic gene co-expression network from the filtered scRNA-seq data; ③ Ranking genes by three measurements; ④ Prioritizing genes via an integrative model; ⑤ Cancer prognosis using the top N ranked genes. The first four steps are based on scRNA-seq data while the last step uses bulk RNA-seq data to select parameters.