A component overlapping attribute clustering (COAC) algorithm for single-cell RNA sequencing data analysis and potential pathobiological implications

Recent advances in next-generation sequencing and computational technologies have enabled routine analysis of large-scale single-cell ribonucleic acid sequencing (scRNA-seq) data. However, scRNA-seq technologies suffer from several technical challenges, including low mean expression levels for most genes and higher frequencies of missing data than bulk population sequencing technologies. Identifying functional gene sets and their regulatory networks that link specific cell types to human diseases and therapeutics from scRNA-seq profiles is a daunting task.

In this study, a team led by researchers at Case Western Reserve University developed a Component Overlapping Attribute Clustering (COAC) algorithm to perform localized (cell subpopulation) gene co-expression network analysis on large-scale scRNA-seq profiles. Gene subnetworks that represent specific gene co-expression patterns are inferred from the components of a decomposed matrix of scRNA-seq profiles. The authors showed that single-cell gene subnetworks identified by COAC from multiple time points within cell phases can be used for cell type identification with high accuracy (83%). In addition, COAC-inferred subnetworks from melanoma patients’ scRNA-seq profiles are highly correlated with survival rates in The Cancer Genome Atlas (TCGA). Moreover, the localized gene subnetworks identified by COAC from individual patients’ scRNA-seq data can be used as pharmacogenomics biomarkers to predict drug responses in cancer cell lines from the Genomics of Drug Sensitivity in Cancer (GDSC) database, with the area under the receiver operating characteristic curve ranging from 0.728 to 0.783. In summary, COAC offers a powerful tool to identify potential network-based diagnostic and pharmacogenomics biomarkers from large-scale scRNA-seq profiles.

Diagram illustrating the Component Overlapping Attribute Clustering (COAC) algorithm for inferring gene-gene relationships from scRNA-seq data


(A) The whole gene co-expression network is decomposed into gene clusters (subnetworks). Each subnetwork is used to evaluate the degree to which its genes are co-expressed in the co-expression matrix derived from scRNA-seq data. If several genes are abnormally expressed, the value of the subnetwork that contains those genes will change significantly. (B) The scRNA-seq data are decomposed into individual gene expression profiles with specific components. After gene selection from each gene expression profile, the largest connected component is obtained as the subnetwork.
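The steps in panel (B) can be sketched in a few lines of Python. This is only an illustrative approximation, not the COAC implementation from the repository: it assumes a plain singular value decomposition as the matrix decomposition, picks genes by the largest absolute loadings, and links genes by a Pearson correlation cutoff. The function names `largest_connected_component` and `sketch_subnetworks`, and the parameters `n_components`, `top_genes`, and `corr_cutoff`, are hypothetical and do not come from the paper.

```python
import numpy as np
from collections import deque


def largest_connected_component(adj):
    """Return sorted node indices of the largest connected component
    of an undirected graph given as a boolean adjacency matrix."""
    n = adj.shape[0]
    seen = np.zeros(n, dtype=bool)
    best = []
    for start in range(n):
        if seen[start]:
            continue
        # Breadth-first search from every node not yet visited.
        seen[start] = True
        queue = deque([start])
        comp = []
        while queue:
            node = queue.popleft()
            comp.append(node)
            for nb in np.flatnonzero(adj[node]):
                if not seen[nb]:
                    seen[nb] = True
                    queue.append(int(nb))
        if len(comp) > len(best):
            best = comp
    return sorted(best)


def sketch_subnetworks(expr, n_components=2, top_genes=4, corr_cutoff=0.8):
    """expr is a cells x genes matrix. Returns one list of gene indices
    per component (assumed stand-in for a COAC-style subnetwork)."""
    # Assumption: SVD of the centered matrix stands in for the decomposition.
    centered = expr - expr.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    # Gene-gene Pearson correlation across cells.
    corr = np.corrcoef(expr, rowvar=False)
    subnetworks = []
    for loading in vt[:n_components]:
        # Gene selection: keep the genes with the largest absolute loadings.
        selected = np.argsort(-np.abs(loading))[:top_genes]
        # Connect selected genes whose absolute correlation passes the cutoff.
        sub = np.abs(corr[np.ix_(selected, selected)]) >= corr_cutoff
        np.fill_diagonal(sub, False)
        comp = largest_connected_component(sub)
        subnetworks.append(sorted(int(selected[i]) for i in comp))
    return subnetworks
```

On a toy matrix in which a few genes share a common signal, the first component would be expected to recover those genes as a single subnetwork. Real scRNA-seq data would additionally need normalization and handling of dropout, which this sketch leaves out.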

Availability – COAC is freely available at https://github.com/ChengF-Lab/COAC.

Peng H, Zeng X, Zhou Y, Zhang D, Nussinov R, Cheng F (2019) A component overlapping attribute clustering (COAC) algorithm for single-cell RNA sequencing data analysis and potential pathobiological implications. PLoS Comput Biol 15(2): e1006772. [article]
