Methods for clustering cells and analyzing gene expression in single-cell RNA sequencing (scRNA-seq) data are essential for the biological interpretation of cellular processes. Researchers at the University of Queensland have developed TRIAGE-Cluster, which uses genome-wide epigenetic data from diverse bio-samples to identify genes that demarcate cell diversity in scRNA-seq data. TRIAGE-Cluster combines patterns of repressive chromatin deposited across diverse cell types with weighted density estimation to identify cell type clusters in a 2D UMAP space. The researchers have also developed TRIAGE-ParseR, a machine learning method that evaluates gene expression rank lists to define groups of genes governing the identity and function of cell types. They demonstrate the utility of this two-step approach using atlases of in vivo and in vitro cell diversification and organogenesis. They also provide a web-accessible dashboard for analyzing and downloading data and software (http://cellfateexplorer.d24h.hk/). Collectively, these results show that genome-wide epigenetic repression provides a versatile strategy for defining cell diversity and studying gene regulation in scRNA-seq data.
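The idea of finding clusters by weighted density estimation in a 2D embedding can be illustrated with a small sketch. It assumes NumPy, a Gaussian kernel, and a hypothetical per-cell weight (for example, one derived from the expression of RTS priority genes). The function names, bandwidth and peak radius are illustrative choices and are not taken from the published TRIAGE-Cluster code. A cell counts as a density peak if no cell within a fixed radius is denser, and every cell is then assigned to its nearest peak.

```python
import numpy as np


def weighted_density(coords, weights, bandwidth=0.5):
    """Unnormalised weighted Gaussian kernel density at each cell's 2D position."""
    # Pairwise squared Euclidean distances between cells in the embedding.
    diff = coords[:, None, :] - coords[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)
    kernel = np.exp(-sq_dist / (2 * bandwidth ** 2))
    # Each neighbouring cell contributes in proportion to its (hypothetical) weight.
    return kernel @ weights / weights.sum()


def find_peaks_and_assign(coords, density, radius=1.0):
    """Mark a cell as a peak if no cell within `radius` is denser,
    then assign every cell to its nearest peak."""
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt(np.sum(diff ** 2, axis=-1))
    peaks = np.array([
        i for i in range(len(coords))
        if density[i] >= density[dist[i] <= radius].max()
    ])
    # Label = position in `peaks` of the closest peak cell.
    labels = np.argmin(dist[:, peaks], axis=1)
    return peaks, labels


# Toy data: two well-separated blobs standing in for UMAP coordinates.
rng = np.random.default_rng(0)
coords = np.vstack([
    rng.normal(loc=(0.0, 0.0), scale=0.3, size=(50, 2)),
    rng.normal(loc=(5.0, 5.0), scale=0.3, size=(50, 2)),
])
# Hypothetical per-cell weights (e.g. mean expression of RTS priority genes).
weights = rng.uniform(0.5, 1.5, size=100)

density = weighted_density(coords, weights)
peaks, labels = find_peaks_and_assign(coords, density, radius=2.0)
print(len(peaks), np.bincount(labels))
```

The full pairwise distance matrix keeps the sketch short but scales quadratically with the number of cells. Atlases with hundreds of thousands of cells would need a grid-based or nearest-neighbour density estimate instead.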
Overview of the unsupervised pipeline for analyzing scRNA-seq data to identify cell types
(A) TRIAGE calculates a repressive tendency score (RTS) for every gene based on its association with broad H3K27me3 domains across 834 EpiMap bio-samples (59). Genes whose RTS lies above the inflection point of the RTS curve are defined as RTS priority genes, which assist peak identification in TRIAGE-Cluster. (B) The input single-cell expression matrix is transformed into a discordance matrix, converting each original expression value into a discordance score (DS). Ranking genes by DS places cell type regulatory genes near the top. (C) TRIAGE-Cluster, a density-based clustering method, uses the RTS priority genes to identify cell populations in UMAP space. (D) For each peak, genes are ranked by their pseudo-bulk DS. TRIAGE-ParseR then uses principal component analysis (PCA) and a Gaussian mixture model (GMM) to sort these genes into functional groups that assist with cell type identification.
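Panels A, B and D can be sketched in a few lines of Python. This is a hedged illustration and not the published code. It assumes NumPy and scikit-learn. It approximates the RTS inflection point with a simple knee heuristic: the point farthest from the chord of the sorted, normalized curve. It assumes DS is log-transformed expression multiplied by each gene's RTS. For the ParseR-style step, it represents each gene by a hypothetical feature matrix, for example H3K27me3 signal across bio-samples. All function names and toy data are illustrative.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture


def priority_genes(rts):
    """(A) Genes ranked above the knee of the descending RTS curve.

    The knee is the point farthest from the straight line joining the first
    and last points of the normalised curve (an assumed heuristic).
    """
    order = np.argsort(rts)[::-1]            # gene indices, highest RTS first
    y = rts[order]
    xn = np.arange(len(y)) / (len(y) - 1)    # rank scaled to [0, 1]
    yn = (y - y[-1]) / (y[0] - y[-1])        # RTS scaled to [0, 1]
    # Distance to the chord from (0, 1) to (1, 0), i.e. the line x + y = 1.
    dist = np.abs(xn + yn - 1) / np.sqrt(2)
    knee = int(np.argmax(dist))
    return order[:knee]                      # genes strictly above the knee


def discordance(expr, rts):
    """(B) Assumed DS: log1p expression (cells x genes) times each gene's RTS."""
    return np.log1p(expr) * rts


def rank_genes_for_peak(ds, labels, peak):
    """(D) Rank genes by pseudo-bulk (mean) DS over the cells in one peak."""
    pseudo_bulk = ds[labels == peak].mean(axis=0)
    return np.argsort(pseudo_bulk)[::-1]


def parse_gene_groups(features, genes, n_pcs=2, n_groups=2, seed=0):
    """(D) ParseR-style grouping: PCA on per-gene features, then a GMM."""
    pcs = PCA(n_components=n_pcs, random_state=seed).fit_transform(features[genes])
    gmm = GaussianMixture(n_components=n_groups, random_state=seed).fit(pcs)
    return gmm.predict(pcs)


# Toy inputs: 200 genes, the first 20 with high (hypothetical) RTS.
rng = np.random.default_rng(0)
rts = np.concatenate([np.linspace(1.0, 0.8, 20), np.linspace(0.1, 0.0, 180)])
top = priority_genes(rts)

expr = rng.poisson(1.0, size=(30, 200)).astype(float)
labels = np.repeat([0, 1], 15)               # two peaks of 15 cells each
expr[labels == 1, 0] += 20                   # gene 0 marks peak 1
ds = discordance(expr, rts)
ranking = rank_genes_for_peak(ds, labels, peak=1)

# Hypothetical per-gene features: genes 0-9 and 10-19 follow two patterns.
features = rng.normal(0.0, 0.1, size=(200, 10))
features[:10, :5] += 3.0
features[10:20, 5:] += 3.0
groups = parse_gene_groups(features, top)
print(len(top), ranking[0], dict(zip(top.tolist(), groups.tolist())))
```

The two gene groups are planted in the toy features, so the GMM simply recovers them. On real data the number of groups would have to be chosen, for example by comparing GMM fits with the Bayesian information criterion (`GaussianMixture.bic`).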
Availability – Code is available on Zenodo for TRIAGE-Cluster (https://doi.org/10.5281/zenodo.7816427) and TRIAGE-ParseR (https://doi.org/10.5281/zenodo.7816635).