A common bioinformatics task in single-cell data analysis is to purify a cell type or cell population of interest from a heterogeneous dataset. Researchers at the University of Lausanne have developed scGate, an algorithm that automates marker-based purification of specific cell populations without requiring training data or reference gene expression profiles. scGate purifies a cell population of interest using a set of markers organized in a hierarchical structure, akin to the gating strategies employed in flow cytometry. scGate outperforms state-of-the-art single-cell classifiers and can be applied to multiple modalities of single-cell data (e.g., RNA-seq, ATAC-seq, CITE-seq). It is implemented as an R package and integrated with the Seurat framework, providing an intuitive tool to isolate cell populations of interest from heterogeneous single-cell datasets.
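The idea of hierarchical, marker-based gating can be illustrated with a small toy sketch in Python. This is not scGate's implementation or its R API: the scoring function (a plain mean of marker expression), the threshold of 0.5, the marker choices (PTPRC for immune cells, CD68 for macrophages, CD3D as a negative marker) and all function names are assumptions made purely for illustration.

```python
from statistics import mean

def signature_score(cell, genes):
    """Toy score: mean expression of the signature genes (missing genes count as 0).
    Assumption: a simple stand-in, not the scoring scGate actually uses."""
    return mean(cell.get(gene, 0.0) for gene in genes)

def passes_level(cell, level, threshold=0.5):
    """A cell passes a gating level if at least one positive signature scores
    above the threshold and no negative signature does."""
    positive_hit = any(signature_score(cell, genes) > threshold
                       for genes in level["positive"])
    negative_clear = all(signature_score(cell, genes) <= threshold
                         for genes in level.get("negative", []))
    return positive_hit and negative_clear

def gate(cells, model, threshold=0.5):
    """Return the names of cells that pass every level of the model, in order."""
    return [name for name, cell in cells.items()
            if all(passes_level(cell, level, threshold) for level in model)]

# Hypothetical two-level model: immune cells first, then macrophages.
model = [
    {"name": "immune", "positive": [["PTPRC"]], "negative": []},
    {"name": "macrophage", "positive": [["CD68"]], "negative": [["CD3D"]]},
]

# Hypothetical normalized expression values for four cells.
cells = {
    "macrophage_like": {"PTPRC": 1.2, "CD68": 2.0},
    "t_cell_like": {"PTPRC": 1.5, "CD3D": 1.8},
    "tumor_like": {"MLANA": 3.0},
    "cd68_nonimmune": {"CD68": 1.0},
}

pure_cells = gate(cells, model)
print(pure_cells)
```

In this sketch, the cell expressing CD68 but not PTPRC is rejected at the first level, which is the point of organizing markers hierarchically: a marker is only interpreted within the population selected by the levels above it.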
Purifying cell populations from single-cell datasets using scGate
A) Uniform Manifold Approximation and Projection (UMAP) representation of scRNA-seq data of peripheral blood mononuclear cell (PBMC) populations annotated by Hao et al. B) Purification of target cell types using scGate: B cells on the left (using the marker MS4A1 [encoding CD20]) and natural killer (NK) cells on the right (using NCAM1 [encoding CD56] and KLRD1 as positive markers, and CD3D as a negative marker). The violin plots display normalized antibody-derived tag (ADT) counts for the indicated proteins on the same cells. Precision (PREC), recall (REC) and Matthews correlation coefficient (MCC) are shown. C) UMAP representation of scRNA-seq data of melanoma tumors annotated by Jerby-Arnon et al. (2018). D) Purification of macrophages using a hierarchical gating model: immune cells at the first level (left panel) and macrophages at the second level (middle panel). Macrophage gene signature (UCell) scores are shown in the right panel.
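For readers unfamiliar with the three performance measures reported in panel B, the following minimal Python sketch computes precision, recall and MCC from a set of gating calls compared against reference annotations. The function name and the example labels are hypothetical and are not taken from the study.

```python
from math import sqrt

def gating_metrics(predicted, truth):
    """Compute precision, recall and Matthews correlation coefficient (MCC)
    for boolean gating calls against boolean reference annotations."""
    pairs = list(zip(predicted, truth))
    tp = sum(1 for p, t in pairs if p and t)          # pure and truly target
    fp = sum(1 for p, t in pairs if p and not t)      # pure but not target
    fn = sum(1 for p, t in pairs if not p and t)      # target cells missed
    tn = sum(1 for p, t in pairs if not p and not t)  # correctly excluded

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denominator if denominator else 0.0
    return precision, recall, mcc

# Hypothetical calls for eight cells: True means "assigned to the target population".
predicted = [True, True, True, False, False, False, False, False]
truth = [True, True, False, True, False, False, False, False]

precision, recall, mcc = gating_metrics(predicted, truth)
print(f"PREC={precision:.3f} REC={recall:.3f} MCC={mcc:.3f}")
```

Unlike precision and recall, MCC uses all four cells of the confusion matrix, which makes it informative when the target population is a small fraction of the dataset.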
Availability – R package source code and reproducible tutorials are available at https://github.com/carmonalab/scGate