Each scRNA-seq profile represents a highly partial sample of mRNA molecules from a unique cell that can never be resampled, so robust analysis must separate the sampling effect from biological variance. Researchers at the Weizmann Institute of Science describe a methodology for partitioning scRNA-seq datasets into metacells: disjoint and homogeneous groups of profiles that could have been resampled from the same cell. Unlike clustering analysis, their algorithm specializes in obtaining granular rather than maximal groups. They show how to use metacells as building blocks for complex quantitative transcriptional maps while avoiding data smoothing. The algorithms are implemented in the MetaCell R/C++ software package.
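The sampling effect described above can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the MetaCell algorithm: it assumes each profile is a multinomial sample of UMIs from one shared, hypothetical expression distribution (`true_freq`), and the gene count, profile count, and UMIs per profile are arbitrary illustrative values. It compares how well a single sparse profile and a pool of such profiles recover the underlying gene frequencies.

```python
import numpy as np


def simulate_profiles(true_freq, n_profiles, umis_per_profile, rng):
    """Draw sparse UMI count profiles as multinomial samples of one expression state."""
    return rng.multinomial(umis_per_profile, true_freq, size=n_profiles)


def l1_error(counts, true_freq):
    """L1 distance between estimated gene frequencies and the true frequencies."""
    est = counts / counts.sum(axis=-1, keepdims=True)
    return np.abs(est - true_freq).sum(axis=-1)


rng = np.random.default_rng(0)

# Hypothetical expression state over 200 genes (assumption for illustration).
true_freq = rng.dirichlet(np.full(200, 0.5))

# 50 profiles of 500 UMIs each, all sampled from the same state.
profiles = simulate_profiles(true_freq, n_profiles=50, umis_per_profile=500, rng=rng)

# Average error when each profile is used on its own.
single_profile_error = l1_error(profiles, true_freq).mean()

# Error after pooling all profiles, as one would within a homogeneous group.
pooled_error = l1_error(profiles.sum(axis=0), true_freq)

print(f"mean single-profile L1 error: {single_profile_error:.3f}")
print(f"pooled L1 error:              {pooled_error:.3f}")
```

Pooling only helps in this way when the grouped profiles really share one expression state, which is why the homogeneity of each group matters more than its size.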
Metacell analysis of the PBMC 8K dataset
a Schematic of the MC algorithmic pipeline. b Outlier/rare cells matrix showing the color-coded number of UMIs per cell (columns) for which at least one gene (rows) was expressed significantly beyond its MC expected number of UMIs. Outlier/rare cells are ordered according to the annotation of the MC containing them (bottom color-coded bars). c Log-fold-enrichment (lfp, see Methods) values for metacells, color-coded according to initial cell type annotation, comparing the T cell marker (CD3D) to B cell (CD79A) and myeloid (LYZ) markers. d Heat map showing enrichment values for metacells (columns) and their maximally enriched gene markers. e The MC adjacency graph (numbered nodes connected by edges), color-coded according to cell type and transcriptional state annotation. Cells are shown as small color-coded points positioned according to the coordinates of the MCs adjacent to them.
Availability – MetaCell’s open-source code is maintained and documented on GitHub: https://tanaylab.github.io/metacell/