A key challenge in single-cell RNA-sequencing (scRNA-seq) data analysis is batch effects that can obscure the biological signal of interest. Although there are various tools and methods to correct for batch effects, their performance can vary. Therefore, it is important to understand how batch effects manifest to adjust for them. Here, University of Zurich researchers systematically explore batch effects across various scRNA-seq datasets according to magnitude, cell type specificity, and complexity. They have developed a cell-specific mixing score (cms) that quantifies mixing of cells from multiple batches. By considering distance distributions, the score is able to detect local batch bias as well as differentiate between unbalanced batches and systematic differences between cells of the same cell type. The researchers compare metrics in scRNA-seq data using real and synthetic datasets and whereas these metrics target the same question and are used interchangeably, they find differences in scalability, sensitivity, and ability to handle differentially abundant cell types. The researchers find that cell-specific metrics outperform cell type–specific and global metrics and recommend them for both method benchmarks and batch exploration.
(A) Gene-wise variance partitioning across datasets. Each dot in each ternary plot represents a gene’s relative amount of variance explained (by batch, cell type or interaction). (B, C) Batch logFC distribution by cell type and batch effect in the cellbench and hca datasets, respectively. Each column represents a density plot of the estimated logFCs for a batch/cell type combination. Dotted lines indicate the mean, 25%, 50% and 75% percentiles.
Availability – The various cell-specific metrics are implemented in the CellMixS R/Bioconductor package and all code for analyses here are available at https://github.com/almutlue/batch_snakemake and https://github.com/almutlue/mixing_benchmark