Large-scale single-cell transcriptomic datasets generated using different technologies contain batch-specific systematic variations that present a challenge to batch-effect removal and data integration. With continued growth expected in scRNA-seq data, achieving effective batch integration with available computational resources is crucial. Here, researchers from A*STAR perform an in-depth benchmark study on available batch correction methods to determine the most suitable method for batch-effect removal.
The researchers compare 14 methods in terms of computational runtime, the ability to handle large datasets, and batch-effect correction efficacy while preserving cell type purity. Five scenarios are designed for the study: identical cell types with different technologies, non-identical cell types, multiple batches, big data, and simulated data. Performance is evaluated using four benchmarking metrics: kBET, LISI, ASW, and ARI. The researchers also investigate the use of batch-corrected data to study differential gene expression.
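To make one of these metrics concrete, the sketch below computes the adjusted Rand index (ARI), which scores the agreement between two labelings of the same cells, for example known cell types versus cluster assignments after correction. It is a minimal standard-library implementation of the usual pair-counting formula, not the benchmark's own code; the function name `adjusted_rand_index` and the toy labels are illustrative assumptions.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Pair-counting ARI between two labelings of the same items.

    Illustrative helper (hypothetical name), not taken from the benchmark.
    Returns 1.0 for identical partitions and about 0 for random agreement.
    """
    if len(labels_true) != len(labels_pred):
        raise ValueError("both labelings must cover the same items")

    total_pairs = comb(len(labels_true), 2)
    if total_pairs == 0:
        return 1.0  # fewer than two items: partitions trivially agree

    # Contingency counts: items sharing both a true label and a predicted label.
    cell_counts = Counter(zip(labels_true, labels_pred))
    true_counts = Counter(labels_true)
    pred_counts = Counter(labels_pred)

    # Number of item pairs grouped together in both, in true, and in predicted.
    together_both = sum(comb(c, 2) for c in cell_counts.values())
    together_true = sum(comb(c, 2) for c in true_counts.values())
    together_pred = sum(comb(c, 2) for c in pred_counts.values())

    expected = together_true * together_pred / total_pairs
    maximum = (together_true + together_pred) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. every item in a single group
    return (together_both - expected) / (maximum - expected)


if __name__ == "__main__":
    # Toy labels (made up): cell types vs. clusters found after correction.
    cell_types = ["T", "T", "B", "NK"]
    clusters = [0, 0, 1, 1]
    print(round(adjusted_rand_index(cell_types, clusters), 4))
```

When the labels are cell types, a higher ARI means that clusters track cell types more closely. The score depends only on how items are grouped, not on the label names themselves.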
Benchmarking 14 methods on ten datasets using five evaluation metrics
a Benchmarking workflow. The performance of 14 batch-correction algorithms was evaluated in terms of their ability to integrate batches while maintaining accurate cell type separation. The researchers employed t-SNE and UMAP visualizations in conjunction with the kBET, LISI, ASW, ARI, and DEG benchmarking metrics to evaluate the batch correction results. b Description of the ten datasets on which the batch-correction algorithms were tested.
Based on these results, Harmony, LIGER, and Seurat 3 are the recommended methods for batch integration. Due to its significantly shorter runtime, Harmony is recommended as the first method to try, with the other two as viable alternatives.