Single-cell RNA sequencing (scRNA-seq) technology has developed rapidly in recent years, leading to an explosion in the number of tailored data analysis methods. However, the lack of gold-standard benchmark datasets makes it difficult for researchers to systematically compare the performance of the many methods available.
Researchers from The Walter and Eliza Hall Institute of Medical Research generated a realistic benchmark experiment that included single cells and admixtures of cells or RNA to create ‘pseudo cells’ from up to five distinct cancer cell lines. In total, 14 datasets were generated using both droplet and plate-based scRNA-seq protocols. They compared 3,913 combinations of data analysis methods for tasks ranging from normalization and imputation to clustering, trajectory analysis and data integration. Evaluation revealed pipelines suited to different types of data for different tasks. These data and analysis provide a comprehensive framework for benchmarking most common scRNA-seq analysis steps.
Overview of the scRNA-seq mixology experimental design and benchmark analysis
a,b, The benchmark experimental design involving single cells (a) and ‘pseudo cells’ (b). c, PCA plots from representative datasets for each design (normalized using scran) highlight the structure present in each experiment. The percentage of variation explained by each principal component (PC) is included in the respective axis labels, and sample sizes are indicated by n. d, Workflow for benchmarking different analysis tasks using the CellBench R package.
Availability – The CellBench R package was developed for benchmarking single-cell analysis methods and is available from https://github.com/Shians/CellBench and Bioconductor (https://www.bioconductor.org/packages/CellBench).
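The idea of comparing many combinations of analysis methods can be illustrated with a minimal sketch. The Python below is not CellBench (which is an R package) and does not reproduce its API. Every function and name in it is a hypothetical stand-in, and the toy "normalization" and "clustering" steps are deliberately simplistic. It only shows how one method per step can be chained, for every possible combination, over each dataset.

```python
import math
from itertools import product

# Hypothetical stand-ins for normalization methods (not from any real library).
def norm_total(counts):
    """Scale counts so that they sum to 1."""
    total = sum(counts)
    return [c / total for c in counts]

def norm_log1p(counts):
    """Apply a log(1 + x) transform to each count."""
    return [math.log1p(c) for c in counts]

# Hypothetical stand-ins for clustering methods: split cells into two groups.
def split_at_mean(values):
    """Label a cell 1 if its value is above the mean, else 0."""
    mean = sum(values) / len(values)
    return [int(v > mean) for v in values]

def split_at_median(values):
    """Label a cell 1 if its value is at or above the upper median, else 0."""
    median = sorted(values)[len(values) // 2]
    return [int(v >= median) for v in values]

# One dictionary of candidate methods per analysis step, in pipeline order.
STEPS = [
    {"total": norm_total, "log1p": norm_log1p},
    {"mean_split": split_at_mean, "median_split": split_at_median},
]

def run_combinations(datasets, steps):
    """Run every combination of one method per step on every dataset.

    Returns a dict keyed by (dataset name, method name for step 1, ...).
    """
    results = {}
    step_choices = [list(step.items()) for step in steps]
    for data_name, data in datasets.items():
        for combo in product(*step_choices):
            output = data
            for _, method in combo:
                output = method(output)  # feed each step's output to the next
            key = (data_name,) + tuple(name for name, _ in combo)
            results[key] = output
    return results

if __name__ == "__main__":
    toy = {"toy": [1, 2, 10, 12]}  # made-up values for a single gene in four cells
    for key, labels in run_combinations(toy, STEPS).items():
        print(key, labels)
```

With two methods at each of two steps and one dataset, this yields four pipelines. The number of combinations grows multiplicatively as steps, methods and datasets are added, which is how a benchmark can reach thousands of method combinations.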