Integration of single-cell RNA sequencing data across different samples has been a major challenge for analyzing cell populations. However, strategies for integrating differential expression analysis of single-cell data remain underinvestigated. Researchers from the Ulsan National Institute of Science and Technology, Korea, benchmark 46 workflows for differential expression analysis of single-cell data with multiple batches. They show that batch effects, sequencing depth and data sparsity substantially affect the performance of these workflows. Notably, the use of batch-corrected data rarely improves the analysis for sparse data, whereas batch covariate modeling improves the analysis when batch effects are substantial. For low-depth data, single-cell techniques based on zero-inflation models degrade performance, whereas analysis of uncorrected data using limmatrend, the Wilcoxon test and the fixed effects model performs well. Based on various simulations and real data analyses, the researchers suggest several high-performance methods for different conditions. Additionally, they demonstrate that differential expression analysis for a specific cell type outperforms that of large-scale bulk sample data in prioritizing disease-related genes.
An overview of our benchmark study for differential expression (DE) analysis of scRNA-seq data with multiple batches
In total, 46 workflows from three integrative strategies and the naïve approach were tested.
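
To make the contrast between the naïve approach and batch covariate modeling concrete, the following is a minimal sketch on simulated data. It is not one of the 46 benchmarked workflows: it fits an ordinary least-squares model to one gene's log-expression, once ignoring batch and once with batch as a fixed-effect covariate. The function name `fit_group_effect`, the unbalanced two-batch design, and all numeric settings (effect size, batch shift, noise level) are assumptions chosen for illustration only.

```python
import numpy as np


def fit_group_effect(y, group, batch=None):
    """Return the OLS estimate of the group (case vs. control) coefficient.

    If `batch` is given, a batch indicator is added to the design matrix,
    i.e. batch is modeled as a fixed-effect covariate.
    """
    cols = [np.ones_like(y), group.astype(float)]
    if batch is not None:
        cols.append(batch.astype(float))
    X = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1]


rng = np.random.default_rng(0)

# Hypothetical unbalanced design: batch 0 is mostly control cells,
# batch 1 is mostly case cells, so group and batch are partly confounded.
group = np.concatenate([np.repeat([0, 1], [80, 20]),
                        np.repeat([0, 1], [20, 80])])
batch = np.repeat([0, 1], [100, 100])

true_effect = 0.5   # assumed true log-expression difference between groups
batch_shift = 2.0   # assumed additive batch effect

# Simulated log-expression of a single gene across 200 cells.
y = true_effect * group + batch_shift * batch + rng.normal(0.0, 0.5, size=200)

naive = fit_group_effect(y, group)            # ignores batch
adjusted = fit_group_effect(y, group, batch)  # batch as covariate

print(f"naive estimate:    {naive:.2f}")
print(f"adjusted estimate: {adjusted:.2f}")
```

In this toy setting the naïve estimate absorbs part of the batch shift because the two groups are unevenly distributed across batches, while the covariate model recovers a value close to the assumed true effect. Real scRNA-seq counts are sparse and overdispersed, so this linear sketch only illustrates the modeling idea and says nothing about how the benchmarked methods rank.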