The rapid development of single-cell RNA-sequencing (scRNA-seq) technology, which produces sparser data than bulk RNA-sequencing (RNA-seq), has led to the emergence of many preprocessing methods, including imputation methods. Researchers from the Johns Hopkins Bloomberg School of Public Health systematically evaluated the performance of 18 state-of-the-art scRNA-seq imputation methods using cell line and tissue data measured across experimental protocols. Specifically, they assessed the similarity of imputed cell profiles to bulk samples and investigated whether methods recover relevant biological signals or introduce spurious noise in three downstream analyses: differential expression, unsupervised clustering, and inference of pseudotemporal trajectories. Broadly, the researchers found significant variability in the performance of the methods across evaluation settings. While most scRNA-seq imputation methods recover biological expression observed in bulk RNA-seq data, the majority do not improve performance in downstream analyses compared to no imputation, particularly for clustering and trajectory analysis, and thus should be used with caution. Furthermore, they found that the performance of scRNA-seq imputation methods depends on many factors, including the experimental protocol, the sparsity of the data, the number of cells in the dataset, and the magnitude of the effect sizes. The researchers provide a key set of recommendations for users and investigators to navigate the current space of scRNA-seq imputation methods.
Motivation and overview of benchmark evaluation of scRNA-seq imputation methods
(A) Dimension reduction results from Principal Components Analysis (PCA) applied to null simulation data, in which no structural pattern is expected, after either no imputation (no_imp, highlighted in red) or one of the 18 imputation methods. The color represents the simulated library size (defined as the total sum of counts across all relevant features) for each cell. (B) An overview of the benchmark comparison evaluating 18 scRNA-seq imputation methods.
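
The kind of null-simulation check described in panel (A) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the researchers' actual simulation or pipeline. The function names (`simulate_null_counts`, `log_normalize`, `pca`), the multinomial count model, the library-size range, and the log-normalization step are all hypothetical choices made for this example. Every cell is drawn from the same expression profile, so any pattern in the principal components reflects technical factors such as library size rather than biology.

```python
import numpy as np


def simulate_null_counts(n_cells=200, n_genes=500, seed=0):
    """Simulate counts with no biological structure (illustrative assumption)."""
    rng = np.random.default_rng(seed)
    # One shared expression profile for all cells: no groups, no trajectory.
    gene_props = rng.dirichlet(np.ones(n_genes))
    # Library size (total counts per cell) varies from cell to cell.
    lib_sizes = rng.integers(1_000, 20_000, size=n_cells)
    counts = np.vstack([rng.multinomial(int(n), gene_props) for n in lib_sizes])
    return counts, lib_sizes


def log_normalize(counts, scale=1e4):
    """Scale each cell to a common total, then apply log(1 + x)."""
    totals = counts.sum(axis=1, keepdims=True)
    return np.log1p(counts / totals * scale)


def pca(x, n_components=2):
    """PCA via SVD of the column-centered matrix; returns cell scores."""
    centered = x - x.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :n_components] * s[:n_components]


counts, lib_sizes = simulate_null_counts()
pcs = pca(log_normalize(counts))

# How strongly does the first principal component track library size?
r = np.corrcoef(pcs[:, 0], np.log(lib_sizes))[0, 1]
print(f"|corr(PC1, log library size)| = {abs(r):.2f}")
```

To mimic the comparison in the figure, one could run an imputation method on `counts` before normalization and repeat the PCA. Under this null setup, visible clusters or gradients in the imputed data would suggest structure introduced by the method rather than recovered from the data.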