scIMC – a platform for benchmarking comparison and visualization analysis of scRNA-seq data imputation methods

With the advent of single-cell RNA sequencing (scRNA-seq), one major challenge is the so-called ‘dropout’ events, which distort gene expression and strongly influence downstream analysis of single-cell transcriptomes. To address this issue, considerable effort has been made and several scRNA-seq imputation methods have been developed, falling into two categories: model-based and deep learning-based. However, a comprehensive and systematic comparison of existing methods is still lacking. In this work, Tianjin University researchers use six simulated and two real scRNA-seq datasets to evaluate and compare a total of 12 available imputation methods from the following four aspects: (i) gene expression recovery, (ii) cell clustering, (iii) differential gene expression, and (iv) cellular trajectory reconstruction. The researchers demonstrate that deep learning-based approaches generally exhibit better overall performance than model-based approaches in the major benchmarking comparisons, indicating the power of deep learning for imputation. Importantly, they built scIMC (single-cell Imputation Methods Comparison platform), which they describe as the first online platform that integrates all available state-of-the-art imputation methods for benchmarking comparison and visualization analysis, and which is expected to be a convenient and useful tool for interested researchers.

The benchmarking workflow of imputation methods

The benchmarking workflow of imputation methods. (A) Data preprocessing. All datasets are filtered by removing low-expressed genes, defined as genes that are expressed in fewer than two cells. Each dataset is then normalized with ‘scanpy.pp.normalize_total’ from Scanpy (1.4.4), using default parameters, and the normalized matrix is log-transformed. (B) Missing value imputation. The imputation methods fall mainly into two categories: (i) model-based methods and (ii) deep learning-based methods. (C) Downstream comparison analysis. The imputed matrix is used for downstream analyses such as clustering and differential expression analysis.
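The preprocessing in step (A) can be illustrated with a small pure-Python sketch. This is not the authors' code: the function name `preprocess` and the toy matrix are hypothetical, and the per-cell scaling to the median total count is an assumption about what the default settings of ‘scanpy.pp.normalize_total’ roughly do. The log-transform is assumed to be log(1 + x), which the source does not specify.

```python
import math
from statistics import median


def preprocess(counts):
    """Filter, normalize and log-transform a cells x genes count matrix.

    `counts` is a list of cells, each a list of raw per-gene counts.
    Returns the indices of the kept genes and the processed matrix.
    """
    n_genes = len(counts[0])

    # (1) Keep genes detected (count > 0) in at least two cells.
    kept = [
        g for g in range(n_genes)
        if sum(1 for cell in counts if cell[g] > 0) >= 2
    ]
    filtered = [[cell[g] for g in kept] for cell in counts]

    # (2) Scale every cell so its total equals the median total count
    #     (assumed to approximate normalize_total with default parameters).
    totals = [sum(cell) for cell in filtered]
    target = median(t for t in totals if t > 0)
    normalized = [
        [v * target / t if t > 0 else 0.0 for v in cell]
        for cell, t in zip(filtered, totals)
    ]

    # (3) Log-transform with log(1 + x) so zeros stay at zero.
    logged = [[math.log1p(v) for v in cell] for cell in normalized]
    return kept, logged


# Toy matrix: 3 cells x 4 genes (hypothetical values).
toy = [
    [4, 0, 1, 0],
    [2, 0, 0, 3],
    [0, 5, 3, 0],
]
kept_genes, matrix = preprocess(toy)
```

In this toy matrix, the second and fourth genes are each detected in only one cell, so they are removed before normalization. With Scanpy itself, the equivalent steps would typically be `scanpy.pp.filter_genes(adata, min_cells=2)`, `scanpy.pp.normalize_total(adata)` and `scanpy.pp.log1p(adata)`; only the second call is named in the source.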

Availability – scIMC is freely accessible via https://server.wei-group.net/scIMC/.

Dai C, Jiang Y, Yin C, Su R, Zeng X, Zou Q, Nakai K, Wei L. (2022) scIMC: a platform for benchmarking comparison and visualization analysis of scRNA-seq data imputation methods. Nucleic Acids Res 50(9):4877-4899. [article]
