Single-cell RNA-sequencing (scRNA-seq) is becoming indispensable in the study of cell-specific transcriptomes. However, in scRNA-seq techniques, only a small fraction of the genes are captured due to “dropout” events. These dropout events require intensive treatment when analyzing scRNA-seq data. For example, imputation tools have been proposed to estimate dropout events and de-noise data. The performance of these imputation tools are often evaluated, or fine-tuned, using various clustering criteria based on ground-truth cell subgroup labels. This limits their effectiveness in the cases where we lack cell subgroup knowledge. We consider an alternative strategy which requires the imputation to follow a “self-consistency” principle; that is, the imputation process is to refine its results until there is no internal inconsistency or dropouts from the data.
Researchers at the City University of Hong Kong propose the use of “self-consistency” as a main criteria in performing imputation. To demonstrate this principle they devised I-Impute, a “self-consistent” method, to impute scRNA-seq data. I-Impute optimizes continuous similarities and dropout probabilities, in iterative refinements until a self-consistent imputation is reached. On the in silico data sets, I-Impute exhibited the highest Pearson correlations for different dropout rates consistently compared with the state-of-art methods SAVER and scImpute. Furthermore, the researchers collected three wetlab datasets, mouse bladder cells dataset, embryonic stem cells dataset, and aortic leukocyte cells dataset, to evaluate the tools. I-Impute exhibited feasible cell subpopulation discovery efficacy on all the three datasets. It achieves the highest clustering accuracy compared with SAVER and scImpute.
Illustration of I-Impute architecture
(A) I-Impute first calls an internal subroutine (called C-Impute), which uses continuous similarities and dropout probabilities to infer missing entries. (B) Then, I-Impute invokes SAVER as a subroutine to preprocess the data. Finally, it deploys C-Impute iteratively on the processed data.