Single-cell RNA-sequencing (scRNA-seq) is fast becoming a powerful technique for studying dynamic gene regulation at unprecedented resolution. However, scRNA-seq data suffer from problems of extremely high dropout rate and cell-to-cell variability, demanding new methods to recover gene expression loss. Despite the availability of various dropout imputation approaches for scRNA-seq, most studies focus on data with a medium or large number of cells, while few studies have explicitly investigated the differential performance across different sample sizes or the applicability of the approach on small or imbalanced data. It is imperative to develop new imputation approaches with higher generalizability for data with various sample sizes.
Xiamen University researchers have developed a method called scHinter for imputing dropout events for scRNA-seq with special emphasis on data with limited sample size. scHinter incorporates a voting-based ensemble distance and leverages the synthetic minority over-sampling technique for random interpolation. A hierarchical framework is also embedded in scHinter to increase the reliability of the imputation for small samples. The researchers demonstrated the ability of scHinter to recover gene expression measurements across a wide spectrum of scRNA-seq datasets with varied sample sizes. They comprehensively examined the impact of sample size and cluster number on imputation. Comprehensive evaluation of scHinter across diverse scRNA-seq datasets with imbalanced or limited sample size showed that scHinter achieved higher and more robust performance than competing approaches, including MAGIC, scImpute, SAVER, and netSmooth.
Schematic diagram of the scHinter framework
(a) The voting-based ensemble distance assembles several widely used distances into one ensemble solution. (b) The random interpolation adopts the SMOTE (Synthetic Minority Over-sampling Technique) oversampling strategy and assigns random weights to cells according to the ensemble distance. (c) The hierarchical framework was performed in a multi-layer manner, which gradually adds more cells for random interpolation and employs the probability density curves of geometric distribution with different mathematical expectations to weight cells under each layer.
Availability – Freely available for download at https://github.com/BMILAB/scHinter.