Single-cell RNA-sequencing technologies provide a powerful tool for systematic dissection of cellular heterogeneity. However, the prevalence of dropout events imposes complications during data analysis and, despite numerous efforts from the community, this challenge has yet to be solved.
Researchers at the Dana-Farber Cancer Institute have developed a computational method, called RESCUE, that mitigates the dropout problem by imputing gene expression levels using information from other cells with similar expression patterns. Unlike existing methods, RESCUE takes an ensemble-based approach to minimize the effect of feature selection bias on imputation. In a comparative analysis of simulated and real single-cell RNA-seq datasets, the researchers show that RESCUE outperforms existing methods in imputation accuracy, which in turn leads to more precise cell-type identification.
Motivation for the RESCUE imputation pipeline, illustrated with a hypothetical example of simulated data
(a) Heatmap of a log-transformed, normalized expression matrix in which cell-type clustering is affected by dropout. (b) t-SNE visualizations of cell clusters determined from the principal components of many subsamples of informative genes, and a histogram showing the bootstrap distribution of the within-cluster non-zero gene expression means for one missing expression value in the data set. (c) Heatmap of the expression data after imputing zero values with a summary statistic of the bootstrap distributions.
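The idea shown in the figure (subsample genes, cluster cells, average the within-cluster non-zero values, then summarize across subsamples) can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not the RESCUE implementation, which is an R package. The function names `kmeans` and `impute_sketch`, their parameters, and the toy matrix are hypothetical. A tiny k-means on the raw subsampled genes stands in for clustering on principal components, and the mean is used as the summary statistic.

```python
import random
from statistics import mean

def kmeans(points, k, rng, iters=20):
    """Tiny k-means; a stand-in (assumption) for the clustering step."""
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign every cell to its nearest center (squared Euclidean distance).
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Move each center to the mean of its members; empty clusters keep theirs.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [mean(col) for col in zip(*members)]
    return labels

def impute_sketch(expr, k=2, n_boot=20, frac=0.5, seed=0):
    """expr: list of cells, each a list of per-gene expression values.
    Zeros are treated as candidate dropouts (a simplifying assumption)."""
    rng = random.Random(seed)
    n_cells, n_genes = len(expr), len(expr[0])
    n_sub = max(1, int(frac * n_genes))
    # estimates[i][g] collects one within-cluster mean per subsample.
    estimates = [[[] for _ in range(n_genes)] for _ in range(n_cells)]
    for _ in range(n_boot):
        genes = rng.sample(range(n_genes), n_sub)       # subsample genes
        points = [[cell[g] for g in genes] for cell in expr]
        labels = kmeans(points, k, rng)                 # cluster the cells
        for c in range(k):
            members = [i for i, lab in enumerate(labels) if lab == c]
            for g in range(n_genes):
                nonzero = [expr[i][g] for i in members if expr[i][g] > 0]
                if not nonzero:
                    continue                            # nothing to borrow from
                m = mean(nonzero)
                for i in members:
                    if expr[i][g] == 0:
                        estimates[i][g].append(m)
    # Replace each zero by the mean of its collected estimates, if any.
    imputed = [row[:] for row in expr]
    for i in range(n_cells):
        for g in range(n_genes):
            if estimates[i][g]:
                imputed[i][g] = mean(estimates[i][g])
    return imputed

# Toy data: cells 0-2 express genes 0-2, cells 3-5 express genes 3-5.
# Cell 0 has a simulated dropout at gene 0.
toy = [
    [0.0, 5.0, 5.0, 0.0, 0.0, 0.0],
    [5.0, 5.0, 5.0, 0.0, 0.0, 0.0],
    [5.0, 5.0, 5.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 5.0, 5.0, 5.0],
    [0.0, 0.0, 0.0, 5.0, 5.0, 5.0],
    [0.0, 0.0, 0.0, 5.0, 5.0, 5.0],
]
imputed = impute_sketch(toy)
print(imputed[0])
```

In this toy case the two cell types are well separated, so the dropout in cell 0 is filled from its own cell type, while the zeros for genes that the cell type does not express have no non-zero neighbors within the cluster and are left untouched. Real data are far noisier, which is the motivation for aggregating over many gene subsamples rather than trusting a single clustering.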
Availability – RESCUE is implemented in R and available at https://github.com/seasamgo/rescue