Single-cell RNA sequencing (scRNA-seq) enables the quantification of each gene’s expression distribution across cells, allowing assessment of the dispersion, nonzero fraction, and other aspects of the distribution beyond the mean. These statistical characterizations of the gene expression distribution are critical for understanding expression variation and for selecting marker genes for population heterogeneity. However, scRNA-seq data are noisy, and each cell is typically sequenced at low coverage, which makes it difficult to infer properties of the gene expression distribution from raw counts. Based on a reexamination of nine public datasets, the researchers propose a simple technical noise model for scRNA-seq data with unique molecular identifiers (UMIs). They develop deconvolution of single-cell expression distribution (DESCEND), a method that deconvolves the true cross-cell gene expression distribution from observed scRNA-seq counts, leading to improved estimates of properties of the distribution such as dispersion and nonzero fraction. DESCEND can adjust for cell-level covariates such as cell size, cell cycle, and batch effects. DESCEND’s noise model and estimation accuracy are further evaluated through comparisons to RNA FISH data, through data splitting and simulations, and through its effectiveness in removing known batch effects. The researchers demonstrate how DESCEND can clarify and improve downstream analyses such as finding differentially expressed genes, identifying cell types, and selecting differentiation markers.
Illustration of the framework
(A and B) The cross-cell distribution of observed counts Ycg (B) is assumed to be a convolution of the distribution of true gene expression (A) and technical noise. (C) For each gene, the output of DESCEND includes the distribution of absolute expression levels when spike-ins are available, the distribution of relative expression with library size normalization, the distribution of covariate-adjusted expression levels if covariates are provided, estimates of the bursting and dispersion parameters, differential testing results comparing two cell populations, and the effects of observed covariates on gene expression.
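The convolution assumption in panels A and B can be made concrete with a small simulation. The sketch below is not DESCEND's implementation: it assumes, purely for illustration, that technical noise is Poisson with a fixed capture efficiency, and that true expression is zero in some cells and exponentially distributed in the rest. The function names and parameter values are hypothetical. It shows why raw counts understate the nonzero fraction, which is the kind of bias deconvolution is meant to correct.

```python
import math
import random


def sample_poisson(rate, rng):
    """Draw one Poisson variate with Knuth's method (adequate for small rates)."""
    if rate <= 0:
        return 0
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate_gene(n_cells, nonzero_fraction, mean_when_on, efficiency, seed=0):
    """Simulate true expression and observed counts for one gene.

    Assumptions (illustrative only): a cell expresses the gene with probability
    `nonzero_fraction`; when expressed, the true level is exponential with mean
    `mean_when_on`; the observed count is Poisson with rate efficiency * true level.
    """
    rng = random.Random(seed)
    true_expr, observed = [], []
    for _ in range(n_cells):
        if rng.random() < nonzero_fraction:
            level = rng.expovariate(1.0 / mean_when_on)
        else:
            level = 0.0
        true_expr.append(level)
        observed.append(sample_poisson(efficiency * level, rng))
    return true_expr, observed


def fraction_nonzero(values):
    """Share of cells with a strictly positive value."""
    return sum(1 for v in values if v > 0) / len(values)


true_expr, observed = simulate_gene(
    n_cells=5000, nonzero_fraction=0.6, mean_when_on=20.0, efficiency=0.1
)
print("true nonzero fraction:    ", round(fraction_nonzero(true_expr), 3))
print("observed nonzero fraction:", round(fraction_nonzero(observed), 3))
```

Under these assumed settings, many cells that truly express the gene yield a count of zero after low-efficiency sampling, so the observed nonzero fraction falls noticeably below the true one. Estimating the true fraction therefore requires modeling the noise rather than reading it off the raw counts.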
Availability – The R package for DESCEND is at: https://github.com/jingshuw/descend