Single-cell RNA-seq (scRNA-seq) profiles the gene expression of individual cells. Unique molecular identifiers (UMIs) remove the duplicate reads introduced by polymerase chain reaction (PCR) amplification, a major source of noise in read counts. For scRNA-seq data lacking UMIs, Princeton University researchers propose quasi-UMIs: quantile normalization of read counts to a compound Poisson distribution empirically derived from UMI datasets. When applied to ground-truth datasets that have both reads and UMIs, quasi-UMI normalization is more accurate than competing methods. Quasi-UMIs allow methods designed specifically for UMI data to be applied to non-UMI scRNA-seq datasets.
Quasi-UMI counts approximate UMI counts more closely than census counts
Quasi-UMI (QUMI) normalization with a Poisson-lognormal target distribution was applied to read counts from three datasets. qumi_custom: shape parameter (1.9 for Macosko, 2.4 for Tung and Zheng) set by maximum-likelihood fit to matched training data from the same tissue type. qumi_default: shape parameter set to 2.0 for all datasets.
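The idea of quantile-normalizing read counts to a Poisson-lognormal target can be illustrated with a minimal sketch for a single cell. This is not the authors' implementation. The function name `quasi_umi_sketch`, the fixed `log_mean` scale, and the Monte Carlo approximation of the target quantiles are all assumptions made for illustration; how the scale of the target is chosen in the proposed method is not described in this text. Only the shape parameter, with the default of 2.0 used for qumi_default, comes from the caption above.

```python
import numpy as np


def quasi_umi_sketch(reads, shape=2.0, log_mean=0.0, n_draws=200_000, seed=0):
    """Map one cell's nonzero read counts onto quantiles of a
    zero-truncated Poisson-lognormal distribution (illustrative only)."""
    reads = np.asarray(reads)
    rng = np.random.default_rng(seed)

    # Poisson-lognormal: a Poisson count whose rate is lognormally distributed.
    # `shape` is the lognormal sigma; `log_mean` is a hypothetical placeholder.
    rates = rng.lognormal(mean=log_mean, sigma=shape, size=n_draws)
    draws = rng.poisson(rates)
    target = np.sort(draws[draws > 0])  # empirical zero-truncated target

    out = np.zeros(reads.shape, dtype=np.int64)  # zeros stay zero
    nonzero = reads > 0
    n = int(nonzero.sum())
    if n == 0:
        return out

    # Mid-rank probabilities, so tied read counts get the same quantile.
    _, inverse, counts = np.unique(
        reads[nonzero], return_inverse=True, return_counts=True
    )
    mid_rank = np.cumsum(counts) - counts / 2.0
    probs = mid_rank[inverse] / n

    # Inverse empirical CDF of the target at those probabilities.
    idx = np.ceil(probs * target.size).astype(int) - 1
    idx = np.clip(idx, 0, target.size - 1)
    out[nonzero] = target[idx]
    return out


reads = np.array([0, 5, 5, 120, 0, 33, 1, 900])
print(quasi_umi_sketch(reads))             # shape 2.0, as in qumi_default
print(quasi_umi_sketch(reads, shape=2.4))  # a custom shape, as in qumi_custom
```

The sketch preserves the rank order of the read counts and leaves zeros untouched; only the magnitudes of the nonzero counts are replaced by values drawn from the target distribution's quantiles.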