Normalization of RNA-sequencing data is essential for accurate downstream inference, but the assumptions upon which most methods are based do not hold in the single-cell setting. Consequently, applying existing normalization methods to single-cell RNA-seq data introduces artifacts that bias downstream analyses. To address this, researchers from the University of Wisconsin–Madison introduce SCnorm for accurate and efficient normalization of scRNA-seq data. SCnorm uses quantile regression to estimate the dependence of read counts on sequencing depth for every gene. Genes with similar dependence are then grouped, and a second quantile regression is used to estimate scale factors within each group. Within-group adjustment for sequencing depth is then performed using the estimated scale factors to provide normalized estimates of expression. Although SCnorm does not require spike-ins, performance may be improved if good spike-ins are available.
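The three steps described above can be illustrated with a small, deliberately simplified sketch. It is not the SCnorm implementation, and the published method is more involved. Everything named here is an assumption made for illustration: the function names, the brute-force median regression, grouping genes by sorting their slopes into equal-sized groups, centering each gene by its median log count before the pooled fit, and using the median log depth as the reference point for the scale factors.

```python
import math
from itertools import combinations
from statistics import median


def median_regression(x, y):
    """Least-absolute-deviations fit y = a + b*x (median quantile regression).

    With one predictor, some optimal line passes through two data points,
    so trying every pair is exact, if slow, for small inputs.
    """
    points = list(zip(x, y))
    best = None
    for (x1, y1), (x2, y2) in combinations(points, 2):
        if x1 == x2:
            continue  # a vertical line is not a valid fit
        b = (y2 - y1) / (x2 - x1)
        a = y1 - b * x1
        loss = sum(abs(yi - (a + b * xi)) for xi, yi in points)
        if best is None or loss < best[0]:
            best = (loss, a, b)
    if best is None:
        raise ValueError("need at least two distinct x values")
    return best[1], best[2]


def nonzero_log_points(row, log_depth):
    """(log depth, log count) pairs for the cells where the gene is detected."""
    return [(ld, math.log(c)) for ld, c in zip(log_depth, row) if c > 0]


def count_depth_slopes(counts, log_depth):
    """Step 1: per-gene slope of log count on log depth."""
    slopes = {}
    for gene, row in counts.items():
        pts = nonzero_log_points(row, log_depth)
        if len({x for x, _ in pts}) < 2:
            continue  # too few informative cells; the gene is skipped
        xs, ys = zip(*pts)
        slopes[gene] = median_regression(xs, ys)[1]
    return slopes


def group_by_slope(slopes, k):
    """Step 2 (assumption): k roughly equal-sized groups of genes sorted by slope."""
    ordered = sorted(slopes, key=slopes.get)
    size = max(1, math.ceil(len(ordered) / k))
    return [ordered[i:i + size] for i in range(0, len(ordered), size)]


def normalize(counts, depths, k=2):
    """Step 3: one pooled median regression and one set of scale factors per group."""
    log_depth = [math.log(d) for d in depths]
    reference = median(log_depth)  # assumption: scale toward the median depth
    slopes = count_depth_slopes(counts, log_depth)
    normalized = {}
    for group in group_by_slope(slopes, k):
        xs, ys = [], []
        for gene in group:
            pts = nonzero_log_points(counts[gene], log_depth)
            centre = median([y for _, y in pts])
            xs += [x for x, _ in pts]
            ys += [y - centre for _, y in pts]  # remove the gene's expression level
        group_slope = median_regression(xs, ys)[1]
        factors = [math.exp(group_slope * (ld - reference)) for ld in log_depth]
        for gene in group:
            normalized[gene] = [c / f for c, f in zip(counts[gene], factors)]
    return normalized


# Noise-free toy values (hypothetical): two genes scale with depth, two scale weakly.
depths = [1e5, 2e5, 4e5, 8e5, 1.6e6]
toy_genes = [("g1", 10, 1.0), ("g2", 40, 1.0), ("g3", 25, 0.5), ("g4", 90, 0.5)]
counts = {g: [base * (d / 1e5) ** s for d in depths] for g, base, s in toy_genes}
normalized = normalize(counts, depths, k=2)
```

On this noise-free toy input the two groups recover slopes of 1 and 0.5, and re-estimating the per-gene slopes on `normalized` gives values that are zero up to rounding. A single global scale factor per cell could not remove both dependencies at once, which is the motivation for estimating scale factors within groups.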
For each gene, median quantile regression was used to estimate the count-depth relationship before normalization and after normalization via median-by-ratio (MR) for the H1 bulk RNA-seq data set (panels (a)-(d)) and the DEC scRNA-seq data set (panels (e)-(h)). Panel (a) shows log-expression vs. log-depth and estimated regression fits for three genes having low, moderate, and high expression, defined as median expression among non-zero un-normalized measurements in the 10th-20th quantile, 40th-50th quantile, and 80th-90th quantile, respectively. Panel (b) shows densities of slopes within each of ten equally sized gene groups, where a gene’s group membership is determined by its median expression among non-zero un-normalized measurements. Panels (c) and (d) show the same data as panels (a) and (b), respectively, but here the data are normalized via MR. Panels (e)-(h) are structurally identical to (a)-(d) for the DEC scRNA-seq data set. Qualitatively similar results are observed if slopes are calculated via generalized linear models.
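The analysis behind these panels can be sketched as follows, assuming that MR denotes median-by-ratio size-factor normalization and that sequencing depth is the total count per cell. Neither assumption is stated in the caption. The function names and the toy counts are hypothetical, and the brute-force median regression stands in for a proper quantile-regression routine.

```python
import math
from itertools import combinations
from statistics import median


def median_regression_slope(x, y):
    """Slope of the least-absolute-deviations line, found by trying every pair of points."""
    points = list(zip(x, y))
    best = None
    for (x1, y1), (x2, y2) in combinations(points, 2):
        if x1 == x2:
            continue
        b = (y2 - y1) / (x2 - x1)
        a = y1 - b * x1
        loss = sum(abs(yi - (a + b * xi)) for xi, yi in points)
        if best is None or loss < best[0]:
            best = (loss, b)
    if best is None:
        raise ValueError("need at least two distinct x values")
    return best[1]


def column_sums(counts):
    """Assumed definition of sequencing depth: total count per cell."""
    return [sum(col) for col in zip(*counts.values())]


def median_by_ratio(counts):
    """Divide each cell by the median ratio of its counts to per-gene geometric means."""
    # Only genes detected in every cell have a usable geometric mean;
    # statistics.median raises StatisticsError if there are none.
    usable = [row for row in counts.values() if all(c > 0 for c in row)]
    geo_means = [math.exp(sum(map(math.log, row)) / len(row)) for row in usable]
    n_cells = len(next(iter(counts.values())))
    factors = [
        median([row[j] / gm for row, gm in zip(usable, geo_means)])
        for j in range(n_cells)
    ]
    return {g: [c / f for c, f in zip(row, factors)] for g, row in counts.items()}


def slopes_by_expression_group(counts, depths, n_groups=10):
    """Count-depth slopes, grouped by each gene's median non-zero expression."""
    log_depth = [math.log(d) for d in depths]
    per_gene = []
    for row in counts.values():
        pts = [(ld, math.log(c)) for ld, c in zip(log_depth, row) if c > 0]
        if len({x for x, _ in pts}) < 2:
            continue  # not enough non-zero measurements to fit a line
        xs, ys = zip(*pts)
        level = median([c for c in row if c > 0])
        per_gene.append((level, median_regression_slope(xs, ys)))
    per_gene.sort()  # lowest expression first
    size = max(1, math.ceil(len(per_gene) / n_groups))
    return [[s for _, s in per_gene[i:i + size]] for i in range(0, len(per_gene), size)]


# Hypothetical toy counts: every gene scales exactly with a per-cell multiplier.
multipliers = [1, 2, 4, 8, 16]
bases = {"g1": 3, "g2": 5, "g3": 20, "g4": 40, "g5": 150, "g6": 300}
counts = {g: [b * m for m in multipliers] for g, b in bases.items()}

raw_depth = column_sums(counts)
slopes_before = slopes_by_expression_group(counts, raw_depth, n_groups=3)
slopes_after = slopes_by_expression_group(median_by_ratio(counts), raw_depth, n_groups=3)
```

In this idealized input every gene has a count-depth slope of 1, so a single scale factor per cell removes the dependence in every expression group. The panels for the scRNA-seq data set concern the case where slopes differ across expression groups, which a global scale factor cannot correct uniformly.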