Normalization of single-cell RNA sequencing data is necessary to eliminate cell-specific biases prior to downstream analyses. However, this is not straightforward for noisy single-cell data where many counts are zero.
A team led by researchers at the University of Cambridge have developed a novel approach where expression values are summed across pools of cells, and the summed values are used for normalization. Pool-based size factors are then deconvolved to yield cell-based factors. Their deconvolution approach outperforms existing methods for accurate normalization of cell-specific biases in simulated data. Similar behavior is observed in real data, where deconvolution improves the relevance of results of downstream analyses.
Schematic of the deconvolution method
All cells in the data set are averaged to make a reference pseudo-cell. Expression values for cells in pool A are summed together and normalized against the reference to yield a pool-based size factor θ A . This is equal to the sum of the cell-based factors θ j for cells j=1–4 and can be used to formulate a linear equation. (For simplicity, the t j term is assumed to be unity here.) Repeating this for multiple pools (e.g., pool B) leads to the construction of a linear system that can be solved to estimate θ j for each cell j
Availability – The deconvolution approach has been implemented as an R function, with C++ extensions for fast construction of the linear system. It is publicly available as the computeSumFactors function in the scran package on Bioconductor (http://bioconductor.org/packages/scran) under the GNU General Public Licence v3.