Pooling across cells to normalize single-cell RNA sequencing data with many zero counts

Normalization of single-cell RNA sequencing data is necessary to eliminate cell-specific biases prior to downstream analyses. However, this is not straightforward for noisy single-cell data where many counts are zero.

A team led by researchers at the University of Cambridge have developed a novel approach where expression values are summed across pools of cells, and the summed values are used for normalization. Pool-based size factors are then deconvolved to yield cell-based factors. Their deconvolution approach outperforms existing methods for accurate normalization of cell-specific biases in simulated data. Similar behavior is observed in real data, where deconvolution improves the relevance of results of downstream analyses.

Schematic of the deconvolution method

rna-seq

All cells in the data set are averaged to make a reference pseudo-cell. Expression values for cells in pool A are summed together and normalized against the reference to yield a pool-based size factor θ A . This is equal to the sum of the cell-based factors θ j for cells j=1–4 and can be used to formulate a linear equation. (For simplicity, the t j term is assumed to be unity here.) Repeating this for multiple pools (e.g., pool B) leads to the construction of a linear system that can be solved to estimate θ j for each cell j

Availability – The deconvolution approach has been implemented as an R function, with C++ extensions for fast construction of the linear system. It is publicly available as the computeSumFactors function in the scran package on Bioconductor (http://​bioconductor.​org/​packages/​scran) under the GNU General Public Licence v3.

L Lun AT, Bach K, Marioni JC. (2016) Pooling across cells to normalize single-cell RNA sequencing data with many zero counts. Genome Biol 17(1):75. [article]

Leave a Reply

Your email address will not be published. Required fields are marked *

*

Time limit is exhausted. Please reload CAPTCHA.