Single-cell RNA-seq is a promising technology with broad applications, and distinguishing biological noise from technical noise is critical for correctly interpreting the data (Jaitin, et al., 2014). Recently, statistical methods have been developed to model technical noise from spike-in ERCC molecules, whose concentrations are presumably the same across samples, and then to identify differentially expressed genes, whose variation across samples is significantly larger than the technical noise (Brennecke, et al., 2013). A limitation of such an approach is that the true gene expression level is not explicitly calculated, which is needed for many analyses based on the quantification of transcription.
Here, researchers from UCSD propose a novel strategy to normalize and de-noise single-cell RNA-seq data. This method calculates RNA concentrations from the sequencing reads, the reverse of other published methods, which model sequencing reads from RNA concentrations; it is much simpler than the existing methods and, importantly, it allows technical noise to be removed and gene expression to be computed explicitly. Specifically, they fit a gamma regression model (GRM) between the sequencing reads (RPKM, FPKM or TPM) and the concentrations of the spike-in ERCC molecules. The trained model is then used to estimate the de-noised molecular concentrations of the genes from the reads. GRM shows great power in reducing technical noise and superior performance compared with several popular normalization methods, such as FPKM (Tu, et al., 2012), TMM (Robinson and Oshlack, 2010) and FQ (Bullard, et al., 2010), in analyzing single-cell RNA-seq data.
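To make the idea concrete, the workflow described above can be sketched as a gamma regression with a log link: the known ERCC spike-in concentrations are regressed on the log of their observed read measures, and the fitted model is then applied to gene-level reads to obtain de-noised concentration estimates. The sketch below is a minimal illustration in Python/numpy, not the authors' R implementation; the simulated spike-in data, the IRLS fitting routine and the `denoise` helper are all hypothetical assumptions for demonstration.

```python
import numpy as np

def fit_gamma_glm(x, y, n_iter=50, tol=1e-8):
    """Fit a Gamma GLM with log link, E[y] = exp(b0 + b1*x), via IRLS.
    For the Gamma family with a log link the IRLS weights are constant,
    so each iteration reduces to an ordinary least-squares solve on the
    working response."""
    X = np.column_stack([np.ones_like(x), x])
    # Initialise from OLS on log(y)
    beta, *_ = np.linalg.lstsq(X, np.log(y), rcond=None)
    for _ in range(n_iter):
        eta = X @ beta
        mu = np.exp(eta)
        z = eta + (y - mu) / mu  # working response for the log link
        beta_new, *_ = np.linalg.lstsq(X, z, rcond=None)
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta

def denoise(gene_fpkm, beta):
    """Map an observed gene read measure (e.g. FPKM) to an estimated
    molecular concentration using the trained model."""
    return np.exp(beta[0] + beta[1] * np.log(gene_fpkm))

# Hypothetical training data: known ERCC spike-in concentrations and
# their observed read measures with multiplicative gamma noise.
rng = np.random.default_rng(0)
true_conc = np.geomspace(0.1, 1000.0, 40)
fpkm = true_conc * rng.gamma(shape=20.0, scale=1 / 20.0, size=40)

beta = fit_gamma_glm(np.log(fpkm), true_conc)
```

In this simulation the reads are proportional to concentration, so the fitted slope on the log scale comes out close to 1; with real data the curve captures the systematic (and typically nonlinear) relationship between spike-in input and measured reads, which is what lets the model return a de-noised concentration for each gene.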
Availability – The software is implemented in R and is available for download at http://wanglab.ucsd.edu/star/GRM