Single-cell RNA sequencing (scRNAseq) is becoming increasingly popular for unbiased, high-resolution transcriptome analysis of heterogeneous cell populations. Despite its many advantages, scRNAseq, like any other genomic sequencing technique, is susceptible to confounding effects. Controlling for these effects is thus a crucial step for proper data normalization and accurate downstream analysis. Several recent methodological studies have demonstrated the use of control genes for this purpose in scRNAseq studies: the control genes are used to infer the confounding effects, which are then used to normalize the target genes of primary interest. However, these methods can be suboptimal because they ignore the rich information contained in the target genes.
Now, researchers from the University of North Carolina and the University of Michigan have developed an alternative statistical method, called scPLS, for more accurate inference of confounding effects. The method is based on partial least squares and models control and target genes jointly to better infer and control for confounding effects. To accompany the method, they developed a novel expectation maximization algorithm for scalable inference. The algorithm is an order of magnitude faster than standard ones, making scPLS applicable to hundreds of cells and hundreds of thousands of genes.
Illustration of scPLS. We model the expression level of genes in the control set (X) and genes in the target set (Y) jointly. Both control and target genes are affected by the common confounding factors (Z) with effects Ax and Ay in the two gene sets, respectively. The target genes are also influenced by biological factors (U) with effects Au. The biological factors represent intermediate factors that coordinately regulate a set of genes, and are introduced to better capture the complex variance structure in the target genes. Ex and Ey represent residual errors. scPLS aims to remove the confounding effects ZAy in the target genes.
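The structure described in the caption can be written as X = Z Ax + Ex and Y = Z Ay + U Au + Ey. The sketch below simulates data from that structure with NumPy and then removes an estimate of Z Ay from the target genes. It is not the scPLS expectation maximization algorithm and does not use the Citrus package: it estimates the confounding factors from the control genes alone (a truncated SVD followed by least squares), which is the simpler kind of baseline the article contrasts scPLS with. All function names, dimensions, and noise levels are illustrative assumptions.

```python
import numpy as np


def simulate(n_cells=200, n_control=50, n_target=300, k_z=2, k_u=3,
             noise=0.5, seed=0):
    """Draw X (control genes) and Y (target genes) from the caption's model.

    All sizes and the noise level are arbitrary illustrative choices.
    """
    rng = np.random.default_rng(seed)
    Z = rng.standard_normal((n_cells, k_z))    # confounding factors
    U = rng.standard_normal((n_cells, k_u))    # biological factors
    Ax = rng.standard_normal((k_z, n_control))  # confounder effects, control set
    Ay = rng.standard_normal((k_z, n_target))   # confounder effects, target set
    Au = rng.standard_normal((k_u, n_target))   # biological effects, target set
    X = Z @ Ax + noise * rng.standard_normal((n_cells, n_control))
    Y = Z @ Ay + U @ Au + noise * rng.standard_normal((n_cells, n_target))
    return X, Y, Z


def remove_confounders_control_only(X, Y, k_z):
    """Control-gene-only baseline (NOT scPLS): estimate Z from X, regress it out of Y."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    left, _, _ = np.linalg.svd(Xc, full_matrices=False)
    Z_hat = left[:, :k_z]    # orthonormal basis for the estimated confounders
    Ay_hat = Z_hat.T @ Yc    # least-squares fit, valid because Z_hat is orthonormal
    return Yc - Z_hat @ Ay_hat


def confounding_share(Y, Z):
    """Fraction of the centered variance in Y that is linearly explained by Z."""
    Yc = Y - Y.mean(axis=0)
    Zc = Z - Z.mean(axis=0)
    coef, *_ = np.linalg.lstsq(Zc, Yc, rcond=None)
    fitted = Zc @ coef
    return float((fitted ** 2).sum() / (Yc ** 2).sum())


X, Y, Z = simulate()
Y_adj = remove_confounders_control_only(X, Y, k_z=2)
print("share explained by Z before:", confounding_share(Y, Z))
print("share explained by Z after: ", confounding_share(Y_adj, Z))
```

Because the true Z is known in a simulation, the share of variance in Y explained by Z can be compared before and after correction. scPLS differs from this baseline in that it fits X and Y jointly, so the target genes also inform the estimate of Z, and it models the biological factors U explicitly.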
Availability – The method is implemented as a part of the Citrus project and is freely available at: http://chenmengjie.github.io/Citrus/.