Integrative analysis of large-scale single-cell RNA sequencing (scRNA-seq) datasets can aggregate complementary biological information from different datasets. However, most existing methods fail to efficiently integrate multiple large-scale scRNA-seq datasets. Researchers at the University Health Network, Toronto, have developed OCAT (One Cell At a Time), a machine learning method that sparsely encodes single-cell gene expression to integrate data from multiple sources without highly variable gene selection or explicit batch effect correction. The researchers demonstrate that OCAT efficiently integrates multiple scRNA-seq datasets and achieves state-of-the-art performance in cell type clustering, especially in challenging scenarios where the datasets contain non-overlapping cell types. In addition, OCAT facilitates a variety of downstream analyses.
Schematic workflow of OCAT
When integrating multiple scRNA-seq datasets, OCAT first identifies “ghost” cells, the centers of small cell neighborhoods, in each dataset. It then constructs a bipartite graph connecting each cell to its most similar “ghost” cells. The weights of the edges connecting a cell to its closest “ghost” cells are treated as that cell’s OCAT sparse encoding. The OCAT sparse encoding can effectively correct the batch effect and facilitate various downstream analysis tasks, such as cell clustering, differential gene expression analysis, trajectory inference, and cell type inference.
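
The workflow above can be illustrated with a small NumPy sketch. This is not the OCAT package or its API; it is a simplified illustration under stated assumptions. The function names `find_ghost_cells` and `sparse_encode` are hypothetical, "ghost" cells are approximated here by k-means centroids, edge weights are computed with a Gaussian kernel, and the input is random toy data rather than real expression profiles.

```python
import numpy as np

def find_ghost_cells(X, n_ghost, n_iter=10, seed=0):
    """Approximate 'ghost' cells as k-means centroids (an assumption for illustration)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=n_ghost, replace=False)].astype(float)
    for _ in range(n_iter):
        # Squared Euclidean distance from every cell to every current center
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(n_ghost):
            members = X[labels == j]
            if len(members) > 0:  # keep the old center if a cluster is empty
                centers[j] = members.mean(axis=0)
    return centers

def sparse_encode(X, ghosts, n_closest=3):
    """Encode each cell by normalized edge weights to its closest ghost cells."""
    d2 = ((X[:, None, :] - ghosts[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(d2, axis=1)[:, :n_closest]
    rows = np.arange(len(X))[:, None]
    near_d2 = d2[rows, nearest]
    # Gaussian kernel weights (an assumed choice), with a per-cell bandwidth
    bandwidth = near_d2.mean(axis=1, keepdims=True) + 1e-12
    w = np.exp(-(near_d2 - near_d2.min(axis=1, keepdims=True)) / bandwidth)
    w /= w.sum(axis=1, keepdims=True)
    # Rows are cells, columns are ghost cells; only n_closest entries per row are nonzero
    Z = np.zeros((len(X), len(ghosts)))
    Z[rows, nearest] = w
    return Z

# Toy data standing in for two scRNA-seq datasets (cells x features)
rng = np.random.default_rng(42)
X1 = rng.normal(size=(60, 5))
X2 = rng.normal(loc=1.0, size=(40, 5))

# Ghost cells are identified separately in each dataset, then pooled
ghosts = np.vstack([find_ghost_cells(X1, 4), find_ghost_cells(X2, 4)])

# Every cell from both datasets is encoded against the pooled ghost cells
Z = sparse_encode(np.vstack([X1, X2]), ghosts, n_closest=3)
print(Z.shape)
```

In this sketch, each row of `Z` plays the role of one cell's sparse encoding, and the rows could then be passed to a clustering or trajectory method of choice. How OCAT itself selects ghost cells and weights the edges may differ from the choices assumed here.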