Single-cell RNA sequencing (scRNA-seq) is a powerful tool for detailing the cellular landscape within complex tissues. Large-scale single-cell transcriptomics provides both opportunities and challenges for identifying rare cells that play crucial roles in development and disease. Researchers from Shanghai Jiao Tong University have developed GapClust, a lightweight algorithm that detects rare cell types in ultra-large scRNA-seq datasets with state-of-the-art speed and memory efficiency. Benchmarking on diverse experimental datasets shows that GapClust outperforms other recently proposed methods. Applied to an intestine dataset and a 68k PBMC dataset, GapClust identifies tuft cells and a previously unrecognised subtype of monocyte, respectively.
Overview of GapClust
The first step is to obtain the K nearest neighbours of every cell. For each cell m, ∆D<m, k> and ∆∆D<m, k> are then obtained according to the formula. Next, for each k, the skewness of the adjusted ∆∆D<m, k> values is calculated, and k is taken as a candidate if the skewness exceeds 2. In the last step, for each candidate k, the cell with the largest ∆∆D<m, k> value among the N cells is identified together with its k-1 nearest neighbours, and this group is subjected to filtering steps to determine the final rare cell types.
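
The steps in this overview can be illustrated with a small Python sketch. It is not the GapClust implementation, and several details are assumptions, because the formula itself is not reproduced in this text: D<m, k> is taken to be the Euclidean distance from cell m to its k-th nearest neighbour (with D<m, 0> = 0), ∆D<m, k> = D<m, k> - D<m, k-1>, and ∆∆D<m, k> = ∆D<m, k> - ∆D<m, k-1>. The unspecified "adjustment" of ∆∆D and the final filtering steps are left out, and the function names (knn, skewness, rare_cell_candidates) are hypothetical. Only the skewness threshold of 2 and the "seed cell plus its k-1 nearest neighbours" rule come from the description above.

```python
import math

def knn(points, K):
    """Brute-force K nearest neighbours: sorted distances and indices per cell."""
    dists, idxs = [], []
    for i, p in enumerate(points):
        row = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)[:K]
        dists.append([d for d, _ in row])
        idxs.append([j for _, j in row])
    return dists, idxs

def skewness(values):
    """Population skewness: third central moment over variance**1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return 0.0 if m2 == 0 else m3 / m2 ** 1.5

def rare_cell_candidates(points, K, skew_threshold=2.0):
    """Return {k: set of cell indices} for every k whose ddD values are skewed.

    Assumed definitions (not taken from the source text):
      dD<m,k>  = D<m,k>  - D<m,k-1>
      ddD<m,k> = dD<m,k> - dD<m,k-1>
    """
    n = len(points)
    dists, idxs = knn(points, K)
    candidates = {}
    for k in range(2, K + 1):
        ddD = []
        for m in range(n):
            D = [0.0] + dists[m]  # D[k] = distance to k-th neighbour, D[0] = 0
            ddD.append((D[k] - D[k - 1]) - (D[k - 1] - D[k - 2]))
        if skewness(ddD) > skew_threshold:
            # Seed = cell with the largest ddD; group = seed + its k-1 neighbours.
            seed = max(range(n), key=ddD.__getitem__)
            candidates[k] = {seed, *idxs[seed][: k - 1]}
    return candidates

if __name__ == "__main__":
    # Toy data: a 6x6 grid of "abundant" cells plus three tightly packed outliers.
    cells = [(float(x), float(y)) for x in range(6) for y in range(6)]
    cells += [(50.0, 50.0), (50.1, 50.0), (50.0, 50.1)]
    for k, group in rare_cell_candidates(cells, K=5).items():
        print(k, sorted(group))
```

In the toy data, the three outlying points sit far from the grid, so the distance from each of them to its third neighbour jumps sharply. Under the assumed definitions this yields a few very large ∆∆D values at k = 3 and therefore a strongly right-skewed distribution. A real dataset would need an approximate nearest-neighbour search rather than the quadratic brute-force loop used here.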