Large collections of annotated single-cell RNA sequencing (scRNA-seq) experiments are being generated across different organs, conditions and organisms on different platforms. The diversity, volume and complexity of this aggregated data require new analysis techniques to extract actionable knowledge. Most analyses depend on two key abilities: identifying similar cells across different experiments, and transferring annotations from an annotated dataset to an unannotated one. Many strategies have been explored to achieve these goals, and they focus primarily on aligning and re-clustering the datasets of interest.
Researchers from the Institute of Molecular and Cell Biology, A*STAR, set out to explore whether deep metric learning methods could serve as a distance function that captures similarity between cells and so facilitates the transfer of cell type annotations between similar cells across different experiments. To this end, the researchers developed MapCell, a few-shot training approach that uses Siamese Neural Networks (SNNs) to learn a generalizable distance metric that can differentiate between cell types. Using only a small training set, they demonstrated that the SNN-derived distance metric can accurately transfer annotations across different scRNA-seq platforms, batches and species, and can also help flag novel cell types.
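The idea of learning a distance metric from a few labelled cells can be sketched as follows. This is a minimal sketch, not MapCell's implementation. It assumes a contrastive loss (one common choice for training SNNs), uses a single shared linear layer in place of the real encoder, and uses synthetic Gaussian "cell types" in place of real expression profiles. The function names and hyperparameters (`margin`, `lr`, `epochs`) are hypothetical.

```python
import numpy as np


def make_pairs(X, labels):
    """Enumerate every unordered pair of training cells with a same-type flag."""
    i, j = np.triu_indices(len(X), k=1)
    same = (labels[i] == labels[j]).astype(float)
    return X[i] - X[j], same


def embed(W, X):
    """Shared encoder applied to each branch of the Siamese pair (hypothetical linear layer)."""
    return X @ W.T


def contrastive_loss_and_grad(W, diffs, same, margin):
    """Contrastive loss and its gradient with respect to the shared weights W.

    Because both branches share W, ||W a - W b|| = ||W (a - b)||,
    so only the pairwise expression differences are needed.
    """
    Z = embed(W, diffs)                          # (n_pairs, embed_dim)
    d = np.linalg.norm(Z, axis=1) + 1e-12        # embedding distance per pair
    hinge = np.maximum(0.0, margin - d)          # how far dissimilar pairs sit inside the margin
    loss = np.mean(same * d**2 + (1 - same) * hinge**2)
    # dLoss/dZ per pair: 2Z pulls similar pairs together, -2*hinge/d*Z pushes dissimilar ones apart
    coef = 2.0 * same - 2.0 * (1 - same) * hinge / d
    grad = (coef[:, None] * Z).T @ diffs / len(diffs)
    return loss, grad


def train_siamese(X, labels, embed_dim=2, margin=2.0, lr=0.005, epochs=300, seed=0):
    """Full-batch gradient descent over all pairs drawn from a small labelled set."""
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=0.1, size=(embed_dim, X.shape[1]))
    diffs, same = make_pairs(X, labels)
    history = []
    for _ in range(epochs):
        loss, grad = contrastive_loss_and_grad(W, diffs, same, margin)
        history.append(loss)
        W -= lr * grad
    return W, history


# Synthetic stand-in for normalized expression: 3 "cell types", 5 cells each (few-shot)
rng = np.random.default_rng(42)
n_genes, n_types, shots = 20, 3, 5
centers = rng.normal(size=(n_types, n_genes))
labels = np.repeat(np.arange(n_types), shots)
X = centers[labels] + rng.normal(scale=0.3, size=(len(labels), n_genes))
W, history = train_siamese(X, labels)

# Held-out cells drawn from the same types, embedded with the learned metric
X_test = centers[labels] + rng.normal(scale=0.3, size=X.shape)
Z_test = embed(W, X_test)
```

Sharing one weight matrix between both branches is what makes the network Siamese: the learned distance depends only on the pair of inputs, is symmetric, and can be applied to cell pairs never seen during training.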
Architecture of MapCell Siamese Neural Network (SNN)
(A) (Top) SNN architecture. (Bottom) Low-dimensional representation of the embedding space. (B) SNN inference: each cell in the sample set is compared, using the SNN metric, to a set of reference cells used in the learning stage, and is assigned to the closest reference type. Cells that do not meet the threshold are flagged as novel cell types. These novel types can be reincorporated into the training set to generate a new SNN, or added to the reference set without retraining.
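The inference step in panel (B) can be sketched as nearest-type assignment with a rejection threshold. This is a minimal sketch, assuming that embeddings have already been produced by a trained SNN encoder and that distance is Euclidean in the embedding space. It also assumes that a type's distance to a query cell is the mean distance to that type's reference cells; using the single nearest reference cell would be an equally reasonable reading. `assign_cell_types`, the `threshold` value and the example labels are hypothetical.

```python
import numpy as np


def assign_cell_types(query_emb, ref_emb, ref_labels, threshold, novel_label="novel"):
    """Label each query cell with its closest reference type, or flag it as novel.

    query_emb:  (n_query, k) embeddings of sample cells
    ref_emb:    (n_ref, k) embeddings of the reference cells
    ref_labels: (n_ref,) cell-type label of each reference cell
    threshold:  hypothetical maximum distance for accepting an assignment
    """
    ref_labels = np.asarray(ref_labels)
    types = np.unique(ref_labels)
    # Euclidean distance from every query cell to every reference cell: (n_query, n_ref)
    dists = np.linalg.norm(query_emb[:, None, :] - ref_emb[None, :, :], axis=2)
    # Distance to each type = mean distance to that type's reference cells: (n_query, n_types)
    type_dists = np.stack([dists[:, ref_labels == t].mean(axis=1) for t in types], axis=1)
    best = type_dists.argmin(axis=1)
    best_dist = type_dists[np.arange(len(query_emb)), best]
    # Accept the closest type only if it lies within the threshold; otherwise flag as novel
    return [novel_label if dist > threshold else str(types[b]) for b, dist in zip(best, best_dist)]


ref_emb = np.array([[0.0, 0.0], [0.2, 0.0], [5.0, 5.0], [5.2, 5.0]])
ref_labels = ["B cell", "B cell", "T cell", "T cell"]
query_emb = np.array([[0.1, 0.1], [5.1, 4.9], [10.0, -10.0]])
print(assign_cell_types(query_emb, ref_emb, ref_labels, threshold=1.0))
# → ['B cell', 'T cell', 'novel']
```

Flagged cells can then be given a new label and appended to `ref_emb` and `ref_labels`. This mirrors the caption's option of extending the reference set without retraining the network.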