Single-cell RNA sequencing has spurred the development of computational methods that enable researchers to classify cell types, delineate developmental trajectories, and measure molecular responses to external perturbations. Many of these methods depend on detecting genes whose cell-to-cell variation arises from the biological processes of interest rather than from transcriptional or technical noise. However, for datasets in which the biologically relevant differences between cells are subtle, identifying these genes is challenging.
Stanford University researchers present the self-assembling manifold (SAM) algorithm, an iterative soft feature selection strategy that quantifies gene relevance and improves dimensionality reduction. They demonstrate its advantages over other state-of-the-art methods by identifying, and experimentally validating, novel stem cell populations of Schistosoma mansoni, a prevalent parasite that infects hundreds of millions of people. Extending their analysis to a total of 56 datasets, the researchers show that SAM generalizes well and consistently outperforms other methods on a variety of biological and quantitative benchmarks.
The SAM algorithm
(a) SAM starts with a randomly initialized kNN matrix and iterates, refining the kNN matrix and the gene weight vector until convergence. (b) Normalized root mean square error (RMSE) between adjacent iterations within a single run (top) and between multiple runs at the same iteration (bottom), showing that SAM converges to a universal, stable solution regardless of initial conditions. (c) Graph structures and weights converging to the final output over the course of 15 iterations (i denotes the iteration number). Top: nodes are cells and edges connect neighbors; nodes are color-coded according to the final clusters. Bottom: weights are sorted according to the final gene rankings.
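The loop in panel (a) can be sketched in a few lines of NumPy. This is a simplified illustration built on our own assumptions, not the implementation in the SAM repository: gene weights are taken as the dispersion (variance over mean) of each gene's expression after averaging over each cell's k nearest neighbors, rescaled so the top gene has weight 1; the z-scored expression matrix is multiplied by these weights and projected onto its top principal components; and a new kNN graph is built in that space using Euclidean distance (chosen here for simplicity). The function names (`sam_like`, `_gene_weights`, `_knn_from_coords`) and defaults (k=10, 5 PCs, at most 15 iterations, tolerance 1e-3) are hypothetical.

```python
import numpy as np


def _knn_from_coords(Y, k):
    """Binary kNN adjacency (cells x cells) from Euclidean distance between rows of Y."""
    D = ((Y[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)  # squared distances
    np.fill_diagonal(D, np.inf)               # a cell is not its own neighbor
    nbrs = np.argsort(D, axis=1)[:, :k]       # indices of the k nearest cells
    A = np.zeros(D.shape)
    A[np.arange(Y.shape[0])[:, None], nbrs] = 1.0
    return A


def _gene_weights(E, A, eps=1e-12):
    """Assumed weighting: dispersion (variance / mean) of kNN-averaged expression."""
    k = A.sum(axis=1, keepdims=True)
    E_avg = (A @ E) / k                       # each cell's expression averaged over its neighbors
    disp = E_avg.var(axis=0) / (E_avg.mean(axis=0) + eps)
    return disp / (disp.max() + eps)          # rescale so the top gene has weight ~1


def sam_like(E, k=10, n_pcs=5, max_iter=15, tol=1e-3, seed=0):
    """Simplified SAM-style loop; E is a cells x genes expression matrix."""
    rng = np.random.default_rng(seed)
    n_cells, n_genes = E.shape
    # Step 0: random kNN graph (each cell gets k random neighbors other than itself)
    A = np.zeros((n_cells, n_cells))
    for i in range(n_cells):
        others = np.delete(np.arange(n_cells), i)
        A[i, rng.choice(others, size=k, replace=False)] = 1.0
    # z-score each gene once; the weights rescale these columns on every iteration
    Z = (E - E.mean(axis=0)) / (E.std(axis=0) + 1e-12)
    w = np.zeros(n_genes)
    history = []                              # RMSE between weights of adjacent iterations
    for _ in range(max_iter):
        w_new = _gene_weights(E, A)
        rmse = np.sqrt(np.mean((w_new - w) ** 2))
        history.append(rmse)
        w = w_new
        # PCA of the gene-weighted matrix via SVD (Z is already centered per gene)
        U, S, _ = np.linalg.svd(Z * w, full_matrices=False)
        pcs = U[:, :n_pcs] * S[:n_pcs]
        A = _knn_from_coords(pcs, k)          # rebuild the kNN graph in weighted PC space
        if rmse < tol:
            break
    return w, A, history
```

On a small synthetic matrix in which only the first five genes differ between two groups of cells, this sketch should up-weight those genes and produce a kNN graph whose edges stay within each group. Calling `sam_like` with different `seed` values and comparing the returned weights is a rough analogue of the run-to-run comparison in panel (b, bottom).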
Availability – https://github.com/atarashansky/self-assembling-manifold