Elucidation of cell subpopulations at high resolution is a key and challenging goal of single-cell ribonucleic acid (RNA) sequencing (scRNA-seq) data analysis. Although unsupervised clustering methods have been proposed for de novo identification of cell populations, their performance and robustness suffer from the high variability, low capture efficiency and high dropout rates that are characteristic of scRNA-seq experiments.
Researchers from Rowan University have developed a novel unsupervised method, Single-cell Clustering by Enhancing Network Affinity (SCENA), which relies mainly on three strategies: selecting multiple gene sets, enhancing local affinity among cells, and clustering consensus matrices. Large-scale validation on 13 real scRNA-seq datasets shows that SCENA detects cell populations with high accuracy and is robust against dropout noise. When the researchers applied SCENA to large-scale scRNA-seq data from mouse brain cells, it detected known cell types and identified novel interneuron types characterized by differential expression of gamma-aminobutyric acid receptor subunits and transporters. SCENA uses heterogeneous parallel computing on central processing units (CPUs) and graphics processing units (GPUs) to achieve high running speed. Together, its clustering accuracy and speed make SCENA an efficient platform for biological discovery through clustering analysis of large and diverse scRNA-seq datasets.
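SCENA's final strategy merges the clusterings obtained from different feature gene sets into a consensus matrix. The source does not give the exact formula; one common construction, assumed here, is the co-association matrix, whose entry (i, j) is the fraction of clustering runs that place cells i and j in the same cluster. The Python/NumPy sketch below builds that matrix. The function names are hypothetical, and the last step (connected components of the graph linking cells that co-cluster in more than half of the runs) is a simple stand-in for SCENA's own clustering of the consensus matrix, not a description of it.

```python
import numpy as np


def co_association(labelings):
    """Assumed consensus construction: fraction of clustering runs in which
    each pair of cells shares a cluster.

    labelings: list of 1-D integer arrays, one label per cell, one array per
    clustering run (e.g. one run per feature gene set).
    """
    labelings = [np.asarray(labels) for labels in labelings]
    n = labelings[0].size
    consensus = np.zeros((n, n))
    for labels in labelings:
        # adds 1.0 where cells i and j received the same label in this run
        consensus += labels[:, None] == labels[None, :]
    return consensus / len(labelings)


def consensus_clusters(consensus, threshold=0.5):
    """Stand-in final step (not SCENA's method): connected components of the
    graph that links cells co-clustered in more than `threshold` of the runs."""
    n = consensus.shape[0]
    adjacency = consensus > threshold
    labels = -np.ones(n, dtype=int)  # -1 marks cells not yet visited
    current = 0
    for start in range(n):
        if labels[start] >= 0:
            continue
        # depth-first search over the thresholded consensus graph
        stack = [start]
        labels[start] = current
        while stack:
            i = stack.pop()
            for j in np.flatnonzero(adjacency[i]):
                if labels[j] < 0:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels
```

For example, three runs that label six cells as [0, 0, 1, 1, 2, 2], [1, 1, 0, 0, 2, 2] and [0, 0, 0, 1, 2, 2] agree on cells 2 and 3 in two of the three runs, so the consensus groups the cells as {0, 1}, {2, 3} and {4, 5}. Working from pairwise co-occurrence rather than raw labels sidesteps the fact that cluster IDs are arbitrary from one run to the next.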
Illustrative example of SCENA steps
(A) Gene expression levels are normalized, and genes are sorted by expression variance in descending order. (B) Multiple feature gene sets are selected from the top-ranked genes. (C) For each feature gene set, a cell–cell similarity matrix is constructed. (D) The local affinity of each cell–cell similarity matrix is enhanced. (E) For each enhanced similarity matrix, the number of clusters is estimated and clusters are detected by spectral clustering. (F) A consensus clustering matrix is calculated by merging the clustering results from the multiple feature gene sets. (G) The resulting clusters are annotated as cell populations.
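The per-gene-set part of this workflow, panels (A) to (E), can be sketched in Python with NumPy. The source does not give SCENA's formulas, so several choices here are assumptions made for illustration: log-normalization to 10,000 counts per cell, Pearson correlation as the cell–cell similarity, k-nearest-neighbor sparsification followed by one diffusion step as the "local affinity enhancement", and the eigengap heuristic for estimating the number of clusters. All function names are hypothetical. This is not SCENA's implementation, and it omits the consensus (F) and annotation (G) steps.

```python
import numpy as np


def top_variable_genes(counts, n_genes):
    """Steps A-B (assumed normalization): scale each cell to 1e4 counts,
    log-transform, sort genes by variance (descending), keep the top n_genes."""
    norm = np.log1p(counts / counts.sum(axis=1, keepdims=True) * 1e4)
    order = np.argsort(norm.var(axis=0))[::-1]
    return norm[:, order[:n_genes]]


def similarity(x):
    """Step C (assumed): Pearson correlation between cells, rescaled to [0, 1]."""
    return (np.corrcoef(x) + 1.0) / 2.0


def enhance_local_affinity(s, k):
    """Step D (assumed form): keep each cell's k most similar neighbors,
    symmetrize, then take one diffusion step (W @ W) so that cells sharing
    many neighbors reinforce each other's affinity."""
    w = np.zeros_like(s)
    for i in range(s.shape[0]):
        nbrs = np.argsort(s[i])[::-1][:k + 1]  # k neighbors plus the cell itself
        w[i, nbrs] = s[i, nbrs]
    w = (w + w.T) / 2.0
    return w @ w


def kmeans(x, k, n_iter=100):
    """Plain Lloyd's k-means with deterministic farthest-point initialization."""
    centers = [x[0]]
    for _ in range(1, k):
        dist = np.min([((x - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(x[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dists = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = np.argmin(dists, axis=1)
        # recompute centers; keep the old center if a cluster empties out
        new = np.array([x[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels


def spectral_clusters(w, max_k=10):
    """Step E: estimate k from the largest eigengap of the symmetric
    normalized Laplacian, then run k-means on the row-normalized
    leading eigenvectors."""
    d_inv_sqrt = 1.0 / np.sqrt(w.sum(axis=1))
    lap = np.eye(len(w)) - d_inv_sqrt[:, None] * w * d_inv_sqrt[None, :]
    vals, vecs = np.linalg.eigh(lap)  # eigenvalues in ascending order
    k = int(np.argmax(np.diff(vals[:max_k + 1]))) + 1
    emb = vecs[:, :k]
    emb = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    return kmeans(emb, k)
```

For a cells × genes count matrix `counts`, one feature gene set would be clustered with, for instance, `spectral_clusters(enhance_local_affinity(similarity(top_variable_genes(counts, 500)), k=10))`, where 500 genes and 10 neighbors are illustrative values only. Repeating this for several gene-set sizes yields the multiple clusterings that are merged in step (F). The diffusion step is one simple way to strengthen affinity between cells that share neighbors while suppressing isolated, dropout-driven similarities.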