Single-cell RNA sequencing (scRNA-seq) has become a major direction in single-cell research because of its high accuracy and specificity: it enables unbiased, high-throughput studies with minimal sample sizes. Continuing improvements in scRNA-seq technology have also driven parallel research on single-cell multi-omics. Compared with sequencing bulk cells, analyzing individual cells offers greater power to discover novel genes without prior knowledge of their sequences, and greater sensitivity when quantifying rare variants and transcripts. However, scRNA-seq data are currently analyzed mostly with unsupervised methods, which cannot take advantage of the prior distribution and structural features of the data.
To address this problem, researchers at the Wuhan Institute of Technology proposed SCAFG (Classifying Single Cell Types Based on an Adaptive Threshold Fusion Graph Convolution Network), a semi-supervised single-cell classification model that adaptively fuses cell-to-cell correlation matrices computed under various thresholds according to the distribution of cells. The researchers evaluated how well SCAFG identifies cell types on diverse real scRNA-seq datasets and compared it with other commonly used semi-supervised algorithms; the results showed that SCAFG classifies single-cell data with higher accuracy.
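The idea of building cell-to-cell correlation matrices under several thresholds and then fusing them can be illustrated with a short NumPy sketch. It is not SCAFG's published code. The library-size normalization with log1p, the Pearson correlation between cells, the thresholds placed at evenly spaced quantiles of the similarity distribution, the Jaccard overlap used to compare incidence matrices, and the averaging fusion rule are all assumptions chosen for illustration; `incidence_matrices` and `fuse_most_similar` are hypothetical names.

```python
import numpy as np


def incidence_matrices(expr, n_thresholds=9):
    """Binary cell-to-cell incidence matrices at several thresholds.

    expr is a cells x genes count matrix. Hypothetical helper; the
    normalization, similarity and threshold choices are assumptions.
    """
    # Assumption: library-size normalization to 1e4 counts, then log1p
    lib = expr.sum(axis=1, keepdims=True)
    norm = np.log1p(expr / np.maximum(lib, 1.0) * 1e4)
    # Assumption: Pearson correlation between cells as the similarity
    sim = np.corrcoef(norm)
    sim = (sim + sim.T) / 2  # remove floating-point asymmetry
    n = sim.shape[0]
    off_diag = sim[~np.eye(n, dtype=bool)]
    # Assumption: thresholds at evenly spaced quantiles, so they follow
    # the observed distribution of cell-to-cell similarities
    thresholds = np.quantile(off_diag, np.linspace(0.1, 0.9, n_thresholds))
    mats = []
    for t in thresholds:
        a = (sim >= t).astype(float)
        np.fill_diagonal(a, 0.0)  # no self-edges
        mats.append(a)
    return mats


def fuse_most_similar(mats):
    """Fuse the pair of incidence matrices whose edge sets overlap most.

    Hypothetical rule: Jaccard overlap between edge sets, then an
    element-wise average of the chosen pair as a weighted graph.
    """
    k = len(mats)
    overlap = np.zeros((k, k))  # diagonal left at 0 so argmax picks a pair
    for i in range(k):
        for j in range(i + 1, k):
            inter = np.logical_and(mats[i], mats[j]).sum()
            union = np.logical_or(mats[i], mats[j]).sum()
            overlap[i, j] = overlap[j, i] = inter / union if union else 0.0
    # Row and column index of the largest element in the overlap matrix
    i, j = np.unravel_index(np.argmax(overlap), overlap.shape)
    fused = (mats[i] + mats[j]) / 2.0
    return fused, (int(i), int(j))
```

For example, `fused, pair = fuse_most_similar(incidence_matrices(counts))`, with `counts` a hypothetical cells x genes array, returns a weighted cell graph and the indices of the two thresholds that were merged. Because the thresholds come from quantiles, the graphs adapt to each dataset's similarity distribution instead of relying on one fixed cutoff.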
Overview of the SCAFG
(a) Data preprocessing, including normalization and similarity transformation of the gene expression matrix. (b) Dividing the similarity matrix into nine incidence matrices by threshold segmentation, then converting the similarities between the incidence matrices into a similarity distance matrix. (c) Finding the row and column indices of the largest element in the similarity distance matrix. (d) Fusing the incidence matrices and saving the consensus matrix as a graph. (e) Feeding the graph into the GCN, which outputs a probability matrix; each cell is assigned to the category whose column index holds the maximum probability in that cell's row.
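The classification step (e) can be sketched as a graph convolution over the cell graph followed by a row-wise argmax of the probability matrix. This is a simplified stand-in rather than SCAFG's architecture: it assumes the widely used symmetric GCN normalization D^(-1/2)(A + I)D^(-1/2), a single propagation layer feeding a linear softmax classifier, and full-batch gradient descent on the cross-entropy of the labeled cells only. `train_gcn` and its parameters (`lr`, `epochs`, `seed`) are hypothetical.

```python
import numpy as np


def normalize_adjacency(adj):
    """Symmetric GCN normalization D^(-1/2) (A + I) D^(-1/2)."""
    a_hat = adj + np.eye(adj.shape[0])  # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def train_gcn(adj, features, labels, labeled_mask, n_classes,
              lr=0.5, epochs=200, seed=0):
    """Semi-supervised single-layer GCN (hypothetical simplification).

    Returns a cells x classes probability matrix. Labels of cells where
    labeled_mask is False are ignored (they may be -1).
    """
    rng = np.random.default_rng(seed)
    h = normalize_adjacency(adj) @ features  # one propagation step
    w = rng.normal(scale=0.01, size=(features.shape[1], n_classes))
    # One-hot targets for labeled cells only
    y = np.zeros((adj.shape[0], n_classes))
    idx = np.flatnonzero(labeled_mask)
    y[idx, labels[idx]] = 1.0
    mask = labeled_mask.astype(float)[:, None]
    for _ in range(epochs):
        p = softmax(h @ w)
        # Cross-entropy gradient restricted to labeled cells
        grad = h.T @ ((p - y) * mask) / len(idx)
        w -= lr * grad
    return softmax(h @ w)
```

Predicted cell types are then `prob.argmax(axis=1)`, the column index of the largest probability in each row. Unlabeled cells still shape the result, because the propagation step mixes their features into the representations of their labeled neighbours; that is what makes the setup semi-supervised rather than an ordinary classifier on the labeled cells alone.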