Single-cell RNA sequencing (scRNA-seq) is one of the most cost-effective ways to uncover transcriptomic heterogeneity. With the rapid rise of this technology, enormous amounts of scRNA-seq data have been produced. Because these data are high-dimensional, noisy, and sparse, with many missing features, clustering them accurately for downstream analysis remains a significant challenge. Many computational methods have been designed to address this issue, but their performance is still inadequate. In addition, most similarity-based methods require the number of clusters as input, which is difficult to determine in real applications.
Jiaotong University researchers developed GCFG, a computational method that clusters scRNA-seq data by considering both global and local information. The method characterizes the global properties of the data with concept factorization and uses a regularized Gaussian graphical model to capture the local embedding relationships among cells. To learn the cell-cell similarity matrix, the researchers integrated the two components into a single model and developed an iterative optimization algorithm to solve it. Single cells are then assigned to clusters by applying Louvain, a modularity-based community detection algorithm, to the similarity matrix. GCFG was evaluated on 14 real scRNA-seq datasets in terms of clustering accuracy (ACC) and adjusted Rand index (ARI), and comparisons with 17 other competitive methods suggest that it is effective and robust.
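The final clustering and evaluation steps can be illustrated with a minimal Python sketch. It is not the GCFG implementation: it does not perform concept factorization or fit a Gaussian graphical model, and it uses a synthetic block-structured matrix as a stand-in for the learned cell-cell similarity matrix. The function names `louvain_labels` and `clustering_accuracy` are hypothetical, and ACC is computed here under the common convention of the best one-to-one matching between predicted clusters and true labels, which is an assumption about how the metric is defined. The sketch relies on NumPy, NetworkX (2.8 or later), SciPy, and scikit-learn.

```python
import numpy as np
from networkx import from_numpy_array
from networkx.algorithms.community import louvain_communities
from scipy.optimize import linear_sum_assignment
from sklearn.metrics import adjusted_rand_score


def louvain_labels(similarity, seed=0):
    """Cluster cells by running Louvain on a weighted similarity graph."""
    graph = from_numpy_array(similarity)  # nonzero entries become weighted edges
    communities = louvain_communities(graph, weight="weight", seed=seed)
    labels = np.empty(similarity.shape[0], dtype=int)
    for cluster_id, members in enumerate(communities):
        labels[list(members)] = cluster_id
    return labels


def clustering_accuracy(true_labels, pred_labels):
    """ACC under the best one-to-one matching of clusters to true labels."""
    true_ids, true_idx = np.unique(true_labels, return_inverse=True)
    pred_ids, pred_idx = np.unique(pred_labels, return_inverse=True)
    contingency = np.zeros((pred_ids.size, true_ids.size), dtype=int)
    np.add.at(contingency, (pred_idx, true_idx), 1)  # count co-occurrences
    rows, cols = linear_sum_assignment(contingency, maximize=True)
    return contingency[rows, cols].sum() / len(true_labels)


# Synthetic stand-in for a learned similarity matrix: 3 cell types, 10 cells each.
rng = np.random.default_rng(0)
true_labels = np.repeat([0, 1, 2], 10)
same_type = true_labels[:, None] == true_labels[None, :]
similarity = np.where(same_type, 0.9, 0.05) + rng.uniform(0.0, 0.1, same_type.shape)
similarity = (similarity + similarity.T) / 2  # make the matrix symmetric
np.fill_diagonal(similarity, 0.0)             # drop self-similarity (no self-loops)

predicted_labels = louvain_labels(similarity)
ari = adjusted_rand_score(true_labels, predicted_labels)
acc = clustering_accuracy(true_labels, predicted_labels)
print(f"clusters found: {len(set(predicted_labels.tolist()))}")
print(f"ARI = {ari:.3f}, ACC = {acc:.3f}")
```

Note that Louvain chooses the number of communities itself by maximizing modularity, so the sketch never passes a cluster count; this is the property that lets a similarity-plus-Louvain pipeline avoid requiring the number of clusters as input.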
Availability – The source code of GCFG, along with a portion of the datasets, is archived in a GitHub repository: https://github.com/Yaxin-Xu/GCFG