Single-cell RNA sequencing (scRNA-seq) technology provides a powerful tool for investigating cell heterogeneity and cell subpopulations by quantifying gene expression at the single-cell level. However, scRNA-seq data analysis remains challenging because of technical noise such as dropout events (i.e., excessive zero counts in the expression matrix).
By taking into account the associations among cells and among genes, Hunan University researchers propose a novel collaborative matrix factorization-based method called CMF-Impute to impute the dropout entries of a given scRNA-seq expression matrix. They test CMF-Impute and compare it with five other state-of-the-art methods on six popular real scRNA-seq datasets of various sizes and three simulated datasets. For the simulated datasets, CMF-Impute outperforms the other methods, producing imputed values closest to the original expression values as evaluated by both the sum of squared errors (SSE) and the Pearson correlation coefficient (PCC). For the real datasets, CMF-Impute achieves the most accurate cell classification results regardless of the clustering method used (SC3, or t-SNE followed by K-means), as evaluated by both the adjusted Rand index (ARI) and normalized mutual information (NMI). Finally, the researchers demonstrate that CMF-Impute is powerful in reconstructing cell-to-cell and gene-to-gene correlations, and in inferring cell lineage trajectories.
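As a rough illustration of the two metrics used on the simulated data, the following minimal Python sketch computes SSE and PCC between original and imputed values. The function names and toy numbers are hypothetical, and the choice of which entries to compare (all entries or only the dropout positions) is an assumption not specified here.

```python
import math


def sse(original, imputed):
    """Sum of squared errors between two equal-length sequences."""
    if len(original) != len(imputed):
        raise ValueError("sequences must have the same length")
    return sum((o - i) ** 2 for o, i in zip(original, imputed))


def pcc(original, imputed):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(original)
    if n != len(imputed) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_o = sum(original) / n
    mean_i = sum(imputed) / n
    cov = sum((o - mean_o) * (i - mean_i) for o, i in zip(original, imputed))
    var_o = sum((o - mean_o) ** 2 for o in original)
    var_i = sum((i - mean_i) ** 2 for i in imputed)
    if var_o == 0 or var_i == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_o * var_i)


# Toy values, made up for illustration (not taken from the study).
original = [1.0, 2.0, 3.0, 4.0]
imputed = [1.5, 2.0, 2.5, 4.0]

print("SSE:", sse(original, imputed))
print("PCC:", pcc(original, imputed))
```

A lower SSE and a PCC closer to 1 both indicate imputed values that track the original expression more closely.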
A workflow of the CMF-Impute algorithm
(1) Data cleansing, normalization, and log transformation. (2) Calculating the similarity matrices among cells and among genes. (3) Obtaining the imputed expression matrix with the CMF-Impute algorithm.
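Steps (1) and (2) of the workflow can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the authors' MATLAB implementation: the counts-per-million scaling, the log2(x + 1) transform, and cosine similarity are stand-ins chosen here, since the specific normalization and similarity measure are not given above. Step (3), the factorization itself, is not covered.

```python
import math


def normalize_and_log(counts):
    """Scale each cell to counts per million, then apply log2(x + 1).

    counts is a list of gene rows, each holding one count per cell.
    Both the scaling and the transform are assumptions for this sketch.
    """
    n_cells = len(counts[0])
    library = [sum(row[j] for row in counts) for j in range(n_cells)]
    return [
        [
            math.log2(row[j] / library[j] * 1e6 + 1.0) if library[j] > 0 else 0.0
            for j in range(n_cells)
        ]
        for row in counts
    ]


def cosine_similarity_matrix(vectors):
    """Pairwise cosine similarity between row vectors (assumed measure)."""
    norms = [math.sqrt(sum(v * v for v in vec)) for vec in vectors]
    n = len(vectors)
    sim = [[0.0] * n for _ in range(n)]
    for a in range(n):
        for b in range(a, n):
            if norms[a] == 0 or norms[b] == 0:
                s = 0.0  # an all-zero vector has no defined direction
            else:
                dot = sum(x * y for x, y in zip(vectors[a], vectors[b]))
                s = dot / (norms[a] * norms[b])
            sim[a][b] = sim[b][a] = s
    return sim


def transpose(matrix):
    """Swap rows and columns so that cells become the row vectors."""
    return [list(col) for col in zip(*matrix)]


# Toy 3 genes x 3 cells count matrix, made up for illustration.
counts = [
    [10, 0, 5],
    [0, 20, 5],
    [30, 20, 0],
]

expr = normalize_and_log(counts)                      # step (1)
gene_sim = cosine_similarity_matrix(expr)             # step (2), genes
cell_sim = cosine_similarity_matrix(transpose(expr))  # step (2), cells
```

In a collaborative factorization setting, similarity matrices like these would typically serve as side information that constrains the factors for cells and genes.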
Availability: CMF-Impute is implemented as a MATLAB package, available at https://github.com/xujunlin123/CMFImpute.git