Single-cell RNA-Sequencing (scRNA-Seq) is a fast-evolving technology that enables the understanding of biological processes at an unprecedentedly high resolution. However, well-suited bioinformatics tools to analyze the data generated from this new technology are still lacking.
Here researchers from the University of Hawaii Cancer Center investigate the performance of the non-negative matrix factorization (NMF) method on a wide variety of scRNA-Seq datasets, ranging from mouse hematopoietic stem cells to human glioblastoma data. Compared with other unsupervised clustering methods, including K-means and hierarchical clustering, NMF separates similar groups more accurately across these datasets. The researchers ranked genes by their importance scores (D-scores) in separating these groups, and discovered that NMF uniquely identifies genes expressed at intermediate levels as top-ranked genes. Finally, they show that in conjunction with the modularity detection method FEM, NMF reveals meaningful protein-protein interaction modules.
The workflow of NMFEM
The input can be either FASTQ files or a raw counts table. If FASTQ files are used, they are aligned using TopHat and counted using FeatureCounts (steps shown in brackets). The raw counts table, whether supplied or computed, is filtered by sample and by gene, converted into FPKMs using gene lengths, and normalized by sample. We then run the NMF method on the result to detect groups of cells, and find the feature genes that separate the detected groups. Finally, we feed the feature genes into FEM as seed genes, and generate protein-protein interaction (PPI) gene modules that contain highly differentially expressed genes.
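The middle of this workflow (counts to FPKM, then NMF to group cells) can be illustrated with a small sketch. This is not the NMFEM package's code: the function names, the toy counts, and the gene lengths below are all hypothetical, the factorization uses the generic Lee-Seung multiplicative updates rather than whatever solver NMFEM uses, and the filtering, per-sample normalization, D-score, and FEM steps are left out.

```python
import random


def to_fpkm(counts, gene_lengths):
    """Convert raw counts (genes x cells) to FPKM.

    FPKM = reads * 1e9 / (gene length in bp * total reads in the cell).
    """
    n_cells = len(counts[0])
    totals = [sum(row[c] for row in counts) for c in range(n_cells)]
    return [
        [row[c] * 1e9 / (length * totals[c]) for c in range(n_cells)]
        for row, length in zip(counts, gene_lengths)
    ]


def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def nmf(v, k, n_iter=500, seed=0, eps=1e-9):
    """Factor v (genes x cells) as w @ h with non-negative w (genes x k), h (k x cells)."""
    rng = random.Random(seed)
    n_genes, n_cells = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n_genes)]
    h = [[rng.random() + 0.1 for _ in range(n_cells)] for _ in range(k)]
    for _ in range(n_iter):
        # Update h: h *= (w^T v) / (w^T w h)
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n_cells)]
             for i in range(k)]
        # Update w: w *= (v h^T) / (w h h^T)
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)]
             for i in range(n_genes)]
    return w, h


def assign_groups(w, h):
    """Label each cell with the factor that carries the most weight for it."""
    k, n_cells = len(h), len(h[0])
    # Move each factor's scale from w into h so that rows of h are comparable.
    col_sums = [sum(row[i] for row in w) for i in range(k)]
    scaled = [[h[i][c] * col_sums[i] for c in range(n_cells)] for i in range(k)]
    return [max(range(k), key=lambda i: scaled[i][c]) for c in range(n_cells)]


# Hypothetical toy data: 6 genes x 4 cells; cells 0-1 and cells 2-3
# are built to have opposite expression patterns.
counts = [
    [90, 80, 5, 10],
    [120, 100, 10, 5],
    [60, 70, 5, 5],
    [5, 10, 100, 90],
    [10, 5, 80, 110],
    [5, 5, 40, 50],
]
gene_lengths = [1000, 2000, 1500, 1000, 2000, 500]  # base pairs, made up

fpkm = to_fpkm(counts, gene_lengths)
w, h = nmf(fpkm, k=2)
groups = assign_groups(w, h)
print(groups)  # one group label (0 or 1) per cell
```

Which group receives label 0 and which receives label 1 depends on the random initialization, so only the partition of cells is meaningful. On real data one would more likely rely on an established NMF implementation and choose the number of factors from the data rather than fixing it at 2.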
Availability – The NMF based subpopulation detection package is available at: https://github.com/lanagarmire/NMFEM