Despite its popularity, characterizing subpopulations by transcript abundance is subject to a significant amount of noise. Researchers from the University of Hawaii Cancer Center propose using effective and expressed nucleotide variations (eeSNVs) from scRNA-seq as alternative features for tumor subpopulation identification. They developed a linear modeling framework, SSrGE, to identify eeSNVs that are associated with gene expression. In all the cancer datasets tested, eeSNVs achieve better accuracy and visualization than gene expression for identifying subpopulations. Previously validated cancer-relevant genes are also highly ranked, supporting the significance of the method. In summary, SNV features from scRNA-seq data have merit for both subpopulation identification and linking genotype to phenotype.
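To make the idea of linking SNVs to gene expression through a linear model more concrete, here is a minimal sketch. It assumes an L1-penalized (LASSO) regression as the sparse linear model, which the summary above does not specify, and it does not use the SSrGE package or its API. The function names, the toy genotype matrix, the toy expression values, and the penalty `alpha` are all hypothetical and chosen only for illustration.

```python
from itertools import product


def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the L1 proximal step)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(X, y, alpha, n_sweeps=200):
    """Minimise (1/2n)*||y - Xw||^2 + alpha*||w||_1 by cyclic coordinate descent.

    X is a list of rows (cells x SNVs), y a list of expression values.
    No intercept is fitted, to keep the sketch short.
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    residual = list(y)  # equals y - Xw while w is all zeros
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_sweeps):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # SNV absent from every cell: nothing to fit
            # Correlation of SNV j with the residual that excludes SNV j's own term.
            rho = sum(X[i][j] * (residual[i] + X[i][j] * w[j]) for i in range(n)) / n
            new_w = soft_threshold(rho, alpha) / col_sq[j]
            delta = new_w - w[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                w[j] = new_w
    return w


# Toy data (hypothetical): 16 cells x 4 SNVs, each presence/absence pattern once.
snv_matrix = [list(map(float, pattern)) for pattern in product([0, 1], repeat=4)]
# Hypothetical expression of one gene, driven by SNV 0 and SNV 2 only.
expression = [2.0 * row[0] - 1.5 * row[2] for row in snv_matrix]

weights = lasso_coordinate_descent(snv_matrix, expression, alpha=0.1)
# SNVs with a non-zero coefficient are the ones the sparse model retains.
selected_snvs = [j for j, w in enumerate(weights) if w != 0.0]
print(selected_snvs)
```

In this sketch, the SNVs that keep a non-zero coefficient play the role of candidate features for downstream clustering. Repeating the fit once per gene and pooling the retained SNVs would be one way to build a feature set, though the actual procedure used by the authors may differ.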
Comparison of clustering visualization using eeSNV and gene expression (GE) features
(A) Bipartite graphs using eeSNVs and cell representations. (B) Principal Component Analysis (PCA) results using gene expression. (C) PCA results using eeSNVs. (D) SIMLR results using gene expression.
Availability – The method is available at https://github.com/lanagarmire/SSrGE