Despite its popularity, characterizing subpopulations by transcript abundance is subject to a significant amount of noise. Researchers at the University of Hawaii Cancer Center propose using effective and expressed nucleotide variations (eeSNVs) from scRNA-seq as alternative features for tumor subpopulation identification. They developed a linear modeling framework, SSrGE, to link eeSNVs to the gene expression they are associated with. In all the cancer datasets tested, eeSNVs achieve better accuracies and more complexity than gene expression for identifying subpopulations. Previously validated cancer-relevant genes are also highly ranked, confirming the significance of the method. Moreover, SSrGE can analyze coupled DNA-seq and RNA-seq data from the same single cells, demonstrating its power over cutting-edge single-cell genomics techniques. In summary, SNV features from scRNA-seq data have merits both for identifying subpopulations and for linking genotype to phenotype.
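To make the idea of linking SNVs to gene expression with a linear model more concrete, here is a minimal sketch. It assumes a sparse (Lasso-style) linear regression of one gene's expression on binary SNV calls, fitted by coordinate descent in plain Python; the summary above only states that SSrGE is a linear modeling framework, so the penalty, the function names (`soft_threshold`, `fit_sparse_linear`), the parameters (`alpha`, `n_iter`), and the toy data are all illustrative assumptions and not the SSrGE package's API.

```python
def soft_threshold(value, penalty):
    """Shrink value toward zero by penalty (the Lasso proximal step)."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def fit_sparse_linear(snv_matrix, expression, alpha=0.1, n_iter=500):
    """Coordinate-descent Lasso (hypothetical helper, not part of SSrGE).

    Minimizes (1 / 2n) * sum((y - b - X w)^2) + alpha * sum(|w|),
    where X holds 0/1 SNV calls per cell and y is one gene's expression.
    """
    n_cells = len(snv_matrix)
    n_snvs = len(snv_matrix[0])
    weights = [0.0] * n_snvs
    intercept = sum(expression) / n_cells

    for _ in range(n_iter):
        for j in range(n_snvs):
            rho = 0.0   # correlation of SNV j with the partial residual
            norm = 0.0  # squared norm of SNV column j
            for row, y in zip(snv_matrix, expression):
                pred = intercept + sum(w * x for w, x in zip(weights, row))
                partial_residual = y - pred + weights[j] * row[j]
                rho += row[j] * partial_residual
                norm += row[j] * row[j]
            if norm == 0.0:
                weights[j] = 0.0  # SNV absent from every cell
            else:
                weights[j] = soft_threshold(rho / n_cells, alpha) / (norm / n_cells)
        # The intercept is not penalized: set it to the mean residual.
        intercept = sum(
            y - sum(w * x for w, x in zip(weights, row))
            for row, y in zip(snv_matrix, expression)
        ) / n_cells
    return weights, intercept


# Toy data (made up): 6 cells x 3 candidate SNVs, 1 = variant present.
snv_matrix = [
    [1, 0, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 1, 1],
    [0, 0, 1],
    [0, 1, 0],
]
# Expression of one gene; by construction it depends only on SNV 0.
expression = [3.0, 3.0, 3.0, 1.0, 1.0, 1.0]

weights, intercept = fit_sparse_linear(snv_matrix, expression)
# SNVs with non-zero coefficients play the role of "eeSNVs" in this sketch.
selected = [j for j, w in enumerate(weights) if w != 0.0]
print("selected SNV indices:", selected)
print("coefficients:", weights, "intercept:", intercept)
```

In this toy example only SNV 0 keeps a non-zero coefficient, and that coefficient is shrunk below the constructed effect of 2 by the penalty. A real analysis would repeat such a fit across genes and use an established library implementation; how SSrGE itself chooses its penalty and ranks features is documented in its repository.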
Comparison of clustering visualization using eeSNV and gene expression (GE) features
(A) Bipartite graphs using eeSNVs and cell representations. (B) Principal Component Analysis (PCA) results using gene expression. (C) PCA results using eeSNVs. (D) SIMLR results using gene expression.
Availability – The SSrGE method is available at: https://github.com/lanagarmire/SSrGE