A primary challenge in single-cell RNA-seq analysis is dropouts: the data capture only a small fraction of each cell's transcriptome. Almost all computational algorithms developed for single-cell RNA-seq rely on gene selection, dimension reduction, or imputation to address dropouts. Here, the opposite view is explored. Instead of treating dropouts as a problem to be fixed, Georgia Institute of Technology and Emory University researchers embrace them as a useful signal. They represent the dropout pattern by binarizing single-cell RNA-seq count data, and present a co-occurrence clustering algorithm that clusters cells based on this pattern. The researchers demonstrate on multiple published datasets that the binary dropout pattern is as informative as the quantitative expression of highly variable genes for identifying cell types. They expect that recognizing the utility of dropouts will provide an alternative direction for developing computational algorithms for single-cell RNA-seq analysis.
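The binarization step can be illustrated with a minimal sketch. It assumes a genes-by-cells count matrix and treats any nonzero count as "detected"; the function name `binarize_counts`, the toy matrix, and the simple co-detection count are illustrative assumptions, not the authors' implementation or their co-occurrence measure.

```python
import numpy as np

def binarize_counts(counts):
    """Return 1 where a gene is detected (count > 0) and 0 where it drops out."""
    counts = np.asarray(counts)
    return (counts > 0).astype(np.uint8)

# Toy genes-by-cells count matrix (3 genes x 4 cells); the values are made up.
counts = np.array([[0, 3, 0, 7],
                   [5, 0, 0, 2],
                   [0, 0, 0, 1]])

binary = binarize_counts(counts)

# Fraction of cells in which each gene is detected.
detection_rate = binary.mean(axis=1)

# Number of cells in which each pair of genes is detected together.
# This is a plain co-detection count, used here only to show the kind of
# signal a binary matrix carries; it is not the statistic from the paper.
b = binary.astype(int)
co_detection = b @ b.T
```

Because the binary matrix discards count magnitudes, any downstream clustering in this sketch would depend only on which genes are detected in which cells.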
Co-occurrence clustering applied to the dropout pattern in PBMC data
a–g Gene pathways and cell clusters identified in each iteration of the co-occurrence clustering algorithm. h Comparison between co-occurrence clusters and Seurat clusters on this dataset. i Pathway activities and enriched GO terms. Enrichment is evaluated by a one-sided hypergeometric test on the overlap between the identified pathways and the GO gene sets provided by MSigDB. The reported p-values are unadjusted. j, k Random Forest classification with 5-fold cross-validation of the co-occurrence clusters, based on pathway activities and on highly variable genes.
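The one-sided hypergeometric enrichment test mentioned in the caption can be sketched with the standard library alone. The function name, its parameters, and the toy numbers below are assumptions for illustration; the gene universe the authors used is not stated here.

```python
from math import comb

def hypergeom_enrichment_pvalue(n_universe, n_gene_set, n_pathway, n_overlap):
    """Upper-tail probability P(X >= n_overlap).

    X is the number of GO-set genes found when n_pathway genes are drawn
    without replacement from a universe of n_universe genes, of which
    n_gene_set belong to the GO gene set.
    """
    max_overlap = min(n_gene_set, n_pathway)
    total = comb(n_universe, n_pathway)
    tail = sum(
        comb(n_gene_set, k) * comb(n_universe - n_gene_set, n_pathway - k)
        for k in range(n_overlap, max_overlap + 1)
    )
    return tail / total

# Toy example: 10 genes in the universe, a GO set of 4 genes,
# a pathway of 3 genes, 2 of which fall in the GO set.
p_value = hypergeom_enrichment_pvalue(10, 4, 3, 2)
```

As in the caption, a p-value computed this way is unadjusted; testing many pathways against many gene sets would normally call for a multiple-testing correction.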
Availability – Source code for the co-occurrence clustering algorithm implementation is available at https://github.com/pqiu/cooccurrence_clustering.