Single-cell RNA sequencing has become a valuable tool for investigating cell types in complex tissues, where clustering of cells enables the identification and comparison of cell populations. Although many studies have sought to develop and compare different clustering approaches, a deeper investigation into the properties of the resulting populations is lacking. Specifically, the presence of misclassified cells can influence downstream analyses, highlighting the need to assess subpopulation purity and to detect such cells.
Researchers at Chalmers University of Technology have developed DSAVE (Down-SAmpling based Variation Estimation), a method to evaluate the purity of single-cell transcriptome clusters and to identify misclassified cells. The method uses down-sampling to eliminate differences in sampling noise between cells and a log-likelihood-based metric to help identify misclassified cells. In addition, DSAVE estimates the number of cells needed in a population to achieve a stable average gene expression profile within a certain gene expression range. The developers show that DSAVE can find potentially misclassified cells that are not detectable by similar tools, and can reveal the cause of their divergence from the other cells, such as a differing cell state or cell type. With the growing use of single-cell RNA-seq, the developers foresee that DSAVE will be an increasingly useful tool for comparing and purifying subpopulations in single-cell RNA-seq datasets.
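The two ideas named above, down-sampling to a common depth and scoring cells by log-likelihood, can be illustrated with a minimal Python sketch. This is not the DSAVE implementation: the function names, the multinomial model, the pseudo-count, and the synthetic data are all assumptions chosen for illustration. Each cell's UMI counts are down-sampled without replacement to the same total, a pooled expression profile is estimated for the cluster, and each cell is scored by the multinomial log-likelihood of its down-sampled counts under that profile. Cells with unusually low scores are candidates for misclassification.

```python
import numpy as np
from math import lgamma


def downsample_cell(counts, target, rng):
    """Down-sample one cell's UMI counts to `target` UMIs, without replacement."""
    counts = np.asarray(counts, dtype=np.int64)
    if counts.sum() < target:
        raise ValueError("cell has fewer UMIs than the target depth")
    # Drawing `target` UMIs from the cell's pool of observed UMIs.
    return rng.multivariate_hypergeometric(counts, target)


def multinomial_loglik(counts, profile):
    """Log-likelihood of `counts` under a multinomial with gene fractions `profile`."""
    counts = np.asarray(counts, dtype=np.int64)
    log_coef = lgamma(int(counts.sum()) + 1) - sum(lgamma(int(c) + 1) for c in counts)
    return log_coef + float(np.sum(counts * np.log(profile)))


def cell_loglik_scores(umi, target, rng, pseudo=0.5):
    """Score every cell in a cluster (rows = cells, columns = genes).

    Assumption: the cluster profile is the pooled down-sampled counts plus a
    small pseudo-count, so that every gene fraction is strictly positive.
    """
    down = np.vstack([downsample_cell(c, target, rng) for c in umi])
    pooled = down.sum(axis=0) + pseudo
    profile = pooled / pooled.sum()
    scores = np.array([multinomial_loglik(c, profile) for c in down])
    return down, scores


# Synthetic example (hypothetical data): 20 cells share one expression
# profile, and the last cell is drawn from a clearly different profile.
rng = np.random.default_rng(0)
n_genes = 50
profile_a = np.arange(1, n_genes + 1, dtype=float)
profile_a /= profile_a.sum()
profile_b = profile_a[::-1].copy()

depths = rng.integers(1500, 3000, size=21)  # cells differ in sequencing depth
umi = np.vstack(
    [rng.multinomial(d, profile_a) for d in depths[:20]]
    + [rng.multinomial(depths[20], profile_b)]
)

down, scores = cell_loglik_scores(umi, target=1000, rng=rng)
suspect = int(np.argmin(scores))  # index of the least likely cell
```

Because every cell is reduced to the same number of UMIs before scoring, differences in log-likelihood reflect differences in expression profile rather than differences in sequencing depth, which is the motivation for down-sampling in the description above.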
Typical use case for the DSAVE BTM variation score and DSAVE cell divergence
Ovals represent data, while rounded rectangles represent data processing. The DSAVE BTM variation score and cell divergence are both applied to cell populations defined by clustering, using the original UMI count data in combination with the cell clustering assignments. DSAVE allows for an iterative approach in which the user can remove or reassign cells, experiment with clustering parameters, and assess the outcome, both in terms of total cell variation within the cluster (the BTM variation score) and in terms of detected misclassified cells. When the results are satisfactory, the user can finalize the curation and proceed to further data analysis. The BTM variation score calculation requires a DSAVE template, which is explained further below in the methods section.