A comprehensive comparison of supervised and unsupervised methods for cell type identification in single-cell RNA-seq

Cell type identification is among the most important tasks in single-cell RNA-sequencing (scRNA-seq) analysis. Many in silico methods have been developed, and they can be roughly categorized as either supervised or unsupervised. In this study, a team led by researchers at Emory University investigated the performance of 8 supervised and 10 unsupervised cell type identification methods using 14 public scRNA-seq datasets covering different tissues, sequencing protocols and species. The researchers investigated the impact of a number of factors, including total number of cells, number of cell types, sequencing depth, batch effects, reference bias, cell population imbalance, unknown/novel cell types, and computational efficiency and scalability. Instead of merely comparing individual methods, they focused on how these factors affect supervised and unsupervised methods as general categories.

The workflow of the scRNAIdent pipeline

scRNAIdent provides a modularized R pipeline tool for automating the evaluation and comparison of cell typing methods in scRNA-seq analysis.

The researchers found that in most scenarios the supervised methods outperformed the unsupervised methods, except when identifying unknown cell types. This is particularly true when the supervised methods use a reference dataset with high informational sufficiency, low complexity and high similarity to the query dataset. However, this advantage can be undermined by some of the undesirable dataset properties investigated in the study, which lead to uninformative and biased reference datasets. In these scenarios, unsupervised methods can be comparable to supervised methods. The study not only explains how cell typing methods behave under different experimental settings but also provides a general guideline for choosing a method according to the scientific goal and dataset properties. Finally, the evaluation workflow is implemented as a modularized R pipeline that allows future evaluation of new methods.
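To make the idea of comparing the two categories on an equal footing concrete, here is a minimal Python sketch. It is not part of scRNAIdent, which is an R pipeline, and this summary does not state which metrics the study used. The adjusted Rand index (ARI) is assumed here only as one common choice, and the function name and toy labels are hypothetical. ARI suits this kind of comparison because it ignores what the groups are called: a supervised method returns cell type names, while an unsupervised method returns arbitrary cluster IDs.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(true_labels, predicted_labels):
    """Agreement between two labelings, corrected for chance.

    Illustrative helper (hypothetical name). Label names do not matter,
    only which cells are grouped together.
    """
    if len(true_labels) != len(predicted_labels):
        raise ValueError("both labelings must cover the same cells")

    total_pairs = comb(len(true_labels), 2)
    if total_pairs == 0:
        return 1.0  # zero or one cell: trivially in agreement

    # Contingency counts: cells sharing a (true, predicted) label pair.
    joint = Counter(zip(true_labels, predicted_labels))
    same_in_both = sum(comb(c, 2) for c in joint.values())
    same_in_true = sum(comb(c, 2) for c in Counter(true_labels).values())
    same_in_pred = sum(comb(c, 2) for c in Counter(predicted_labels).values())

    expected = same_in_true * same_in_pred / total_pairs
    maximum = (same_in_true + same_in_pred) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. every cell in a single group
    return (same_in_both - expected) / (maximum - expected)


# Toy data (made up for illustration, not taken from the study).
truth = ["T cell", "T cell", "B cell", "B cell"]
supervised_calls = ["T cell", "T cell", "B cell", "B cell"]  # named predictions
cluster_ids = [1, 1, 0, 0]                                   # unnamed clusters

print(adjusted_rand_index(truth, supervised_calls))
print(adjusted_rand_index(truth, cluster_ids))
```

In this toy case both calls score the same, because the cluster IDs partition the cells exactly as the true labels do. A real benchmark would also need a rule for cells that a supervised method leaves unassigned, which this sketch does not cover.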

Availability: All source code is available at https://github.com/xsun28/scRNAIdent.

Sun X, Lin X, Li Z, Wu H. (2022) A comprehensive comparison of supervised and unsupervised methods for cell type identification in single-cell RNA-seq. Brief Bioinform [Epub ahead of print].
