scAnnotate – an automated cell type annotation tool for single-cell RNA-sequencing data

Single-cell RNA-sequencing (scRNA-seq) technology enables researchers to investigate a genome at the cellular level with unprecedented resolution. An organism consists of a heterogeneous collection of cell types, each of which plays a distinct role in various biological processes. Hence, the first step of scRNA-seq data analysis is often to distinguish cell types so that they can be investigated separately. Researchers have recently developed several automated cell type annotation tools based on supervised machine learning algorithms, which require neither biological knowledge nor subjective human decisions. Dropout, in which a gene is recorded with zero counts in a cell, is a crucial characteristic of scRNA-seq data and is widely used in differential expression analysis. However, existing supervised learning methods for cell annotation do not use dropout information explicitly. This motivated University of Victoria researchers to build a novel cell annotation tool that fully utilizes this information.

The researchers have developed scAnnotate, an automated cell annotation tool based on supervised machine learning algorithms. They used a marginal mixture model to describe both the dropout proportion and the non-dropout expression level distribution of a gene. To avoid having to specify and estimate a high-dimensional joint distribution for all genes, they developed an ensemble learning approach based on these marginal models. First, the researchers built a 'weak' classifier using the mixture model for each gene. Then, they combined the 'weak' classifiers of all genes into a single 'strong' classifier to annotate cells. Using 11 real scRNA-seq datasets, the researchers demonstrated that scAnnotate is competitive against 9 other methods, and that it accurately annotates cells when training and test data are (1) similar, (2) cross-platform, and (3) cross-species.
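The two-step idea described above (one mixture-model classifier per gene, then a combination across genes) can be illustrated with a minimal Python sketch. This is not the scAnnotate implementation: the function names fit_gene_mixtures and predict_cell_types are hypothetical, the non-dropout expression is assumed to be normally distributed, and the weak classifiers are assumed to be combined by summing per-gene log-likelihoods, which is only one of several possible ensemble rules.

```python
import numpy as np

def fit_gene_mixtures(X, y, eps=1e-3):
    """For each cell type and gene, estimate a dropout proportion and a
    normal distribution for the non-zero expression values (assumption)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)
    n_genes = X.shape[1]
    pi = np.empty((len(classes), n_genes))   # dropout proportions
    mu = np.zeros((len(classes), n_genes))   # non-dropout means
    sd = np.ones((len(classes), n_genes))    # non-dropout standard deviations
    for k, c in enumerate(classes):
        Xc = X[y == c]
        nonzero = Xc > 0
        # Smoothed proportion so that log(pi) and log(1 - pi) stay finite.
        pi[k] = ((~nonzero).sum(axis=0) + 1.0) / (Xc.shape[0] + 2.0)
        for g in range(n_genes):
            vals = Xc[nonzero[:, g], g]
            if vals.size > 0:
                mu[k, g] = vals.mean()
            if vals.size > 1:
                sd[k, g] = max(vals.std(ddof=1), eps)
    return classes, pi, mu, sd

def predict_cell_types(X, classes, pi, mu, sd):
    """Score every cell with each gene's 'weak' mixture classifier, then
    sum the per-gene log-likelihoods into one 'strong' score per type."""
    X = np.asarray(X, dtype=float)
    x = X[:, None, :]                        # cells x 1 x genes
    is_dropout = ~(x > 0)
    log_dropout = np.log(pi)[None]           # 1 x types x genes
    log_expressed = (np.log1p(-pi)[None]
                     - 0.5 * np.log(2 * np.pi * sd ** 2)[None]
                     - (x - mu[None]) ** 2 / (2 * sd[None] ** 2))
    per_gene = np.where(is_dropout, log_dropout, log_expressed)
    total = per_gene.sum(axis=2)             # combine the weak classifiers
    return classes[total.argmax(axis=1)]

# Toy data: rows are cells, columns are genes (values are made up).
X_train = np.array([[5.0, 0.0, 1.0], [5.2, 0.0, 1.1], [4.8, 0.0, 0.9],
                    [0.0, 3.0, 1.0], [0.0, 3.1, 1.2], [0.0, 2.9, 0.8]])
y_train = np.array(["A", "A", "A", "B", "B", "B"])
model = fit_gene_mixtures(X_train, y_train)
predictions = predict_cell_types(np.array([[5.1, 0.0, 1.0],
                                           [0.0, 3.0, 1.0]]), *model)
```

In this toy example the first gene is expressed only in type A and the second only in type B, so the dropout pattern alone carries most of the signal, which is the kind of information the summary says existing methods leave unused.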


Workflow of scAnnotate on large sample datasets (with a mean of 600 observations per cell type and no fewer than 20 cells for any given type). The gray vertical dashed line separates training data (left) and test data (right) information.

Ji X, Tsao D, Bai K, Tsao M, Zhang X. (2022) scAnnotate: an automated cell type annotation tool for single-cell RNA-sequencing data. bioRxiv [online preprint]. [abstract]
