Identification of cell populations often relies on manual annotation of cell clusters using established marker genes. However, selecting marker genes is time-consuming and may lead to sub-optimal annotations, because the markers must be informative of both the individual cell clusters and the various cell types present in the sample.
Researchers from the University of Helsinki have developed a computational platform, ScType, which enables fully automated and ultra-fast cell-type identification based solely on the given scRNA-seq data, with a comprehensive cell marker database as background information. Using six scRNA-seq datasets from various human and mouse tissues, the researchers show how ScType provides unbiased and accurate cell-type annotations by guaranteeing the specificity of positive and negative marker genes across cell clusters and cell types. They also demonstrate how ScType distinguishes between healthy and malignant cell populations, based on single-cell calling of single-nucleotide variants (SNVs), making it a versatile tool for anticancer applications.
A schematic view of cell-type annotation using ScType
(a) ScType requires only the raw or pre-processed single-cell transcriptomics dataset(s) as input. ScType implements options for additional quality control and normalization steps, where needed, followed by unsupervised clustering of cells based on their scRNA-seq profiles. The results here are based on Louvain clustering; SC3, DBSCAN, GiniClust and k-means clustering options are also available in ScType (see Methods). In the next step, ScType performs fully automated cell-type annotation using an in-built comprehensive marker database. Finally, ScType implements novel options for somatic single-cell SNV calling to distinguish between healthy and malignant cell populations. (b, c) The ScType specificity score guarantees that the marker genes show specificity across both clusters and cell types, for accurate unsupervised cell-type annotation with high cell subpopulation selectivity. (d) UMAP example of automated cell subtype identification by ScType in the liver atlas dataset, where it automatically labelled the same cell types as were assigned manually in the original study. (e) Based on the information that plasma cells do not express common B-cell markers, such as CD19 and CD20, but instead express CD138, ScType enhanced the resolution of cell-type annotations of two cell clusters, which were jointly annotated as B-cells in the original study, by segregating them into immature B-cell and plasma (B) cell types (lower UMAP plot of panel (d)).
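The idea of combining positive and negative markers can be illustrated with a minimal Python sketch. It is not ScType's actual scoring procedure, which is described in the paper's Methods. The marker sets are built only from the plasma-cell example in the caption, the expression values are made-up scaled numbers, and the names `cell_score` and `annotate_clusters` are hypothetical. The simplifying assumption here is that a cell's score for a type is the mean expression of that type's positive markers minus the mean expression of its negative markers, and that a cluster takes the type with the highest summed score.

```python
from collections import defaultdict

# Hypothetical marker sets, taken only from the plasma-cell example in the caption.
MARKERS = {
    "B cell": {"positive": ["CD19", "CD20"], "negative": ["CD138"]},
    "Plasma cell": {"positive": ["CD138"], "negative": ["CD19", "CD20"]},
}

# Made-up scaled expression values for four toy cells (not real data).
EXPRESSION = {
    "cell1": {"CD19": 1.2, "CD20": 0.9, "CD138": -0.8},
    "cell2": {"CD19": 0.7, "CD20": 1.1, "CD138": -0.5},
    "cell3": {"CD19": -0.9, "CD20": -1.0, "CD138": 1.4},
    "cell4": {"CD19": -0.6, "CD20": -0.7, "CD138": 1.0},
}

# Cluster assignment per cell, as would come from an unsupervised clustering step.
CLUSTERS = {"cell1": 0, "cell2": 0, "cell3": 1, "cell4": 1}


def mean_expression(expr, genes):
    """Mean expression over the given genes; missing genes count as 0."""
    if not genes:
        return 0.0
    return sum(expr.get(gene, 0.0) for gene in genes) / len(genes)


def cell_score(expr, positive, negative):
    """Assumed score: mean of positive markers minus mean of negative markers."""
    return mean_expression(expr, positive) - mean_expression(expr, negative)


def annotate_clusters(expression, clusters, markers):
    """Sum per-cell scores within each cluster and pick the top-scoring type."""
    totals = defaultdict(lambda: defaultdict(float))
    for cell, expr in expression.items():
        for cell_type, sets in markers.items():
            totals[clusters[cell]][cell_type] += cell_score(
                expr, sets["positive"], sets["negative"]
            )
    return {
        cluster: max(scores, key=scores.get)
        for cluster, scores in totals.items()
    }


labels = annotate_clusters(EXPRESSION, CLUSTERS, MARKERS)
print(labels)  # {0: 'B cell', 1: 'Plasma cell'}
```

In this toy setting, the negative markers are what separate the two clusters cleanly: a cluster that expresses CD138 is penalized as a B-cell candidate, while one that expresses CD19 and CD20 is penalized as a plasma-cell candidate.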
Availability – The widely applicable method is deployed both as an interactive web tool (https://sctype.app) and as an open-source R package.