Single-cell RNA sequencing (scRNA-seq) is revolutionizing the study of complex and dynamic cellular mechanisms. However, cell-type annotation remains a major challenge because it relies largely on a priori knowledge and manual curation, which is cumbersome and subjective. The increasing number of scRNA-seq datasets, as well as numerous published genetic studies, motivated us to build a comprehensive human cell type reference atlas.
A team led by researchers at the University of Texas Health Science Center at Houston has developed decoding Cell type Specificity (deCS), an automatic cell type annotation method augmented by a comprehensive collection of human cell type expression profiles and marker genes. The researchers used deCS to annotate scRNA-seq data from various tissue types and systematically evaluated the annotation accuracy under different conditions, including reference panels, sequencing depth, and feature selection strategies. The results demonstrated that expanding the references is critical for improving annotation accuracy. Compared to many existing state-of-the-art annotation tools, deCS significantly reduced computation time and increased accuracy. deCS can be integrated into the standard scRNA-seq analytical pipeline to enhance cell type annotation. Finally, the researchers demonstrated the broad utility of deCS to identify trait–cell type associations in 51 human complex traits, providing deep insights into the cellular mechanisms underlying disease pathogenesis.
Overview of deCS flowchart
For each cell type, we compute a t-statistic or a z-score for each gene in the bulk RNA-seq and scRNA-seq derived references, respectively. We then define the genes with the highest t-statistics or z-scores (top 5%) as CTGenes, and further integrate cell signature gene sets from the CellMatch database. The annotation step depends on the type of query data. When the query input is a gene expression profile, deCS calculates the PCC or SCC between the scaled query expression profile and the t-statistics (or z-scores) of each cell type in the reference, then assigns the label with the highest score to the query profile. When the query input is a list of genes, deCS works analogously to existing tools that identify candidate genes overrepresented in specific GO terms or KEGG pathways. Finally, the top enriched cell type is assigned to the query data. scRNA-seq, single-cell RNA sequencing; CTGene, cell type-specific gene; PCC, Pearson correlation coefficient; SCC, Spearman’s correlation coefficient; GO, Gene Ontology; KEGG, Kyoto Encyclopedia of Genes and Genomes; QC, quality control.
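The two query modes described above can be illustrated with a minimal Python sketch. It is not the deCS implementation (deCS has its own code and interface, which are not reproduced here), and every name, value, and design choice below is an assumption made for illustration: the toy reference matrix, the z-score computed per gene across cell types with the population standard deviation, and the one-sided hypergeometric test used as a stand-in for the overrepresentation analysis.

```python
import math
from statistics import mean, pstdev

# Toy reference: mean expression per cell type (hypothetical values).
genes = ["CD3E", "MS4A1", "ALB", "LYZ"]
reference = {
    "T cell":     [9.0, 1.0, 0.5, 1.0],
    "B cell":     [1.0, 8.0, 0.5, 1.5],
    "Hepatocyte": [0.5, 0.5, 10.0, 0.5],
}

def zscores(ref):
    """Assumed definition: z-score each gene across the reference cell types."""
    cts = list(ref)
    z = {ct: [] for ct in cts}
    for g in range(len(ref[cts[0]])):
        vals = [ref[ct][g] for ct in cts]
        mu, sd = mean(vals), pstdev(vals)
        for ct in cts:
            z[ct].append((ref[ct][g] - mu) / sd if sd > 0 else 0.0)
    return z

def top_genes(scores, names, fraction=0.05):
    """Genes with the highest scores (top 5% by default, at least one gene)."""
    k = max(1, math.ceil(fraction * len(names)))
    ranked = sorted(zip(scores, names), reverse=True)
    return {name for _, name in ranked[:k]}

def pearson(x, y):
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0

def ranks(x):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    r, i = [0.0] * len(x), 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and x[order[j + 1]] == x[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r

def spearman(x, y):
    return pearson(ranks(x), ranks(y))

def annotate_profile(query, z, method=pearson):
    """Query mode 1: label = cell type whose score vector correlates best."""
    scores = {ct: method(query, z[ct]) for ct in z}
    return max(scores, key=scores.get), scores

def hypergeom_pvalue(k, M, n, N):
    """P(overlap >= k): M background genes, n in the gene set, N in the query."""
    total = sum(math.comb(n, i) * math.comb(M - n, N - i)
                for i in range(k, min(n, N) + 1))
    return total / math.comb(M, N)

def annotate_gene_list(query_genes, ctgenes, background_size):
    """Query mode 2: label = cell type with the smallest enrichment p-value."""
    q = set(query_genes)
    pvals = {ct: hypergeom_pvalue(len(q & s), background_size, len(s), len(q))
             for ct, s in ctgenes.items()}
    return min(pvals, key=pvals.get), pvals

z = zscores(reference)
ctgenes = {ct: top_genes(z[ct], genes) for ct in z}

profile_label, _ = annotate_profile([7.0, 0.8, 0.2, 1.1], z)
rank_label, rank_scores = annotate_profile([7.0, 0.8, 0.2, 1.1], z, spearman)
list_label, list_pvals = annotate_gene_list(["MS4A1"], ctgenes, len(genes))
print(profile_label, rank_label, list_label)
```

With only four genes, the top 5% cutoff degenerates to a single gene per cell type; a real reference covers thousands of genes and many more cell types, and a real analysis would also correct the enrichment p-values for multiple testing.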
Availability – All documents for deCS, including source code, user manual, demo data, and tutorials, are freely available at https://github.com/bsml320/deCS.