Large-scale single-cell sequencing studies and biological experiments have revealed expression patterns that distinguish the cell types within tissues, underscoring the importance of studying cellular heterogeneity and of annotating cell types accurately. Analysis of gene expression profiles in these experiments provides two essential types of data for cell type annotation: annotated references and canonical markers. Researchers at the Zhejiang University School of Medicine have developed CellSTAR, the first comprehensive single-cell transcriptomic annotation resource. It is unique in (a) offering, for the first time, comprehensive and expertly annotated reference data for annotating hundreds of cell types and (b) enabling the collective consideration of reference data and marker genes by incorporating tens of thousands of markers. Given these features, CellSTAR is expected to attract broad interest from researchers working on technological innovation in single-cell transcriptomics, on cellular heterogeneity and dynamics, and in related fields.
Schematic illustrations of the general workflow of cell type annotation and the features of annotation-related prior data (annotated reference datasets and marker genes) provided by CellSTAR.
(A) Acquisition of unannotated data: large-scale unannotated datasets acquired from single-cell sequencing studies require accurate cell type annotation. (B) Cell type annotation: unlike the traditional strategy, which relies on canonical marker genes that are specifically expressed in known cell types, the reference-based strategy uses the comprehensive gene expression profiles of expertly annotated reference datasets. As a result, it has demonstrated advantages in capturing expression variability and coverage, in efficiency and reproducibility, and in resolution. The accuracy, reliability, and consistency of both strategies nevertheless depend heavily on the availability, quality, and applicability of the annotation data, which calls for a comprehensive database that integrates curated reference and marker data to provide abundant availability, high quality, and complementary applicability. (C) Analysis of annotated data: by enabling the collective consideration of both types of data, CellSTAR is expected to facilitate accurate and robust identification of cell identities and various downstream analyses, such as studies of cellular heterogeneity and dynamics, disease research, and drug discovery.
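The contrast between the two annotation strategies in panel (B) can be made concrete with a toy sketch. The code below is only an illustration under stated assumptions: the gene names, expression values, cell types, and the helper functions `annotate_by_markers` and `annotate_by_reference` are hypothetical, and they do not represent CellSTAR's data format or any specific annotation tool. The marker-based function scores a cell by the mean expression of each type's marker genes, while the reference-based function compares the cell's whole profile with annotated reference profiles using Pearson correlation.

```python
from statistics import mean

# Toy expression profile of one unannotated cell (gene -> expression level).
# All genes, values, and cell types here are illustrative assumptions.
cell = {"CD3E": 5.0, "CD19": 0.1, "MS4A1": 0.2, "CD14": 0.3, "LYZ": 0.4}

# Marker-based prior data: a few canonical marker genes per cell type.
markers = {
    "T cell": ["CD3E"],
    "B cell": ["CD19", "MS4A1"],
    "Monocyte": ["CD14", "LYZ"],
}

# Reference-based prior data: full expression profiles of annotated cell types.
reference = {
    "T cell": {"CD3E": 4.5, "CD19": 0.0, "MS4A1": 0.1, "CD14": 0.2, "LYZ": 0.5},
    "B cell": {"CD3E": 0.1, "CD19": 4.0, "MS4A1": 3.5, "CD14": 0.1, "LYZ": 0.3},
    "Monocyte": {"CD3E": 0.2, "CD19": 0.0, "MS4A1": 0.1, "CD14": 4.2, "LYZ": 5.0},
}


def annotate_by_markers(cell, markers):
    """Pick the cell type whose marker genes have the highest mean expression."""
    scores = {
        cell_type: mean(cell.get(gene, 0.0) for gene in genes)
        for cell_type, genes in markers.items()
    }
    return max(scores, key=scores.get)


def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy) if sx and sy else 0.0


def annotate_by_reference(cell, reference):
    """Pick the cell type whose reference profile correlates best with the cell."""
    genes = sorted(cell)
    query = [cell[g] for g in genes]
    scores = {
        cell_type: pearson(query, [profile.get(g, 0.0) for g in genes])
        for cell_type, profile in reference.items()
    }
    return max(scores, key=scores.get)


marker_label = annotate_by_markers(cell, markers)
reference_label = annotate_by_reference(cell, reference)
print(marker_label, reference_label)
```

In this toy case both strategies agree on the label. The marker-based score looks only at the listed marker genes, whereas the reference-based score uses every gene in the profile, which is the property the caption credits for better coverage of expression variability.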
Availability – CellSTAR is now publicly accessible without any login requirement at: https://idrblab.org/cellstar.