Large-scale integration of single-cell transcriptomic data as a tool for biological discovery

Skeletal muscle repair is driven by the coordinated self-renewal and fusion of myogenic stem and progenitor cells. Single-cell gene expression analyses of myogenesis have been hampered by poor sampling of the rare and transient cell states that are critical for muscle repair, and they do not capture the spatial context that is important for myogenic differentiation. Cornell University researchers demonstrate how large-scale integration of single-cell and spatial transcriptomic data can overcome these limitations. They created a single-cell transcriptomic dataset of mouse skeletal muscle by integration, consensus annotation, and analysis of 23 newly collected single-cell RNA-sequencing (scRNAseq) datasets and 88 publicly available scRNAseq and single-nucleus RNA-sequencing (snRNAseq) datasets. The resulting dataset includes more than 365,000 cells and spans a wide range of ages and of injury and repair conditions. Together, these data enabled identification of the predominant cell types in skeletal muscle and resolved cell subtypes, including endothelial subtypes distinguished by vessel type of origin, fibro-adipogenic progenitors defined by functional roles, and many distinct immune populations. The representation of different experimental conditions and the depth of transcriptome coverage enabled robust profiling of sparsely expressed genes.

The researchers built a densely sampled transcriptomic model of myogenesis, from stem cell quiescence to myofiber maturation, and identified rare, transitional states of progenitor commitment and fusion that are poorly represented in individual datasets. They performed spatial RNA sequencing of mouse muscle at three time points after injury and used the integrated dataset as a reference to achieve a high-resolution, local deconvolution of cell subtypes. They further used the integrated dataset to explore ligand-receptor co-expression patterns and identify dynamic cell-cell interactions in the muscle injury response. A public web tool enables interactive exploration and visualization of the data. This work supports the utility of large-scale integration of single-cell transcriptomic data as a tool for biological discovery.
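To make the idea of ligand-receptor co-expression concrete, here is a minimal sketch of one common scoring scheme: the mean ligand expression in a "sender" cell type multiplied by the mean receptor expression in a "receiver" cell type. This is an illustration only and is not claimed to be the scoring used in the study. The gene names `LigandX` and `ReceptorX`, the expression values, and the helper functions are all hypothetical.

```python
from statistics import mean

# Hypothetical toy data: cell type -> gene -> normalized expression per cell.
# Values are invented for illustration and are not taken from the study.
expression = {
    "FAP": {"LigandX": [2.0, 3.0, 1.0], "ReceptorX": [0.0, 0.0, 0.0]},
    "MuSC": {"LigandX": [0.0, 0.0, 0.0], "ReceptorX": [1.0, 2.0, 3.0]},
}


def coexpression_score(expr, sender, receiver, ligand, receptor):
    """Mean ligand expression in the sender times mean receptor expression in the receiver."""
    return mean(expr[sender][ligand]) * mean(expr[receiver][receptor])


def score_all_pairs(expr, ligand, receptor):
    """Score every ordered (sender, receiver) pair of cell types for one ligand-receptor pair."""
    return {
        (sender, receiver): coexpression_score(expr, sender, receiver, ligand, receptor)
        for sender in expr
        for receiver in expr
    }


scores = score_all_pairs(expression, "LigandX", "ReceptorX")
for (sender, receiver), score in sorted(scores.items()):
    print(f"{sender} -> {receiver}: {score:.2f}")
```

In this toy input only the FAP-to-MuSC direction scores above zero, because the ligand is present only in the sender and the receptor only in the receiver. A real analysis would typically add a significance test, for example by permuting cell-type labels.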

Large-scale integration of 111 single-cell and single-nucleus RNAseq samples reveals cell subtypes in skeletal muscle

Fig. 1

a Workflow used for preparation, integration, and analysis of sc/snRNAseq compendium (see “Methods”). b Overview of experimental and technical variables across compendium. The percentages shown are calculated with respect to cell number after quality control. Ages in months (mo). Injury by cardiotoxin (CTX) or notexin (NTX). Time-points in days post-injury (dpi). c UMAP representation of the merged datasets after alignment, ambient RNA removal, quality control filtering and doublet removal, but before batch-correction, colored by the dataset source. d UMAP representation of integrated compendium after batch-correction with Harmony. Cells are colored by cell type, identified after Harmony integration. e Differential detection of gene biotype sets between single-cell and single-nucleus datasets, including all protein-coding genes, long noncoding RNAs (lncRNAs), transcription factors, cell surface proteins, ribosomal protein subunits, mitochondrial genes, and “core” dissociation-associated stress factors.
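The comparison in panel e can be pictured as a detection rate per gene set, computed separately for single-cell and single-nucleus data. The sketch below assumes a simple definition, the fraction of (cell, gene) entries with a nonzero count. The figure may use a different statistic. The counts are invented toy values and `detection_rate` is a hypothetical helper.

```python
def detection_rate(cells, gene_set):
    """Fraction of (cell, gene) entries with a nonzero count, over genes in gene_set."""
    detected = 0
    total = 0
    for cell in cells:
        for gene in gene_set:
            total += 1
            if cell.get(gene, 0) > 0:
                detected += 1
    return detected / total if total else 0.0


# Hypothetical toy counts: one dict per cell or nucleus, mapping gene -> UMI count.
single_cell = [{"mt-Co1": 5, "Rpl3": 2}, {"mt-Co1": 3, "Rpl3": 0}]
single_nucleus = [{"mt-Co1": 0, "Rpl3": 1}, {"mt-Co1": 0, "Rpl3": 0}]

gene_sets = {"mitochondrial": ["mt-Co1"], "ribosomal": ["Rpl3"]}

rates = {
    name: (detection_rate(single_cell, genes), detection_rate(single_nucleus, genes))
    for name, genes in gene_sets.items()
}
for name, (rate_cells, rate_nuclei) in rates.items():
    print(f"{name}: cells={rate_cells:.2f}, nuclei={rate_nuclei:.2f}")
```

Comparing the two rates per gene set gives a simple view of which gene categories are preferentially detected by one protocol.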

Availability – The full integrated dataset with visualization tools is available at

McKellar DW, Walter LD, Song LT et al. (2021) Large-scale integration of single-cell transcriptomic data captures transitional progenitor states in mouse skeletal muscle regeneration. Commun Biol 4, 1280. [article]
