Dictionary learning for integrative, multimodal, and scalable single-cell analysis

Mapping single-cell sequencing profiles to comprehensive reference datasets represents a powerful alternative to unsupervised analysis. Reference datasets, however, are predominantly constructed from single-cell RNA-seq data, and cannot be used to annotate datasets that do not measure gene expression. Researchers at New York University have developed ‘bridge integration’, a method to harmonize single-cell datasets across modalities by leveraging a multi-omic dataset as a molecular bridge. Each cell in the multi-omic dataset comprises an element in a ‘dictionary’, which can be used to reconstruct unimodal datasets and transform them into a shared space. The researchers demonstrate that their procedure can accurately harmonize transcriptomic data with independent single-cell measurements of chromatin accessibility, histone modifications, DNA methylation, and protein levels. Moreover, they demonstrate how dictionary learning can be combined with sketching techniques to substantially improve computational scalability, and harmonize 8.6 million human immune cell profiles from sequencing and mass cytometry experiments. This approach aims to broaden the utility of single-cell reference datasets and facilitate comparisons across diverse molecular modalities.
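To make the dictionary idea concrete, here is a minimal toy sketch in Python. It is not the authors' Seurat implementation: it assumes purely linear, noise-free synthetic data, uses plain least squares to express each cell as a weighted combination of bridge cells, and omits any normalization or dimensionality reduction the published method applies. All names (`dictionary_weights`, `to_rna`, `to_atac`, and so on) are hypothetical and chosen only for illustration.

```python
import numpy as np


def dictionary_weights(data, bridge, rcond=1e-10):
    """Return W (cells x bridge cells) such that W @ bridge approximates data.

    Hypothetical helper: a minimum-norm least-squares fit, used here as a
    stand-in for the dictionary representation described in the text.
    """
    # Solve bridge.T @ W.T ~= data.T for W.T
    weights_t, *_ = np.linalg.lstsq(bridge.T, data.T, rcond=rcond)
    return weights_t.T


rng = np.random.default_rng(0)
n_bridge, n_latent, n_rna_features, n_atac_features = 50, 5, 30, 40

# Assumed toy model: each cell has a hidden state, and each modality is a
# different linear readout of that state.
to_rna = rng.normal(size=(n_latent, n_rna_features))
to_atac = rng.normal(size=(n_latent, n_atac_features))

# Bridge (multi-omic) cells: both modalities measured in the same cells.
bridge_state = rng.normal(size=(n_bridge, n_latent))
bridge_rna = bridge_state @ to_rna
bridge_atac = bridge_state @ to_atac

# Reference cells are measured by RNA only; query cells by ATAC only.
# The query reuses the first 10 reference states so the match can be checked.
reference_state = rng.normal(size=(20, n_latent))
reference_rna = reference_state @ to_rna
query_atac = reference_state[:10] @ to_atac

# Each dataset is re-expressed in terms of the same bridge cells, so both
# end up in one shared space with one coordinate per bridge cell.
reference_weights = dictionary_weights(reference_rna, bridge_rna)
query_weights = dictionary_weights(query_atac, bridge_atac)

# Match every query cell to its nearest reference cell in the shared space.
distances = np.linalg.norm(
    query_weights[:, None, :] - reference_weights[None, :, :], axis=2
)
nearest_reference = distances.argmin(axis=1)
print(nearest_reference)
```

In this toy setting a query cell and a reference cell with the same hidden state receive the same weights over the bridge cells, even though no feature is shared between the two unimodal datasets. Real data are noisy, sparse, and non-linear, so this shows only the principle.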

Integrating across modalities with molecular bridges

(a) Broad schematic of bridge integration workflow. Two datasets where different modalities are measured (e.g. scRNA-seq and scATAC-seq) can be harmonized via a third dataset where both modalities are simultaneously measured (e.g. 10x multiome). We demonstrate bridge integration using a variety of multi-omic technologies that can be used as bridges, including 10x multiome, Paired-Tag, snmC2T, and CITE-seq, each of which facilitates integration with a different molecular modality. Middle box lists alternative multi-omic technologies that can be used to generate bridge datasets. (b) Mathematical schematic of each of the steps in the bridge integration procedure.

Availability: Installation instructions, documentation, and vignettes are available at http://www.satijalab.org/seurat

Hao Y, Stuart T, Kowalski M et al. (2022). Dictionary learning for integrative, multimodal, and scalable single-cell analysis. bioRxiv [online preprint]. [article]
