Single-cell RNA sequencing (scRNA-seq) methods generate sparse gene expression profiles for thousands of single cells in a single experiment. The information in these profiles is sufficient to classify cell types by distinct expression patterns, but the high complexity of scRNA-seq libraries often prevents full characterization of the transcriptomes of individual cells.
To extract more focused gene expression information from scRNA-seq libraries, researchers from the University of Colorado School of Medicine developed a strategy to physically recover the DNA molecules comprising transcriptome subsets, enabling deeper interrogation of the isolated molecules by another round of DNA sequencing. They applied the method in cell-centric and gene-centric modes to isolate cDNA fragments from scRNA-seq libraries. First, they resampled the transcriptomes of rare, single megakaryocytes from a complex mixture of lymphocytes and analyzed them in a second round of DNA sequencing, yielding up to 20-fold greater sequencing depth per cell and increasing the number of genes detected per cell from a median of 1313 to 2002. The researchers similarly isolated mRNAs from targeted T cells to improve the reconstruction of their VDJ-rearranged immune receptor mRNAs. Second, they isolated CD3D mRNA fragments expressed across cells in a scRNA-seq library prepared from a clonal T cell line, increasing the fraction of cells with detected CD3D expression from 59.7% to 100%. Transcriptome resampling is a general approach to recover targeted gene expression information from single-cell RNA sequencing libraries. It enhances the utility of these costly experiments and may be applicable to the targeted recovery of molecules from other single-cell assays.
Resampling specific cell transcriptomes from pooled single-cell RNA-seq libraries
(A) Resampling single-cell libraries from rare cell populations to enable deeper characterization of a targeted cell type. (B) Schematic of the Locked Nucleic Acid (LNA) hybridization-based approach to enrich mRNAs from targeted cells in 10X Genomics single-cell mRNA-seq libraries. (C) Species specificity of cells recovered from a 10X Genomics 3′ end gene expression scRNA-seq library containing a 1:1 mix of mouse (NIH-3T3) and human (293T) cells. Orange dots and arrows (n = 2) indicate cells selected for resampling; blue dots (n = 1505) are untargeted cells. (D) Species specificity of cells from the resampled library. Colors are the same as in C. (E) Enrichment of targeted libraries after resampling. The y-axis plots the log2 enrichment of UMIs normalized by the size of the entire scRNA-seq library. Colors are the same as in C. (F) Sequencing saturation, defined as 1 minus the ratio of the number of UMIs to the number of reads, per cell for resampled and untargeted cells in the original scRNA-seq library or after resampling. Colors are the same as in C. (G) Number of genes (left) or UMIs (right) in the resampled cells that are either newly detected by resampling (orange), previously detected in the original library (blue), or previously detected in the original library but not found after resampling (green).
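The two per-cell metrics named in panels E and F can be illustrated with a short Python sketch. This is not the authors' pipeline: the function names and example counts are hypothetical, and the enrichment formula is one plausible reading of "log2 enrichment of UMIs normalized by the size of the entire scRNA-seq library", namely the log2 ratio of a cell's share of library UMIs after resampling to its share before.

```python
import math


def sequencing_saturation(n_umis: int, n_reads: int) -> float:
    """Saturation as defined in panel F: 1 - (UMIs / reads) for one cell."""
    if n_reads <= 0:
        raise ValueError("n_reads must be positive")
    return 1.0 - n_umis / n_reads


def log2_umi_enrichment(
    cell_umis_original: int,
    library_umis_original: int,
    cell_umis_resampled: int,
    library_umis_resampled: int,
) -> float:
    """Assumed reading of panel E: log2 of the change in a cell's
    share of all UMIs in the library, before vs. after resampling."""
    share_original = cell_umis_original / library_umis_original
    share_resampled = cell_umis_resampled / library_umis_resampled
    return math.log2(share_resampled / share_original)


# Hypothetical counts for a single targeted cell (not data from the study).
saturation = sequencing_saturation(n_umis=250, n_reads=1000)
enrichment = log2_umi_enrichment(
    cell_umis_original=100,
    library_umis_original=1_000_000,
    cell_umis_resampled=800,
    library_umis_resampled=1_000_000,
)
print(f"saturation={saturation:.2f}, log2 enrichment={enrichment:.2f}")
```

Under this reading, a targeted cell whose share of library UMIs grows after resampling has a positive log2 enrichment, while untargeted cells would be expected to fall at or below zero.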
Availability – The data analysis pipeline and custom scripts are available in a GitHub repository: https://github.com/rnabioco/scrna-subsets