Alternative RNA isoforms are defined by promoter choice, alternative splicing, and polyA site selection. Although differential isoform expression is known to play a large regulatory role in eukaryotes, it has proved challenging to study with standard short-read RNA-seq, which leaves the full-length structure and precise termini of transcripts uncertain. The rise in throughput and quality of long-read sequencing now makes it possible, in principle, to unambiguously identify most transcript isoforms from beginning to end. However, the application of long-read sequencing to single-cell RNA-seq has been limited by throughput and expense.
University of California, Irvine researchers have developed and characterized long-read Split-seq (LR-Split-seq), which uses a combinatorial barcoding-based method for sequencing single cells and nuclei with long reads. The researchers show that LR-Split-seq can associate isoforms with cell types with relative economy and design flexibility. They characterized LR-Split-seq for whole cells and nuclei by using the well-studied mouse C2C12 system in which mononucleated myoblast cells differentiate and fuse into multinucleated myotubes. They show that the overall results are reproducible when comparing long- and short-read data from the same cell or nucleus. They found substantial evidence of differential isoform expression during differentiation including alternative transcription start site (TSS) usage. The researchers integrated the resulting isoform expression dynamics with snATAC-seq chromatin accessibility to validate TSS-driven isoform choices. LR-Split-seq provides an affordable method for identifying cluster-specific isoforms in single cells that can be further quantified with companion deep short-read scRNA-seq from the same cell populations.
Technical comparisons in LR-Split-seq and bulk long-read RNA-seq
a, Schematic diagram of experimental design. Single-cell/nucleus LR-Split-seq, bulk long-read RNA-seq, and single-nucleus ATAC-seq were performed on C2C12 0hr myoblasts and 72hr differentiating cells. The same single-cell/UMI-barcoded cDNA was used in both short-read and long-read sequencing. b, Kernel density estimation (KDE) of read length distribution of oligo-dT-primed reads (blue) compared to random hexamer-primed reads (orange). c, Proportion of oligo-dT/random hexamer reads in each cell for each novelty category. d, Comparison of number of reads and e, genes detected between short and long reads. Cells are labeled by sample type (0hr cells in pink, 0hr nuclei in blue, and 72hr nuclei in green) and marginals on the top and right indicate their distributions. f, KDE of read length distribution of 0hr cell reads (pink) compared to 0hr nuclei reads (blue), not including genomic reads. g, Proportion of 0hr cell (pink)/nuclei (blue) reads per cell/nucleus per novelty category. h, KDE of read length distribution of bulk long reads (yellow) compared to single-cell long reads (magenta), not including genomic reads. i, Unfiltered reads per novelty category in bulk long-read data and j, LR-Split-seq data. k, Filtered isoforms per novelty category across all cells in LR-Split-seq data.
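For readers unfamiliar with the kernel density estimates in panels b, f, and h, the following is a minimal sketch of how a read length KDE can be computed. It is not the authors' analysis code. The read lengths are made-up values for illustration only, the function name `gaussian_kde` is hypothetical, and the bandwidth uses a simple normal-reference rule of thumb, which is an assumption rather than anything stated in the legend.

```python
import math
import statistics


def gaussian_kde(samples, bandwidth=None):
    """Return a function that estimates the density of `samples`
    by averaging one Gaussian kernel centered on each sample."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    if bandwidth is None:
        # Normal-reference rule of thumb (assumed here): 1.06 * sd * n^(-1/5)
        bandwidth = 1.06 * statistics.stdev(samples) * n ** (-1 / 5)
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density


# Hypothetical read lengths in bases (illustrative only, not data from the study)
oligo_dt_lengths = [850, 900, 1100, 1250, 1300, 1500, 1700, 2100]
random_hexamer_lengths = [400, 450, 520, 600, 640, 700, 820, 950]

kde_dt = gaussian_kde(oligo_dt_lengths)
kde_hex = gaussian_kde(random_hexamer_lengths)

# Evaluate both densities on a shared grid of read lengths for comparison
for length in range(0, 3001, 500):
    print(f"{length:>5} nt  oligo-dT={kde_dt(length):.6f}  hexamer={kde_hex(length):.6f}")
```

Plotting the two curves on the same axis gives a comparison of the kind described in the legend: each curve integrates to 1, so the shapes can be compared even when the two read sets differ in size.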