Although long-read single-cell RNA isoform sequencing (scISO-Seq) can reveal alternative RNA splicing in individual cells, it suffers from low read throughput. Researchers at Sun Yat-sen University have developed HIT-scISOseq, a method that removes most artifact cDNAs and concatenates multiple cDNAs for PacBio circular consensus sequencing (CCS), achieving high-throughput and high-accuracy single-cell RNA isoform sequencing. HIT-scISOseq can yield more than 10 million high-accuracy long reads from a single PacBio Sequel II SMRT Cell 8M. The researchers also report scISA-Tools, a toolkit that demultiplexes HIT-scISOseq concatenated reads into single-cell cDNA reads with >99.99% accuracy and specificity. They apply HIT-scISOseq to characterize the transcriptomes of 3375 corneal limbus cells and reveal cell-type-specific isoform expression. HIT-scISOseq is a high-throughput, high-accuracy, technically accessible method that can accelerate the burgeoning field of long-read single-cell transcriptomics.
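The demultiplexing idea (one concatenated read split back into several single-cell cDNA reads) can be illustrated with a minimal Python sketch. This is not the scISA-Tools algorithm: it assumes a hypothetical, error-free boundary sequence (`LINKER`) between concatenated cDNAs and a hypothetical segment layout in which each cDNA starts with a short cell barcode followed by a UMI. Real PacBio reads contain sequencing errors and cDNAs in both orientations, so a practical tool needs error-tolerant primer matching rather than exact string splitting.

```python
from collections import defaultdict

# HYPOTHETICAL values for illustration only; not the HIT-scISOseq adapters.
LINKER = "ACGTACGTAC"  # assumed boundary sequence between concatenated cDNAs
BARCODE_LEN = 4        # assumed cell-barcode length (toy value)
UMI_LEN = 3            # assumed UMI length (toy value)


def split_concatenated_read(read: str, linker: str = LINKER) -> list[str]:
    """Split one concatenated read into cDNA segments at exact linker matches."""
    return [seg for seg in read.split(linker) if seg]


def demultiplex(reads: list[str]) -> dict[str, list[tuple[str, str]]]:
    """Group cDNA inserts by cell barcode; each entry is (umi, insert)."""
    by_cell: dict[str, list[tuple[str, str]]] = defaultdict(list)
    for read in reads:
        for seg in split_concatenated_read(read):
            if len(seg) <= BARCODE_LEN + UMI_LEN:
                continue  # too short to hold barcode, UMI and an insert
            barcode = seg[:BARCODE_LEN]
            umi = seg[BARCODE_LEN:BARCODE_LEN + UMI_LEN]
            insert = seg[BARCODE_LEN + UMI_LEN:]
            by_cell[barcode].append((umi, insert))
    return dict(by_cell)
```

For example, `demultiplex(["AAAACCGTTTTGGGG" + LINKER + "CCCCGTAAAAAT"])` assigns one cDNA to barcode `AAAA` (UMI `CCG`) and another to barcode `CCCC` (UMI `GTA`), recovering two single-cell reads from one concatenated read.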
Overview of the workflow and the performance of HIT-scISOseq
a Schematic diagram of the experimental steps of HIT-scISOseq, consisting of the following steps: (1) single-cell cDNA library construction; (2) cDNA amplification by PCR with a biotinylated primer at their 3′ ends; (3) enrichment of biotinylated cDNAs with streptavidin magnetic beads; (4) USER enzyme digestion to produce sticky ends, followed by ligation of multiple cDNA fragments; (5) SMRTbell library preparation and sequencing. b Comparison of the percentages of artifact reads between ScISOr-Seq (blue) and HIT-scISOseq (red); each method includes two biological replicates (s1 and s2). c Comparison of the number of mapped FLNC reads between ScISOr-Seq (blue) and HIT-scISOseq (red). d Comparison of sequence quality between ScISOr-Seq (blue, s1 n = 1,582,427, s2 n = 1,271,713) and HIT-scISOseq (red, s1 n = 2,870,070, s2 n = 3,506,141). Center line: median; boxes: first and third quartiles; whiskers: 5th and 95th percentiles. e, f Comparison of FLNC lengths (e) and the number of FLNC reads per CCS read (f) between ScISOr-Seq and HIT-scISOseq. g Distributions of gene counts (x-axis) and UMI counts (y-axis) for ScISOr-Seq and HIT-scISOseq. Box plots (s1 ScISOr-Seq n = 1658, s2 ScISOr-Seq n = 1408, s1 HIT-scISOseq n = 1776, s2 HIT-scISOseq n = 1599; center line: median; boxes: first and third quartiles; whiskers: 5th and 95th percentiles) and density plots are shown above and to the right of the scatter plot. Source data are provided as a Source Data file.
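The box plots in panels d and g place their whiskers at the 5th and 95th percentiles rather than at the common 1.5 × IQR limits. The sketch below computes those five statistics with Python's standard library; the caption does not state which percentile definition was used, so the linear interpolation chosen here (`method="inclusive"`) is an assumption.

```python
import statistics


def box_summary(values: list[float]) -> dict[str, float]:
    """Box-plot statistics as described in the caption: median center line,
    first/third-quartile box, 5th/95th-percentile whiskers."""
    if len(values) < 2:
        raise ValueError("need at least two values")
    # 99 cut points; cuts[k - 1] is the k-th percentile (linear interpolation,
    # with the minimum as the 0th and the maximum as the 100th percentile)
    cuts = statistics.quantiles(values, n=100, method="inclusive")
    return {
        "whisker_low": cuts[4],    # 5th percentile
        "q1": cuts[24],            # 25th percentile (first quartile)
        "median": cuts[49],        # 50th percentile
        "q3": cuts[74],            # 75th percentile (third quartile)
        "whisker_high": cuts[94],  # 95th percentile
    }
```

With these whiskers, roughly 10% of observations (5% at each end) fall outside the whisker range by construction, regardless of how skewed the distribution is.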
Availability – The HIT-scISOseq analysis pipeline and source code are available from https://github.com/shizhuoxing/scISA-Tools.