From shallow sequence reads, high-throughput single-cell RNA-seq methods assign only limited unique molecular identifier (UMI) counts to single cells as gene expression values, and they detect a limited number of genes. Researchers at RIKEN have developed a high-throughput single-cell RNA-seq method, Quartz-Seq2, to overcome these issues. Improvements to the reaction steps allow Quartz-Seq2 to convert initial reads to UMI counts efficiently, at a rate of 30–50%, and to detect more genes. To demonstrate the power of Quartz-Seq2, the developers analyzed approximately 10,000 transcriptomes from in vitro embryonic stem cells and an in vivo stromal vascular fraction using a limited number of reads.
Overview of Quartz-Seq2 experimental processes
a Quartz-Seq2 consists of five steps.
(1) Using flow cytometry analysis data, each single cell in a droplet is sorted into lysis buffer in one well of a 384-well PCR plate.
(2) Poly-adenylated RNA in each well is reverse-transcribed into first-strand cDNA with a reverse transcription (RT) primer that carries a unique cell barcode (CB). We prepare 384 or 1536 kinds of cell barcodes, each with a unique sequence designed based on the Sequence–Levenshtein distance (SeqLv), with a SeqLv edit distance of 5. The RT primer also carries a UMI sequence (MB) to reduce PCR bias and a poly(dT) sequence for binding to poly(A) RNA.
(3) Cell barcode-labeled cDNAs from all 384 wells are promptly collected by centrifugation using assembled collectors.
(4) The collected first-strand cDNAs are purified and concentrated for subsequent whole-transcript amplification (WTA). In the poly(A) tailing step, terminal deoxynucleotidyl transferase (TdT) extends the purified cDNA with a poly(A) tail. Second-strand cDNA is then synthesized with a tagging primer that has a poly(dT) sequence. The resulting second-strand cDNA carries a PCR primer sequence (M) at both ends, so it can be amplified in the subsequent PCR step.
(5) To convert the amplified cDNA into sequencing library DNA, we fragment it with a Covaris ultrasonicator. The fragmented cDNA is ligated to a truncated Y-shaped sequencing adaptor that has an Illumina flow-cell binding sequence (P7) and a pool barcode sequence (PB). The PB makes it possible to mix different sets of cell barcode-labeled cDNA. Ligated cDNA carrying the CB and MB sequences is enriched by PCR amplification. The resulting sequencing library DNA contains the P7 and P5 flow-cell binding sequences at its respective ends. We sequence the cell barcode and UMI sites in Read1, the pool barcode site in Index1, and the transcript sequence in Read2.
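Designing cell barcodes with a large pairwise edit distance is what lets sequencing errors in the CB site be tolerated. As a minimal sketch, assuming 5 is the minimum pairwise distance within the barcode set, an observed barcode can be assigned unambiguously to the closest whitelist barcode when it lies within ⌊(5 − 1)/2⌋ = 2 edits. The sketch uses plain Levenshtein distance as a simplified stand-in: the Sequence–Levenshtein distance used for Quartz-Seq2 is a variant intended for barcodes embedded in longer reads, where insertions and deletions shift neighbouring bases into or out of the barcode. The names levenshtein and correct_barcode are hypothetical and do not describe correct_bacode.py.

```python
from typing import List, Optional, Sequence, Tuple


def levenshtein(a: str, b: str) -> int:
    """Plain Levenshtein distance (substitutions, insertions, deletions).

    Simplified stand-in for the Sequence-Levenshtein distance (assumption).
    """
    # prev[j] is the distance between the previous prefix of a and b[:j]
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def correct_barcode(
    observed: str, whitelist: Sequence[str], min_dist: int = 5
) -> Optional[str]:
    """Hypothetical helper: map an observed barcode to the unique nearest
    whitelist barcode, or return None if it is too far or ambiguous.

    A set with minimum pairwise distance min_dist can correct up to
    (min_dist - 1) // 2 errors (2 errors when min_dist is 5).
    """
    if not whitelist:
        return None
    max_errors = (min_dist - 1) // 2
    scored: List[Tuple[int, str]] = sorted(
        (levenshtein(observed, bc), bc) for bc in whitelist
    )
    best_dist, best_bc = scored[0]
    if best_dist > max_errors:
        return None  # too many errors to assign reliably
    if len(scored) > 1 and scored[1][0] == best_dist:
        return None  # tie between two barcodes: ambiguous
    return best_bc
```

With a toy whitelist such as ["AAAAAAAA", "CCCCCCCC", "GGGGGGGG", "TTTTTTTT"], the read "AAAAAACA" (one substitution) is corrected to "AAAAAAAA", whereas "AACCAACC" is four edits from its nearest barcodes and is left unassigned.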
b The relationship between initial fastq reads and the number of single cells for sequence analysis in NextSeq 500 runs. Typically, one sequencing run with the NextSeq 500/550 High Output v2 Kit yields 400–450 M fastq reads. The x-axis represents the number of input cells for one sequencing run. The y-axis represents the average initial data size (fastq reads) per cell. The red outline marks the typical range of shallow input read depth per single cell.
c We define the formula for calculating the UMI conversion efficiency. Each parameter is defined as follows: UMI_sc is the number of UMI counts assigned to a single-cell sample; fastq_sc is the number of fastq reads derived from each single-cell sample; fastq_non-sc is the number of fastq reads derived from non-single-cell samples, which include experimental byproducts such as WTA adaptors, WTA byproducts, and non-STAMPs. Initial fastq reads are composed of fastq_sc and fastq_non-sc.
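Read this way, the definitions in panels b and c reduce to simple arithmetic. The sketch below assumes the UMI conversion efficiency is the pooled UMI counts of all single-cell samples divided by the initial fastq reads, that is ΣUMI_sc / (Σfastq_sc + fastq_non-sc), which matches the description of converting initial reads to UMI counts at 30–50%; the exact formula in panel c is not reproduced in the text. UMI counts are obtained here by exact-match collapsing of reads that share a cell barcode, UMI and gene; real pipelines often also merge UMIs that differ by sequencing errors. All function names and the read layout are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Sequence, Set, Tuple

# Hypothetical record: one read after cell-barcode correction and gene
# assignment, stored as (cell_barcode, umi, gene).
Read = Tuple[str, str, str]


def reads_per_cell(total_fastq_reads: int, n_cells: int) -> float:
    """Average initial fastq reads available per input cell (panel b)."""
    return total_fastq_reads / n_cells


def umi_counts_per_cell(reads: Iterable[Read]) -> Dict[str, int]:
    """Collapse reads sharing (cell barcode, UMI, gene) into one UMI count.

    PCR duplicates of one molecule share its UMI and gene, so they are
    counted once. Exact matching only: no UMI error correction.
    """
    molecules: Dict[str, Set[Tuple[str, str]]] = defaultdict(set)
    for cell, umi, gene in reads:
        molecules[cell].add((umi, gene))
    return {cell: len(mols) for cell, mols in molecules.items()}


def umi_conversion_efficiency(
    umi_sc: Sequence[int], fastq_sc: Sequence[int], fastq_non_sc: int
) -> float:
    """Pooled UMI_sc divided by initial fastq reads (assumed reading).

    Initial fastq reads = sum(fastq_sc) + fastq_non_sc (panel c).
    """
    initial_reads = sum(fastq_sc) + fastq_non_sc
    return sum(umi_sc) / initial_reads


# Toy example with made-up numbers, not data from the study.
reads = [
    ("CB1", "UMI1", "GeneA"),
    ("CB1", "UMI1", "GeneA"),  # PCR duplicate, collapsed into one UMI
    ("CB1", "UMI2", "GeneA"),
    ("CB2", "UMI1", "GeneB"),
]
counts = umi_counts_per_cell(reads)  # {'CB1': 2, 'CB2': 1}
# CB1 has 3 reads, CB2 has 1 read, plus 2 non-single-cell reads: 3 / 6
eff = umi_conversion_efficiency(list(counts.values()), [3, 1], 2)  # 0.5
```

For panel b, the same arithmetic shows why read depth per cell becomes shallow as cell numbers grow: spreading 400 M fastq reads over 10,000 cells leaves an average of 40,000 reads per cell before any losses to non-single-cell reads or PCR duplicates.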
Availability – A custom-made Python program (correct_bacode.py) has been deposited on GitHub and archived on Zenodo (DOI: https://doi.org/10.5281/zenodo.1118151). A full tutorial is available at: https://bit.riken.jp/protocols/quartz-seq/