Transcriptional mutagenesis (TM) due to misincorporation during RNA transcription can result in mutant RNAs, or epimutations, that generate proteins with altered properties. TM has long been hypothesized to play a role in aging, cancer, and viral and bacterial evolution. However, inadequate methodologies have limited progress in elucidating a causal association. Researchers from the University of Washington School of Medicine present a high-throughput, highly accurate RNA sequencing method to measure epimutations with single-molecule sensitivity. Accurate RNA consensus sequencing (ARC-seq) uniquely combines RNA barcoding and generation of multiple cDNA copies per RNA molecule to eliminate errors introduced during cDNA synthesis, PCR, and sequencing. The stringency of ARC-seq can be scaled to accommodate the quality of input RNAs. The researchers apply ARC-seq to directly assess transcriptome-wide epimutations resulting from RNA polymerase mutants and oxidative stress.
(A) Overview of the ARC-seq method. (i) Each RNA is ligated to an adaptor containing a unique barcode. Ligated RNAs are then circularized (ii) and subjected to rolling-circle reverse transcription (iii), generating a multimeric cDNA from each RNA molecule. (iv) cDNA multimers are then restricted into monomers, each of which is a cDNA copy of the original RNA molecule. Each cDNA is then tagged with a unique index (v), amplified (vi), and sequenced.
(B) Error correction by ARC-seq. (i) Single RNA molecule containing a true epimutation (red); this molecule is barcoded. (ii) Rolling-circle reverse transcription generates multiple cDNA copies from each ligated RNA molecule, introducing random errors (orange). (iii) Amplification and sequencing propagate the existing errors and introduce new ones (purple), further obscuring the true epimutation. Artifacts present in standard RNAseq data are illustrated at this level. (iv) After sequencing, cDNA tags are bioinformatically matched and a consensus sequence is generated for each cDNA copy, eliminating many amplification and sequencing artifacts. (v) Finally, the RNA barcodes are matched, and a consensus sequence is generated from the cDNA copies, which reconstructs the original RNA molecule's sequence and reveals the true epimutation.
(C) ARC-seq eliminates damage-induced, reverse transcription, PCR, and sequencing artifacts, revealing true epimutations. High-fidelity (blue), damaged (green), and mutated (purple) RNAs were generated by in vitro transcription with T7 RNA polymerase and sequenced via ARC-seq. Conventional RNAseq shows a high level of artifacts, with more artifacts observed for the damaged RNA template, whereas ARC-seq fully corrects damage-induced artifacts without removing true epimutations, revealing a true epimutation frequency of ∼2 × 10⁻⁵. Error bars represent 95% Wilson score confidence intervals.
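The two-level error correction described in panel B, together with the Wilson score interval used for the error bars in panel C, can be sketched in Python. This is an illustrative sketch only, not the published ARC-seq pipeline: the function names (`consensus`, `arc_consensus`, `wilson_interval`), the read representation as `(rna_barcode, cdna_tag, sequence)` tuples, and all thresholds are assumptions made for the example, and reads are assumed to be already trimmed and aligned to equal length.

```python
from collections import Counter, defaultdict
from math import sqrt


def consensus(seqs, min_fraction=0.9):
    """Column-wise consensus of equal-length sequences.

    A position is reported as 'N' when the most common base is
    supported by less than min_fraction of the sequences.
    """
    if not seqs:
        return ""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must have equal length")
    out = []
    for column in zip(*seqs):
        base, count = Counter(column).most_common(1)[0]
        out.append(base if count / len(column) >= min_fraction else "N")
    return "".join(out)


def arc_consensus(reads, min_reads_per_tag=3, min_copies_per_barcode=3,
                  min_fraction=0.9):
    """Two-level consensus (hypothetical thresholds).

    reads: iterable of (rna_barcode, cdna_tag, sequence) tuples.
    Level 1 collapses reads sharing a cDNA tag (removes most PCR and
    sequencing errors); level 2 collapses the cDNA copies sharing an
    RNA barcode (removes reverse-transcription errors).
    """
    by_tag = defaultdict(list)
    for barcode, tag, seq in reads:
        by_tag[(barcode, tag)].append(seq)

    by_barcode = defaultdict(list)
    for (barcode, _tag), seqs in by_tag.items():
        if len(seqs) >= min_reads_per_tag:
            by_barcode[barcode].append(consensus(seqs, min_fraction))

    return {
        barcode: consensus(copies, min_fraction)
        for barcode, copies in by_barcode.items()
        if len(copies) >= min_copies_per_barcode
    }


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a proportion (z=1.96 gives ~95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, center - half), min(1.0, center + half)
```

Raising the minimum family sizes or `min_fraction` in this sketch makes the consensus more stringent at the cost of discarding more molecules, which loosely mirrors the idea that stringency can be scaled to the quality of the input RNA. The Wilson interval is preferable to the simple normal approximation at frequencies near 10⁻⁵, where the normal approximation can produce a lower bound below zero.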