Studying RNA modifications in large clinical cohorts can reveal relationships between the epitranscriptome and human diseases, but such studies are especially challenging. Researchers from the National University of Singapore have developed ModTect, a statistical framework that identifies RNA modifications de novo from standard RNA-sequencing data using deletion and misincorporation signals. The researchers show that ModTect can identify both known (N1-methyladenosine) and previously unknown (N2,N2-dimethylguanosine) types of mRNA modifications at nucleotide resolution. Applying ModTect to 11,371 patient samples and 934 cell lines across 33 cancer types, they show that the epitranscriptome is dysregulated in patients across multiple cancer types and that this dysregulation is associated with cancer progression and survival outcomes. Some types of RNA modification were also more disrupted than others in patients with cancer. Moreover, RNA modifications contribute to multiple types of RNA-DNA sequence differences, which unexpectedly escape detection by Sanger sequencing. ModTect can thus be used to discover associations between RNA modifications and clinical outcomes in patient cohorts.
ModTect enables the discovery of multiple types of RNA modifications by standard RNA-seq
(A) Schematic depicting how a type of base pair–disrupting RNA modification [3-methyluridine (m3U)] with an added chemical moiety disrupts Watson-Crick base pairing. (B) RNA modifications that disrupt base pairing cause the misincorporation of nucleotides during reverse transcription, generating a multinucleotide mismatch pattern, and cause skipping of the modified base, producing a deletion signature. (C) Detection of a multinucleotide mismatch and deletion signal at three different types of base pair–disrupting rRNA modifications. Top: Screenshot depicting the multinucleotide mismatch and deletion signature detected by RNA-seq but not by whole-genome DNA sequencing (DNA-seq) at the m3U site. Middle: Percentage of each type of nucleotide and of deletions observed from DNA-seq and RNA-seq at three different types of base pair–disrupting RNA modification sites: N1-methyladenosine (m1A) at 28S:1322 rRNA, m3U at 28S:4530 rRNA, and 1-methyl-3-(3-amino-3-carboxypropyl)pseudouridine (m1acp3Ψ) at 18S:1248 rRNA. Depth of sequencing is indicated at the top of the chart. Bottom: Mismatch rate, deletion rate, and the type of mismatches observed at sites corresponding to each type of base pair–disrupting RNA modification. (D) Performance of ModTect in identifying base pair–disrupting RNA modifications on ribosomal RNAs (rRNAs). Left: ModTect allows effective extraction of the multinucleotide mismatch and deletion signals at an RNA modification site from RNA-seq. The multinucleotide mismatch signal, represented by the modification score, was extracted using a statistical model designed by the authors, without the need for DNA sequencing. The deletion signal around each RNA modification is also depicted. Right: Precision-recall curve, generated on the basis of rRNA modification sites from 934 RNA-seq datasets. The area under the precision-recall curve for each approach is indicated in the table.
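The mismatch rate, deletion rate, and mismatch types described in panel C can be illustrated with a small calculation over the base calls at a single position. The Python sketch below is not ModTect's statistical model or its modification score. It is a simplified illustration of the signal the caption describes, and the function names (`site_signature`, `looks_modified`) and all thresholds are hypothetical choices made for this example.

```python
from collections import Counter


def site_signature(ref_base, observed):
    """Summarise the read calls at one position.

    `observed` holds one call per read: 'A', 'C', 'G', 'T',
    or '-' when the read skips the position (a deletion).
    Rates are computed over all reads covering the site,
    deletions included (an assumption of this sketch).
    """
    counts = Counter(observed)
    depth = sum(counts.values())
    if depth == 0:
        raise ValueError("no reads cover this site")
    # Any called base other than the reference counts as a mismatch.
    mismatches = {b: n for b, n in counts.items() if b not in (ref_base, "-")}
    return {
        "depth": depth,
        "mismatch_rate": sum(mismatches.values()) / depth,
        "deletion_rate": counts.get("-", 0) / depth,
        "mismatch_types": sorted(mismatches),
    }


def looks_modified(sig, min_depth=20, min_mismatch_rate=0.1, min_types=2):
    """Crude flag for a multinucleotide mismatch pattern.

    Requiring at least two different mismatched bases separates this
    pattern from a typical SNP, which shows a single alternative base.
    All thresholds are hypothetical, not values used by ModTect.
    """
    return (
        sig["depth"] >= min_depth
        and sig["mismatch_rate"] >= min_mismatch_rate
        and len(sig["mismatch_types"]) >= min_types
    )


# Made-up reads at a site with reference base 'T'.
modified_like = ["T"] * 60 + ["A"] * 20 + ["C"] * 10 + ["-"] * 10
snp_like = ["T"] * 50 + ["C"] * 50

print(site_signature("T", modified_like))
print(looks_modified(site_signature("T", modified_like)))
print(looks_modified(site_signature("T", snp_like)))
```

In this toy input, the first site mixes two mismatched bases with deletions and is flagged, while the second shows only one alternative base and is not. A real analysis would read the counts from aligned RNA-seq data and would need a proper statistical test rather than fixed cutoffs.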
Availability – ModTect is available at: https://github.com/ktan8/ModTect