In a research breakthrough, a team of researchers from the Cancer Science Institute of Singapore (CSI Singapore) at the National University of Singapore has developed software that can help reveal the relationships between RNA modifications and the development of diseases and disorders.
Led by Professor Daniel Tenen and Dr Henry Yang, the scientists devised ModTect – a new computational tool that can identify RNA modifications using pre-existing sequencing data from clinical cohort studies. With ModTect, the team carried out a novel pan-cancer study covering 33 different cancer types. They found associations between these RNA modifications and the different survival outcomes of cancer patients.
“This work is one of few studies demonstrating the association of mRNA modification with cancer development. We show that the epitranscriptome was dysregulated in patients across multiple cancer types and was additionally associated with cancer progression and survival outcomes,” explained Dr Henry Yang, Research Associate Professor from CSI Singapore.
“In the past decade, the ability to sequence the Human Genome has transformed the study of normal processes and diseases such as cancer. We anticipate that studies like this one, eventually leading to complete sequencing of RNA and detecting modifications directly in RNA, will also have a major impact on the characterisation of disease and lead to novel therapeutic approaches,” commented Prof Tenen, Senior Principal Investigator from CSI Singapore.
What are RNA modifications?
While most people are familiar with DNA, RNA plays just as vital a role in the human body’s cellular functions. Unlike DNA, with its well-known double-helix structure, RNA is a family of single-stranded molecules that perform various essential biological roles.
For example, messenger RNA (mRNA) conveys genetic information that directs the production of different proteins. Imagine DNA as an expansive library filled with books that carry instructions on how to make different proteins. Each letter in the words that make up the books’ contents is a nucleotide, a small molecule used to store genetic information. To make sure these instructions are followed, mRNA makes copies of the books and carries them from a cell’s nucleus, where DNA is stored, to the ribosomes. These ribosomes are the “factories” where proteins are synthesised. Without RNA, the valuable genetic instructions stored in our cells would never be used.
Additional types of RNA perform other important functions. Some help catalyse biochemical reactions, just like enzymes, while others regulate gene expression.
Small chemical modifications to RNA can sometimes occur and alter the function and stability of the molecules. The study of these modifications and their effects is called ‘epitranscriptomics’. Past research has suggested a link between certain RNA modifications and the development of diseases such as Alzheimer’s disease and cancer. However, despite multiple attempts to study these associations in greater detail, epitranscriptomes have proven difficult to study until this breakthrough by scientists from CSI Singapore.
In large patient cohorts, collecting and processing patient samples is challenging. Detecting RNA modifications often involves technically complex processes, such as treating the samples with chemicals that are difficult to access. These techniques often also require the use of large quantities of sample that are hard to obtain for rarer conditions. Because of this, scientists have been limited in their capacity to establish relationships between specific RNA modifications and various human diseases.
Software makes epitranscriptomics easier
The software that the CSI Singapore team created uses RNA sequences available from other large clinical cohort studies. To detect modifications in these RNA sequences, ModTect looks for mismatch signals and deletion signals. Mismatch signals arise when the enzymes that scientists use to turn RNA back into DNA incorporate random nucleotides at a modified site. Deletion signals, on the other hand, arise when the enzymes skip a portion of the sequence. Together, these signals are referred to as ‘misincorporation signals’.
Unlike other models, ModTect does not require a database of misincorporation signal profiles corresponding to different types of RNA modifications to identify or classify them. ModTect can even identify new signal profiles that drastically differ from what has been previously recorded.
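To make the two signals concrete, here is a minimal Python sketch that computes a mismatch rate and a deletion rate at a single site from the bases observed in the reads covering it. It is only an illustration of the idea, not ModTect's statistical model: the function name `misincorporation_signals`, the input format and the read counts are all hypothetical and made up for this example.

```python
from collections import Counter


def misincorporation_signals(ref_base, observed):
    """Summarise mismatch and deletion signals at one site.

    ref_base: the reference nucleotide at the site (hypothetical input).
    observed: one call per read covering the site, each 'A', 'C', 'G',
              'T', or '-' when the read skips the position (a deletion).
    """
    depth = len(observed)
    if depth == 0:
        raise ValueError("no reads cover this site")

    counts = Counter(observed)
    # Any called base that differs from the reference is a mismatch.
    mismatches = {b: n for b, n in counts.items() if b not in (ref_base, "-")}

    return {
        "depth": depth,
        "mismatch_rate": sum(mismatches.values()) / depth,
        "deletion_rate": counts["-"] / depth,
        "mismatch_types": sorted(mismatches),
        # Several different substituted bases at one site form a
        # multinucleotide mismatch pattern.
        "multinucleotide": len(mismatches) >= 2,
    }


# Made-up counts for illustration only: 100 reads over a site whose
# reference base is T.
reads = ["T"] * 60 + ["A"] * 15 + ["C"] * 10 + ["G"] * 5 + ["-"] * 10
signals = misincorporation_signals("T", reads)
print(signals)
```

The `multinucleotide` flag reflects the pattern described in the figure caption further down, where base pair-disrupting modifications produce mismatches to more than one nucleotide together with a deletion signature.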
By applying the software to around 11,000 cancer patient RNA-sequencing datasets, the CSI Singapore team was able to embark on a novel study that investigated the associations between RNA modifications and clinical outcomes in patients. ModTect was able to utilise these large datasets and process them with robust statistical filtering. The analysis revealed that some types of RNA modification were associated with cancer progression and survival outcomes in patients. This finding highlighted the potential use of RNA modifications as biomarkers – molecules that can be used to test for diseases.
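As a rough illustration of what an association with survival outcomes can look like in practice, the sketch below estimates survival curves for two patient groups with the Kaplan-Meier method, a standard technique in survival analysis. The article does not say which statistical method the team used, so this is an assumption chosen for illustration; the function `kaplan_meier`, the group labels and the follow-up times are hypothetical and do not come from the study.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate of survival probability over time.

    times:  follow-up duration for each patient (e.g. in months).
    events: 1 if death was observed at that time, 0 if the patient
            was censored (still alive at last follow-up).
    Returns a list of (time, survival_probability) pairs, one per
    time point at which at least one death occurred.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all patients who share the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Made-up follow-up data for two hypothetical groups of patients,
# split by whether a given RNA modification signal was detected.
with_signal = kaplan_meier([2, 3, 3, 5, 8], [1, 1, 0, 1, 0])
without_signal = kaplan_meier([4, 6, 9, 10, 12], [0, 1, 0, 0, 1])

for label, curve in (("with signal", with_signal),
                     ("without signal", without_signal)):
    print(label, [(t, round(s, 2)) for t, s in curve])
```

A real analysis would also need a significance test for the difference between the curves and adjustment for confounders such as cancer type and stage, which this sketch leaves out.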
ModTect enables the discovery of multiple types of RNA modifications by standard RNA-seq
(A) Schematic depicting how a type of base pair–disrupting RNA modification [3-methyluridine (m3U)] with an added chemical moiety disrupts Watson-Crick base pairing. (B) RNA modifications that disrupt base pairing cause the misincorporation of nucleotides, thus generating a multinucleotide mismatch pattern, and cause skipping of the modified base to produce a deletion signature during reverse transcription. (C) Detection of a multinucleotide mismatch and deletion signal at three different types of base pair–disrupting rRNA modifications. Top: Screenshot depicting the multinucleotide mismatch and deletion signature detected by RNA-seq but not by whole genome DNA sequencing (DNA-seq) at the m3U site. Middle: Percentage of each type of nucleotide and deletions observed from DNA sequencing and RNA-seq at three different types of base pair–disrupting RNA modification sites, N1-methyladenosine (m1A) at 28S:1322 rRNA, m3U at 28S:4530 rRNA, and 3-(3-amino-3-carboxypropyl) pseudo-uridine (m1acp3Ψ) at 18S:1248 rRNA. Depth of sequencing is indicated at the top of the chart. Bottom: Mismatch rate, deletion rate, and the type of mismatches observed at sites corresponding to each type of base pair–disrupting RNA modification. (D) Performance of ModTect in identifying base pair–disrupting RNA modifications on ribosomal RNAs (rRNAs). Left: ModTect allows effective extraction of the multinucleotide mismatch and deletion signals at an RNA modification site from RNA-seq. Multinucleotide mismatch signal represented by the modification score was extracted using a statistical model we designed, without the need for DNA sequencing. The deletion signal around each RNA modification is also depicted. Right: Precision-Recall curve, generated on the basis of rRNA modification sites from 934 RNA-seq datasets. The area under the precision-recall curve for each approach is indicated in the table.
Unravelling the mystery of sequence differences that escape detection
As described above, the transmission of genetic information from DNA in a cell’s nucleus to the RNA molecules that carry it to the ribosomes is a critical process. However, this transmission is not perfect and leads to differences between RNA and DNA sequences. The sites of these mismatches have been widely documented. It has remained unclear, however, whether these observations are caused by modifications in mRNA and why these sites have escaped detection by Sanger sequencing (one of the most popular methods of DNA sequencing).
The group at CSI Singapore uncovered a potential explanation as to why these RNA modification signals have eluded detection over the years. They explained how some RNA modifications impede standard reverse transcriptase (RT), the enzyme that is used to convert RNA into DNA. Scientists rely on this enzyme when sequencing RNA, and the reverse transcription step is one of the most critical for experimental success. Hence, RNAs that carried these impeding modifications were under-represented in Sanger sequencing results.
To combat this, the team used newly developed RT enzymes that are known for their ability to bypass the effects of these modification sites. This allowed them to observe modifications that were originally undetectable with Sanger sequencing.
Epitranscriptomics is still an emerging and rapidly developing field, with around 170 RNA modifications having been detected so far. By harnessing ModTect, Prof Tenen and his team were able to provide novel insights into the relationships between human diseases – like cancer – and such RNA modifications.
The team is hopeful that their contribution will help further research that establishes any potential causal or mechanistic relationships between RNA modifications and tumour formation.
Availability – The software is publicly available on GitHub: https://github.com/ktan8/ModTect
Source – National University of Singapore