Chemical RNA modifications, collectively referred to as the ‘epitranscriptome’, are essential players in fine-tuning gene expression. Our ability to analyze RNA modifications has improved rapidly in recent years, largely due to the advent of high-throughput sequencing methodologies, which typically couple modification-specific reagents, such as antibodies or enzymes, to next-generation sequencing. Recently, it also became possible to map RNA modifications directly by sequencing native RNAs using nanopore technologies, an approach that has been applied to detect a number of RNA modifications, such as N6-methyladenosine (m6A), pseudouridine (Ψ) and inosine (I). However, the signal modulations caused by most RNA modifications have yet to be determined. A global effort is needed to determine the signatures of the full range of RNA modifications to avoid the technical biases that have so far limited our understanding of the epitranscriptome.
An overview of using direct RNA sequencing to detect RNA modifications
(A) After RNA library preparation (ligation of the RNA to a helicase-containing DNA adapter, plus an optional reverse transcription step), RNA molecules translocate through a nanopore embedded in a membrane with the help of a helicase protein, at an approximate speed of 70 bases per second (with the SQK-RNA001 and SQK-RNA002 library preparation kits). RNA translocation disrupts the current created by the ion flow passing through the nanopore. The current intensity information is acquired and processed by the MinKNOW software, which generates Fast5 files containing the current intensity information of each identified read (each Fast5 file contains up to 4,000 reads). Fast5 files can then be base-called using base-calling algorithms (e.g. Guppy, Bonito) that are supplied with a base-calling model, generating FastQ files as output. FastQ files can be mapped to a reference sequence using an alignment tool (e.g. minimap2), which creates a BAM file. Modification information can be stored in BAM files using specific tags.
(B) Schematic overview of three major strategies that can be used to detect RNA modifications in direct RNA nanopore sequencing data. A first strategy consists of identifying RNA modifications as non-random base-calling ‘errors’, which appear as increased base-calling ‘errors’ (mismatch, insertion and deletion) at the modified site (position 0) and/or surrounding positions (left panel). In this strategy, the use of knockout and/or knockdown strains makes it possible to distinguish ‘errors’ caused by the presence of RNA modifications from those that are intrinsic to the sequencing and/or base-calling itself (i.e. background ‘error’). A second strategy uses raw current features (signal intensity, dwell time and/or trace features) to identify positions with altered current intensity values, either when comparing two strains (e.g. wild type and knockout) or when comparing reads within a given sample (middle panel). A third strategy consists of using a modification-aware base-calling model (instead of the canonical model that predicts 4 letters) when performing the base-calling step (right panel). This approach requires generating training sets that can be used to train the modification-aware base-calling model.
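The first strategy in panel B can be illustrated with a minimal Python sketch. It assumes that per-position counts of matches, mismatches, insertions and deletions have already been extracted from the BAM files of a wild-type and a knockout sample (that extraction step is not shown). The names `PositionCounts` and `candidate_sites`, the thresholds `min_coverage` and `min_delta`, and all counts are hypothetical and chosen for illustration only; they are not taken from any published tool or dataset.

```python
from dataclasses import dataclass


@dataclass
class PositionCounts:
    """Base-calling outcomes observed at one reference position (toy values)."""
    match: int
    mismatch: int
    insertion: int
    deletion: int

    @property
    def coverage(self) -> int:
        return self.match + self.mismatch + self.insertion + self.deletion

    @property
    def error_rate(self) -> float:
        # Fraction of observations that are mismatches, insertions or deletions.
        total = self.coverage
        return (total - self.match) / total if total else 0.0


def candidate_sites(wild_type, knockout, min_coverage=30, min_delta=0.2):
    """Return (position, delta) pairs where the wild-type error rate exceeds
    the knockout error rate by at least min_delta.

    The knockout sample serves as the background 'error' estimate.
    Both thresholds are arbitrary illustrative defaults.
    """
    hits = []
    for pos in sorted(wild_type.keys() & knockout.keys()):
        wt, ko = wild_type[pos], knockout[pos]
        if wt.coverage < min_coverage or ko.coverage < min_coverage:
            continue  # too few reads to compare reliably
        delta = wt.error_rate - ko.error_rate
        if delta >= min_delta:
            hits.append((pos, delta))
    return hits


# Made-up counts, keyed by position relative to a putative modified site (0).
wild_type = {
    -1: PositionCounts(match=80, mismatch=12, insertion=3, deletion=5),
    0: PositionCounts(match=40, mismatch=50, insertion=2, deletion=8),
    1: PositionCounts(match=88, mismatch=7, insertion=2, deletion=3),
    2: PositionCounts(match=5, mismatch=10, insertion=0, deletion=0),
}
knockout = {
    -1: PositionCounts(match=90, mismatch=6, insertion=2, deletion=2),
    0: PositionCounts(match=92, mismatch=5, insertion=1, deletion=2),
    1: PositionCounts(match=90, mismatch=6, insertion=2, deletion=2),
    2: PositionCounts(match=6, mismatch=1, insertion=0, deletion=0),
}

for pos, delta in candidate_sites(wild_type, knockout):
    print(f"position {pos}: error-rate increase {delta:.2f}")
```

With these toy counts, only position 0 passes both thresholds. Position 2 is skipped for low coverage, and positions -1 and 1 show too small a difference from the knockout background. A real analysis would likely replace the fixed threshold with a statistical test across replicates.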