Nanopore sequencing enables direct measurement of RNA molecules without conversion to cDNA, opening the gates to a new era for RNA biology. However, the lack of molecular barcoding for direct RNA nanopore sequencing severely restricts the applicability of this technology to biological samples, where RNA is often scarce.
Researchers from the Garvan Institute of Medical Research provide the first experimental protocol and associated algorithm to barcode and demultiplex direct RNA nanopore sequencing data sets. Specifically, they present a novel and robust approach that classifies raw nanopore signal data by transforming current intensities into images (arrays of pixels) and then classifying those images with a deep learning algorithm. Their method, DeePlexiCon, can classify 93% of reads with 95.1% accuracy or 60% of reads with 99.9% accuracy. The availability of an efficient and simple multiplexing strategy for native RNA sequencing will improve the cost-effectiveness of this technology and facilitate the analysis of lower-input biological samples. Overall, this work exemplifies the power, simplicity, and robustness of signal-to-image conversion for nanopore data analysis using deep learning.
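The two operating points quoted above (more reads at lower accuracy, fewer reads at higher accuracy) are the kind of trade-off a confidence cutoff on the classifier output produces. The following Python sketch illustrates that idea only. It assumes the classifier emits one probability per barcode for each read; the function name `assign_barcodes`, the four-barcode layout, the probabilities, and the cutoff values are all made up for illustration and are not taken from DeePlexiCon.

```python
import numpy as np

def assign_barcodes(probs, cutoff):
    """Assign each read to its most probable barcode, or -1 if not confident.

    probs  : array of shape (n_reads, n_barcodes), one row of probabilities per read
    cutoff : minimum top probability required to keep the assignment
    """
    probs = np.asarray(probs, dtype=float)
    best = probs.argmax(axis=1)              # index of the most probable barcode
    confident = probs.max(axis=1) >= cutoff  # does the top score pass the cutoff?
    return np.where(confident, best, -1)     # -1 marks an unclassified read

# Made-up classifier outputs for four reads and four barcodes (illustration only).
probs = np.array([
    [0.97, 0.01, 0.01, 0.01],
    [0.40, 0.35, 0.15, 0.10],
    [0.05, 0.90, 0.03, 0.02],
    [0.25, 0.25, 0.30, 0.20],
])

lenient = assign_barcodes(probs, cutoff=0.3)   # keeps every read in this toy example
strict = assign_barcodes(probs, cutoff=0.95)   # keeps only the most confident read

print("lenient:", lenient, "fraction classified:", np.mean(lenient >= 0))
print("strict: ", strict, "fraction classified:", np.mean(strict >= 0))
```

Raising the cutoff discards ambiguous reads, so fewer reads are assigned but the assignments that remain are more likely to be correct.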
Direct RNA barcoding and demultiplexing
(A) Overview of the Oxford Nanopore sample preparation protocol for native RNA sequencing. (B) Adaptation of (A) to include custom DNA barcodes. (C) Barcode segmentation and transformation, where the electric current associated with a barcode adapter (highlighted in red) is extracted and converted into an image using the Gramian Angular Summation Field (GASF) transformation. (D) Deep learning is used to classify the segmented and GASF-transformed squiggle signals into their corresponding bins, without the need to base-call the underlying sequence. The convolutional architecture of the final residual neural network classifier (ResNet-20) described in this work is also shown; FC = fully connected layer.
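The GASF step in panel (C) can be illustrated with a short Python sketch of the standard Gramian Angular Summation Field definition: rescale the signal to [-1, 1], map each value to an angle with arccos, and fill an image whose pixel (i, j) is the cosine of the sum of angles i and j. This is a minimal sketch of the general transform, not the DeePlexiCon implementation; the function name `gasf` and the signal values are made up, and any resizing or normalization the authors apply before classification is not shown.

```python
import numpy as np

def gasf(signal):
    """Gramian Angular Summation Field of a 1-D signal.

    Returns an (n, n) array whose entry (i, j) is cos(phi_i + phi_j),
    where phi = arccos of the signal rescaled to [-1, 1].
    """
    x = np.asarray(signal, dtype=float)
    lo, hi = x.min(), x.max()
    if hi == lo:
        # Flat signal: no range to rescale, so map every sample to 0.
        scaled = np.zeros_like(x)
    else:
        scaled = 2.0 * (x - lo) / (hi - lo) - 1.0
    scaled = np.clip(scaled, -1.0, 1.0)   # guard against rounding just outside [-1, 1]
    phi = np.arccos(scaled)               # polar-coordinate angle of each sample
    return np.cos(phi[:, None] + phi[None, :])

# Made-up current values standing in for a segmented barcode signal.
segment = np.array([80.0, 95.0, 110.0, 90.0, 70.0])
image = gasf(segment)
print(image.shape)
```

The resulting matrix is symmetric and can be treated as a single-channel image, which is what lets an image classifier such as a residual network operate on the raw signal.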
Availability – Code to demultiplex direct RNA reads, including example FAST5 data, can be found at: https://github.com/Psy-Fer/deeplexicon.