A new method developed by EMBL-EBI researchers helps to streamline nanopore sequencing in real time
Nanopore sequencing adaptive sampling. Credit: Karen Arnott/EMBL-EBI
Long-read nanopore sequencing has revolutionised the way scientists obtain genomic data. But like any new technology, there is always room for improvement. BOSS-RUNS – which stands for Benefit-Optimising Short-term Strategy for Read Until Nanopore Sequencing – is an open source method developed by researchers at EMBL’s European Bioinformatics Institute (EMBL-EBI) and the University of Nottingham that can help scientists to dynamically adapt their nanopore sequencing runs to make the process faster and more efficient.
Nanopore sequencing allows researchers to carry out real-time sequencing of long DNA or RNA fragments. It works by monitoring changes to an electrical current as a strand of DNA or RNA passes through a protein nanopore. The current changes depending on which nucleotides, the building blocks of DNA and RNA, are inside the pore at any moment. The resulting signal is computationally decoded to give the specific DNA or RNA sequence.
A unique feature of nanopore sequencing is the ability to reject DNA fragments passing through the pore by reversing the voltage to drive the DNA back out. This allows scientists to select which DNA fragments to sequence from within a mixed sample, a feature known as adaptive sampling or ‘Read Until’. In a study published in the journal Nature Biotechnology, the researchers describe BOSS-RUNS, a method that streamlines adaptive sampling by updating, in real time, which fragments to accept or reject based on the data collected so far.
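To make the accept/reject step concrete, here is a minimal Python sketch of the simplest, static form of adaptive sampling: a read's first bases are assumed to have already been basecalled and mapped to a reference, and the read is kept only if it starts inside a predefined target region. The `Interval` type, the `decide` function and the "wait" outcome for reads that cannot be mapped yet are hypothetical illustrations; they are not part of the Oxford Nanopore Read Until software or of BOSS-RUNS.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """A target region on one contig (0-based, half-open: start <= pos < end)."""
    contig: str
    start: int
    end: int


def decide(contig: str | None, pos: int | None, targets: list[Interval]) -> str:
    """Static adaptive-sampling rule for one read (hypothetical helper).

    contig/pos: where the read's first basecalled bases mapped, or None if
    they could not be mapped yet.
    Returns "accept" (keep sequencing), "reject" (reverse the voltage to
    eject the molecule) or "wait" (sample more signal before deciding).
    """
    if contig is None or pos is None:
        return "wait"
    for t in targets:
        if t.contig == contig and t.start <= pos < t.end:
            return "accept"
    return "reject"


if __name__ == "__main__":
    targets = [Interval("chr1", 100, 200)]
    print(decide("chr1", 150, targets))
    print(decide("chr2", 150, targets))
    print(decide(None, None, targets))
```

In this static form the target list never changes during the run; BOSS-RUNS differs precisely in that its accept/reject rule is recomputed as data accumulate.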
“The ability to select individual molecules to sequence in real time has always been incredibly exciting,” said Matt Loose, Professor of Developmental and Computational Biology at the University of Nottingham. “Here with EMBL-EBI we have taken a step forward by enabling dynamic selection of molecules in response to what has already been sequenced. This is, to our knowledge, the first example of dynamic adaptive sampling.”
Methodological overview of dynamic, active sampling
a, Different sites might require different levels of coverage; for example, sites lacking variation are resolved by few reads, and sites of particular interest require more. Accumulation of coverage beyond that necessary (observed coverage in gray, exceeding ideal coverage in orange) is wasteful, whereas other sites would benefit from observing more data (observed < ideal). b, Local fluctuations in the distribution of fragment origins also result in uneven coverage and reduced efficiency of sequencing. c, We quantify the genotype uncertainty at each site based on prior probabilities and data observed so far. The expected shift in uncertainty caused by observing a new read at that position is expressed as ‘positional benefit score’. d, The expected benefit of a hypothetical read starting at each location is computed as the sum of accumulated positional scores, weighted by the probability of reaching those positions, illustrated for forward and reverse reads starting at two positions. e, A Boolean decision strategy for each position instructs the sequencer to either continue sequencing (1) or reject from the pore (0) a read that starts at that position. Stages c–e are updated and iterated throughout the sequencing experiment. f, Overview of our model of the sequencing process. A novel read is acquired, and, after sequencing its initial bases, its starting position and orientation are identified, determining its fate according to the current decision strategy (e). Upon rejection (upper path), the pore is freed, a new read is acquired and the model iterates from the beginning. Conversely, upon acceptance (lower path), the molecule translocates through the pore until all of its nucleotides are read. New read acquisition and model iteration then proceed as before.
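Steps d and e of the caption can be sketched in Python with NumPy. The sketch assumes the positional benefit scores from step c are already available as an array over a single linear genome. It estimates the probability that a read is longer than k bases from observed read lengths, sums the scores a forward or reverse read would cover (weighted by that probability), and then picks the set of accepted start positions that maximises expected benefit per unit of sequencing time. The function names, the uniform start distribution and the two-parameter time model (`base_time` for every read, `extra_time` for accepted ones) are assumptions for illustration, not the published BOSS-RUNS implementation.

```python
import numpy as np


def survival(read_lengths: np.ndarray) -> np.ndarray:
    """P(L > k) for k = 0 .. max(L) - 1, estimated from observed read lengths."""
    lengths = np.sort(np.asarray(read_lengths))
    ks = np.arange(int(lengths[-1]))
    n_at_most_k = np.searchsorted(lengths, ks, side="right")  # count of L <= k
    return 1.0 - n_at_most_k / lengths.size


def expected_read_benefit(scores: np.ndarray, surv: np.ndarray) -> np.ndarray:
    """Expected benefit of a read starting at each position, one row per strand.

    Row 0 (forward): sum_k scores[i + k] * P(L > k)
    Row 1 (reverse): sum_k scores[i - k] * P(L > k)
    Positions past either end of the linear genome contribute nothing.
    """
    g = scores.size
    fwd = np.convolve(scores[::-1], surv)[:g][::-1]
    rev = np.convolve(scores, surv)[:g]
    return np.vstack([fwd, rev])


def decision_strategy(benefit: np.ndarray, start_prob: np.ndarray,
                      base_time: float, extra_time: float) -> np.ndarray:
    """Boolean accept (True) / reject (False) mask with the shape of benefit.

    Toy time model (assumption): every read costs base_time (capture, first
    bases, ejection) and accepted reads cost extra_time on top. Starts are
    ranked by benefit, and the best-scoring prefix is kept that maximises
    expected benefit per unit time.
    """
    b = benefit.ravel()
    p = start_prob.ravel()
    order = np.argsort(-b, kind="stable")
    cum_benefit = np.cumsum(p[order] * b[order])
    cum_time = base_time + np.cumsum(p[order]) * extra_time
    n_accept = int(np.argmax(cum_benefit / cum_time)) + 1
    accept = np.zeros(b.size, dtype=bool)
    accept[order[:n_accept]] = True
    return accept.reshape(benefit.shape)


if __name__ == "__main__":
    scores = np.zeros(50)
    scores[20:30] = 1.0  # pretend only positions 20..29 are still uncertain
    benefit = expected_read_benefit(scores, survival(np.full(100, 10)))
    start_prob = np.full(benefit.shape, 1.0 / benefit.size)  # uniform starts
    accept = decision_strategy(benefit, start_prob, base_time=1.0, extra_time=10.0)
    print("forward starts accepted:", np.flatnonzero(accept[0]))
    print("reverse starts accepted:", np.flatnonzero(accept[1]))
```

With the scores concentrated in positions 20 to 29, forward reads starting in or just before that window (and reverse reads starting in or just after it) are accepted, while reads starting far away are rejected. Because every accepted read costs the same extra time in this toy model, ranking starts by benefit and keeping the best prefix is enough to find the accept set with the highest benefit rate.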
It’s not always necessary to sequence everything in a given sample. If a researcher is only interested in a specific site or region within a genome, they can limit sequencing to that site or region. This is faster and avoids acquiring and storing data that isn’t needed.
Nanopore sequencing’s adaptive sampling feature makes it possible to select in advance the molecules you wish to sequence in a complex sample, for example specific chromosomes or DNA from a specific species. BOSS-RUNS builds on this feature, helping the user make these decisions based on the results they are getting in real time. This makes sequencing runs more flexible and directs coverage to the areas of the genome where more data are needed.
“One of the great things about nanopore sequencing is that the software is open source,” said Nick Goldman, Group Leader at EMBL-EBI. “This means that researchers can adapt their sequencing protocols, optimising them to best fit their needs. The way that sequencing devices traditionally work is quite wasteful in that they randomly sample the DNA being sequenced. This generates excessive amounts of data that isn’t always needed. Adapting the sequencing software itself can help to minimise this and save researchers time, money, and data storage.”
Detecting low-abundance species
BOSS-RUNS allows the user to delve deeper into areas of a genome based on what is being observed in real time. For example, if BOSS-RUNS detects sites in a genomic sample that don’t entirely match a reference genome, it can adjust the sequencing experiment to collect more reads covering those sites, to confirm whether the genomic variation is genuine.
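Why a site that disagrees with the reference attracts more reads can be illustrated with a single-site Bayesian model in Python. The sketch assumes a haploid genome, a prior that favours the reference base (`prior_ref`) and a uniform per-read error rate (`err`), and it scores a site by the expected reduction in Shannon entropy of the genotype posterior from observing one more read. This is a stand-in for the ‘positional benefit score’ in panel c of the figure; the exact score used by BOSS-RUNS may be defined differently, and the parameter values are illustrative.

```python
import numpy as np


def entropy(p: np.ndarray) -> float:
    """Shannon entropy in bits, treating 0 * log 0 as 0."""
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())


def posterior(counts, ref_index: int, prior_ref: float = 0.99,
              err: float = 0.05) -> np.ndarray:
    """Posterior over the true base (A, C, G, T) at one haploid site.

    counts: number of reads showing A, C, G, T at this site.
    Assumed model: prior mass prior_ref on the reference base, the rest split
    evenly; each read shows the true base with probability 1 - err and each
    other base with probability err / 3.
    """
    counts = np.asarray(counts, dtype=float)
    prior = np.full(4, (1.0 - prior_ref) / 3)
    prior[ref_index] = prior_ref
    # log P(data | true base g): counts[g] matches, all other reads are errors
    loglik = counts * np.log(1 - err) + (counts.sum() - counts) * np.log(err / 3)
    logpost = np.log(prior) + loglik
    post = np.exp(logpost - logpost.max())  # subtract max for numerical stability
    return post / post.sum()


def positional_benefit(counts, ref_index: int, prior_ref: float = 0.99,
                       err: float = 0.05) -> float:
    """Expected drop in posterior entropy (bits) from one more read at this site."""
    post = posterior(counts, ref_index, prior_ref, err)
    emit = np.full((4, 4), err / 3)
    np.fill_diagonal(emit, 1 - err)  # emit[g, b] = P(read shows b | true base g)
    predictive = post @ emit         # P(next read shows b | data so far)
    expected_after = 0.0
    for b in range(4):
        updated = np.array(counts, dtype=float)
        updated[b] += 1
        expected_after += predictive[b] * entropy(
            posterior(updated, ref_index, prior_ref, err))
    return entropy(post) - expected_after


if __name__ == "__main__":
    # Reference base is A (index 0) at both sites.
    print("matching site:   ", positional_benefit([6, 0, 0, 0], ref_index=0))
    print("conflicting site:", positional_benefit([3, 0, 3, 0], ref_index=0))
```

Six reads that all agree with the reference leave almost no uncertainty, so another read there is worth next to nothing; three reference and three alternative reads leave the genotype unresolved, and that site scores orders of magnitude higher, which is what steers extra reads towards it.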
Similarly, BOSS-RUNS is well suited to analysing multiple genomes in the same sample, for example in a microbiome. If these genomes are from different species present at different abundances, the method helps researchers collect sufficient information on all the species present. BOSS-RUNS does this by tracking in real time which species have already been sequenced and using this information to reject redundant DNA as it moves through the pore.
“We used BOSS-RUNS to analyse the species present in a mixed microbial community to show that the method can help researchers gain higher coverage depth of low-abundance species,” said Lukas Weilguny, Predoctoral Fellow at EMBL-EBI. “In a sample of mixed microbial species you may find that 90% of that sample is all the same species but you still want information on the species that make up perhaps only 1% or less. BOSS-RUNS figures out which species the sequencing reads come from in real-time and refocuses the experiment on species that haven’t been covered in as much depth.”
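A deliberately simplified version of this idea is a per-species depth cap: once a species has reached a target coverage depth, further reads classified to it are ejected, so pore time shifts towards the rarer species. The Python sketch below assumes that each read is classified to a species from its first bases and that genome sizes are known in advance; the `DepthBalancer` class and its methods are hypothetical, and this fixed cap is a swapped-in simplification rather than the benefit-based strategy BOSS-RUNS uses.

```python
from __future__ import annotations

from collections import defaultdict


class DepthBalancer:
    """Hypothetical per-species depth cap for adaptive sampling of a mixed sample."""

    def __init__(self, genome_sizes: dict[str, int], target_depth: float) -> None:
        self.genome_sizes = genome_sizes
        self.target_depth = target_depth
        self.bases = defaultdict(int)  # bases from accepted reads, per species

    def depth(self, species: str) -> float:
        """Mean coverage depth achieved so far for one species."""
        return self.bases[species] / self.genome_sizes[species]

    def decide(self, species: str | None) -> bool:
        """True = keep sequencing the read, False = eject it.

        species is the real-time classification of the read's first bases,
        or None if it could not be classified; unclassified reads are kept.
        """
        if species is None or species not in self.genome_sizes:
            return True
        return self.depth(species) < self.target_depth

    def record(self, species: str, n_bases: int) -> None:
        """Add the bases of a fully sequenced (accepted) read.

        The few bases read before an ejection are ignored in this sketch.
        """
        self.bases[species] += n_bases


if __name__ == "__main__":
    balancer = DepthBalancer({"abundant": 1_000, "rare": 1_000}, target_depth=5.0)
    stream = (["abundant"] * 9 + ["rare"]) * 20  # 90% / 10% mix of 1 kb reads
    for species in stream:
        if balancer.decide(species):
            balancer.record(species, 1_000)
    print({s: balancer.depth(s) for s in ("abundant", "rare")})
```

Once the abundant species hits the cap, its reads are ejected after their first bases, so the remaining sequencing time goes to the rare species until it reaches the same depth.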
Availability – BOSS-RUNS is implemented in Python and available on GitHub: https://github.com/goldman-gp-ebi/BOSS-RUNS