One of the key challenges in the field of genetics is the inference of haplotypes from next generation sequencing data. The MinION Oxford Nanopore sequencer allows sequencing long reads, with the potential of sequencing complete genes, and even complete genomes of viruses, in individual reads. However, MinION suffers from high error rates, rendering the detection of true variants difficult. Here, researchers from Tel-Aviv University propose a new statistical approach named AssociVar, which differentiates between true mutations and sequencing errors from direct RNA/DNA sequencing using MinION. This strategy relies on the assumption that sequencing errors will be dispersed randomly along sequencing reads, and hence will not be associated with each other, whereas real mutations will display a non-random pattern of association with other mutations. The researchers demonstrate their approach using direct RNA sequencing data from evolved populations of the MS2 bacteriophage, whose small genome makes it ideal for MinION sequencing. AssociVar inferred several mutations in the phage genome, which were corroborated using parallel Illumina sequencing. This allowed us to reconstruct full genome viral haplotypes constituting different strains that were present in the sample. This approach is applicable to long read sequencing data from any organism for accurate detection of bona fide mutations and inter-strain polymorphisms.
Mutation frequencies in two MS2 phage populations evolved over 15 passages,
based on Illumina sequencing
Shown are the frequencies of mutations that were detected at a frequency >10% in passage 15 in either of the two replicate populations. Mutations are spread horizontally for visual clarity only.
Availability – The accompanying code can be found at https://github.com/SternLabTAU/AssociVar.