SARS-CoV-2 genotyping has been instrumental to monitor virus evolution and transmission during the pandemic. The reliability of the information extracted from the genotyping efforts depends on a number of aspects, including the quality of the input material, applied technology and potential laboratory-specific biases. These variables must be monitored to ensure genotype reliability. The current lack of guidelines for SARS-CoV-2 genotyping leads to inclusion of error-containing genome sequences in studies of viral spread and evolution.
A team led by researchers at SOPHiA Genetics used clinical samples and synthetic viral genomes to evaluate the impact of experimental factors, including viral load and sequencing depth, on correct sequence determination using an amplicon-based approach. They found that at least 1000 viral genomes are necessary to confidently detect variants in the genome at frequencies of 10% or higher. The broad applicability of our recommendations was validated in >200 clinical samples from six independent laboratories. The genotypes of clinical isolates with viral load above the recommended threshold cluster by sampling location and period. This analysis also supports the rise in frequency of 20A.EU1 and 20A.EU2, two recently reported European strains whose dissemination was favoured by travelling during the summer 2020.
The researchers present much-needed recommendations for reliable determination of SARS-CoV-2 genome sequence and demonstrate their broad applicability in a large cohort of clinical samples.
Schematic representation of amplicon-based workflow and the source and type of artifacts it can introduce in the sequencing data
The RNA is converted into cDNA and amplified, in two pools, into 343 partially overlapping amplicons (example primer design F1-R1 and F2-R2 shown). After pooling, the amplicons are subsequently converted into sequencing libraries. These two PCR steps can introduce different types of artifacts as illustrated by the panel on the right.