The massive output of modern short-read next-generation sequencing (NGS) platforms makes it possible to multiplex hundreds of samples in a single NGS run. Each sample is identified by a unique index sequence added to its NGS library, so that the libraries can be mixed and sequenced together.
This system reduces sequencing costs tremendously but is affected by “Index Sequence Errors” introduced during library preparation and sequencing. A small but significant share of these errors converts one index into another index that is used in the same pool. Similarly, “Index Hopping”, a process occurring during library amplification on Illumina platforms, can reassign a library from its original index to another index. To avoid the resulting mis-assignment of reads, which can affect study results significantly, Unique Dual Indices (UDIs) with exclusive i5 and i7 sequences are used. Unexpected index combinations can then be identified, and the associated reads are removed from downstream analysis.
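The filtering step that unique dual indexing enables can be sketched in a few lines of Python. This is only an illustration: the index sequences, sample names, and the `assign_read` helper are hypothetical and do not come from any Lexogen kit or software.

```python
# Hypothetical sample sheet: every sample owns one exclusive (i7, i5) pair.
# The sequences below are made up for illustration only.
EXPECTED_PAIRS = {
    ("ACGTACGTACGT", "TTGCATGCAAGT"): "sample_A",
    ("GGATCCTTAGCA", "CAGTTGACCATG"): "sample_B",
}


def assign_read(i7, i5, expected=EXPECTED_PAIRS):
    """Return the sample for an expected (i7, i5) pair, or None otherwise.

    A pair that mixes the i7 of one sample with the i5 of another
    is not in the table and is therefore rejected.
    """
    return expected.get((i7, i5))


# Expected pair: the read is assigned to its sample.
print(assign_read("ACGTACGTACGT", "TTGCATGCAAGT"))
# Unexpected combination (i7 of sample_A, i5 of sample_B): the read is dropped.
print(assign_read("ACGTACGTACGT", "CAGTTGACCATG"))
```

Because each i5 and each i7 occurs in only one expected pair, a single swapped index always yields a combination that is missing from the table.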
However, most Index Sequence Errors cause reads to be “non-determinable” by converting the index sequence into one that is not present in the library pool. While no mis-assignment happens, these errors cause up to 9% of the initial reads to be removed from downstream analysis, severely reducing the overall efficiency of the NGS run. This issue can be addressed if the index sequence in question is sufficiently different from all other index sequences in the pool. In that case, error correction can be applied to recover the vast majority of these reads. The performance of this error correction depends predominantly on the quality of the index design, and a deficient index design can result in a higher rate of faulty error correction.
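As a rough illustration of how such error correction can work, the sketch below maps an observed index to the closest known index by Hamming distance (the number of mismatching positions). It is a minimal example under assumptions, not Lexogen's actual algorithm: the function names, the example indices, and the rule of rejecting ties are all chosen here for illustration.

```python
def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def correct_index(observed, known, max_errors=1):
    """Return the unique known index within max_errors mismatches, else None.

    If two known indices are equally close, the read stays undetermined
    instead of being assigned arbitrarily.
    """
    best, best_d, tie = None, max_errors + 1, False
    for idx in known:
        d = hamming(observed, idx)
        if d < best_d:
            best, best_d, tie = idx, d, False
        elif d == best_d and d <= max_errors:
            tie = True
    return None if tie else best


# Made-up 8 nt indices, for illustration only.
known = ["AAAAAAAA", "CCCCCCCC", "GGGGTTTT"]
print(correct_index("AAAAAAAC", known))                # one mismatch to AAAAAAAA
print(correct_index("AAAACCCC", known, max_errors=4))  # ambiguous, not corrected
```

The second call shows why index design matters: when two indices are too similar relative to the number of errors being corrected, an erroneous read cannot be assigned unambiguously.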
Lexogen has therefore developed a sophisticated 12 nt Unique Dual Index System that covers all indexing aspects by incorporating:
- Unique Dual Indexing
- scalable index read-out lengths of 8, 10, or 12 nucleotides (nt)
- highest index sequence distance for any multiplexing setup, from smallest to largest (384+) sample sets
- superior error correction with lowest read mis-assignment
For example, Lexogen’s 12 nt read-out setup with correction of up to 2 errors makes 97.8% of the initial reads in a 96-library multiplex run on a NextSeq 500 available for downstream data analysis by recovering a remarkable 6.9% of initial reads with Index Sequence Errors. Due to the proprietary nested index sequence design, the same NGS run with error-corrected 8 nt read-out yields a similar share of reads for downstream analysis (97.6%). While this comes with a moderate increase in mis-assigned corrected reads, established data analysis pipelines for 8 nt indices can be used, and sequencing reagents can be saved. The 10 nt and 12 nt read-out lengths excel when more than 96 libraries need to be multiplexed. These longer indices provide 384 and more UDIs with highest indexing quality, optimal for the latest generation of short-read sequencers such as the Illumina NovaSeq™ series, which produce billions of reads in one run.
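The link between read-out length, inter-index distance, and the number of correctable errors can be illustrated with a short sketch. It relies on the general coding-theory rule that a set with minimum Hamming distance d allows unambiguous correction of up to floor((d - 1) / 2) substitution errors. The index sequences are made up, and treating a shorter read-out as the first n nucleotides of the 12 nt index is an assumption made here for illustration; the actual nested design is proprietary and not described in this text.

```python
from itertools import combinations


def min_distance(indices, readout):
    """Smallest pairwise Hamming distance after truncating to `readout` nt.

    Assumption: a shorter read-out is the first `readout` nucleotides
    of each full-length index.
    """
    truncated = [seq[:readout] for seq in indices]
    return min(
        sum(a != b for a, b in zip(x, y))
        for x, y in combinations(truncated, 2)
    )


def correctable_errors(d_min):
    """Substitution errors that can be corrected unambiguously at distance d_min."""
    return (d_min - 1) // 2


# Made-up 12 nt indices, for illustration only.
indices = ["AAAAAAAAAAAA", "CCCCCCCCCCCC", "AAAACCCCGGGG"]
for readout in (8, 10, 12):
    d = min_distance(indices, readout)
    print(readout, d, correctable_errors(d))
```

In this toy set, reading more nucleotides increases the minimum distance and therefore the number of errors that can be corrected without ambiguity.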
The nested design also provides optimized UDI sets for pooling of 4, 8, 16, 24, 96, 384, etc. libraries, relieving the user of the need to select appropriate UDI sets. In total, Lexogen has designed 9,216 UDIs (24 sets with 384 UDIs each) with the capacity to correct at least one error. All samples of even very large studies can thereby be barcoded uniquely, avoiding any chance of sample mix-ups.
In summary, Lexogen’s 12 nt UDI system adapts to the user’s needs while always providing the highest inter-index distance and maximal error correction capacity. Read mis-assignment due to Index Hopping is avoided, and Index Sequence Errors can be corrected with highest accuracy and a minimal mis-assignment trade-off. The system thereby provides an optimized indexing solution for current and future barcoding requirements.
Source – PRNewswire