Quantitative analysis of next-generation sequencing (NGS) data requires distinguishing duplicate reads generated by PCR from identical reads that derive from independent molecules. Typically, PCR duplicates are identified as sequence reads that align to the same genomic coordinates in a reference-based alignment. However, identical molecules can also be generated independently during library preparation. Misidentifying these molecules as PCR duplicates and removing them can introduce unforeseen biases into downstream analyses.
Researchers at New York University have developed a cost-effective sequencing adapter design by modifying Illumina TruSeq adapters to incorporate a unique molecular identifier (UMI) while retaining the capacity for multiplexed, single-index sequencing. Incorporating UMIs into TruSeq adapters (TrUMIseq adapters) enables identification of bona fide PCR duplicates as identically mapped reads with identical UMIs. Using TrUMIseq adapters, the researchers show that accurate removal of PCR duplicates improves both allele frequency (AF) estimation in heterogeneous populations using DNA sequencing and gene expression quantification using RNA-Seq.
Accurate detection of PCR duplicates using TrUMIseq adapters
(A) TrUMIseq adapters are based on TruSeq adapters, with relocation of the sample index and addition of a unique molecular identifier (UMI). Libraries are generated and sequenced with TrUMIseq adapters using the same ligation, PCR, and sequencing primers and protocols currently used for TruSeq adapters, in either paired-end (PE) or single-end (SE) sequencing mode. After Step II, the two complementary strands of a double-stranded cDNA molecule are barcoded with two different UMIs and sequenced as independent reads. When using a strand-specific RNA-Seq protocol, one of the cDNA strands is destroyed prior to PCR amplification. (B) Removal of PCR duplicates using TrUMIseq adapters. Whereas coordinate-based deduplication depends on mapping information only, the use of UMIs distinguishes true PCR duplicates, which have identical UMIs (red star), from independently generated molecules, which have different UMIs (yellow star). (C) Comparison between TrUMIseq adapters and possible alternative configurations of UMIs and sample indices potentially compatible with single-index TruSeq workflows. TrUMIseq adapters can be easily incorporated into any single-index TruSeq protocol without requiring either specialized methods for preparing adapters or specialized sequencing steps.
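The difference between the two deduplication strategies in panel (B) can be illustrated with a minimal Python sketch. It is not the researchers' pipeline: the `Read` record, both function names, and the toy reads are hypothetical, and the sketch assumes that the UMI has already been extracted from each read and that UMIs are compared by exact match, ignoring sequencing errors within the UMI.

```python
from collections import namedtuple

# Hypothetical minimal read record: mapping coordinates plus the extracted UMI.
Read = namedtuple("Read", ["chrom", "pos", "strand", "umi"])


def dedup_by_coordinate(reads):
    """Keep one read per (chrom, pos, strand); the UMI is ignored."""
    seen = set()
    kept = []
    for read in reads:
        key = (read.chrom, read.pos, read.strand)
        if key not in seen:
            seen.add(key)
            kept.append(read)
    return kept


def dedup_by_coordinate_and_umi(reads):
    """Keep one read per (chrom, pos, strand, umi).

    Assumption: UMIs are matched exactly; no error correction is applied.
    """
    seen = set()
    kept = []
    for read in reads:
        key = (read.chrom, read.pos, read.strand, read.umi)
        if key not in seen:
            seen.add(key)
            kept.append(read)
    return kept


# Toy data (invented for illustration only).
reads = [
    Read("chr1", 100, "+", "ACGTAC"),  # original molecule
    Read("chr1", 100, "+", "ACGTAC"),  # PCR duplicate: same coordinates, same UMI
    Read("chr1", 100, "+", "TTGCAA"),  # independent molecule: same coordinates, different UMI
    Read("chr2", 250, "-", "ACGTAC"),  # different locus
]

coordinate_only = dedup_by_coordinate(reads)
umi_aware = dedup_by_coordinate_and_umi(reads)

print(len(coordinate_only))  # 2: the independent molecule at chr1:100 is discarded
print(len(umi_aware))        # 3: only the true PCR duplicate is removed
```

In this toy case, coordinate-only deduplication undercounts the molecules at chr1:100, which is the kind of bias the UMI is meant to avoid when estimating allele frequencies or expression levels.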