RNA-Seq measures gene expression by counting sequence reads belonging to unique cDNA fragments. Molecular barcodes, commonly in the form of random nucleotides, were recently introduced to improve gene expression measures by detecting amplification duplicates, but they are susceptible to errors generated during PCR and sequencing. These errors produce false positive counts, leading to inaccurate transcriptome quantification, especially at low-input and single-cell RNA amounts, where the total number of molecules present is minuscule.
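To make the false-positive problem concrete, here is a minimal Python sketch of barcode-based molecule counting. The gene name, barcode sequences, and the `count_molecules` helper are hypothetical and chosen only for illustration; real pipelines also use the read's alignment position.

```python
from collections import defaultdict

def count_molecules(reads):
    """Count distinct barcodes per gene; each distinct barcode is taken as one molecule."""
    barcodes = defaultdict(set)
    for gene, barcode in reads:
        barcodes[gene].add(barcode)
    return {gene: len(seen) for gene, seen in barcodes.items()}

# Hypothetical reads: two original molecules of GENE_A, each amplified by PCR.
reads = [
    ("GENE_A", "ACGTAC"), ("GENE_A", "ACGTAC"), ("GENE_A", "ACGTAC"),
    ("GENE_A", "TTGCAA"), ("GENE_A", "TTGCAA"),
]
clean_counts = count_molecules(reads)

# One PCR duplicate picks up a single substitution in its barcode (last base C -> T).
noisy_counts = count_molecules(reads + [("GENE_A", "ACGTAT")])

print(clean_counts, noisy_counts)
```

In this toy case the duplicates collapse to two molecules, but the single erroneous read is counted as a third. When only a handful of molecules are present, one such false positive is a large relative error.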
To address this issue, Stanford University researchers demonstrated the systematic identification of molecular species using transposable error-correcting barcodes that are exponentially expanded to tens of billions of unique labels.
The researchers showed experimentally that random-mer molecular barcodes suffer from substantial and persistent errors that are difficult to resolve. To assess their method’s performance, they applied it to the analysis of known reference RNA standards. By including an inline random-mer molecular barcode alongside the error-correcting barcode, they systematically characterized sequence errors in random-mer molecular barcodes. They observed that such errors are extensive and become more dominant at low input amounts.
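One way such a characterization could work is sketched below in Python. This is not the authors' pipeline. It assumes that each error-corrected barcode label identifies a single molecule, so every read carrying that label should show the same random-mer. The function name, labels, and reads are hypothetical.

```python
from collections import Counter, defaultdict

def randommer_error_fraction(reads):
    """reads: iterable of (exb_label, random_mer) pairs, with EXB labels already corrected.

    Returns the fraction of reads whose random-mer differs from the most
    common random-mer observed for the same EXB label.
    """
    by_label = defaultdict(Counter)
    for exb_label, random_mer in reads:
        by_label[exb_label][random_mer] += 1

    total = 0
    discordant = 0
    for counts in by_label.values():
        n_reads = sum(counts.values())
        majority = counts.most_common(1)[0][1]  # read count of the dominant random-mer
        total += n_reads
        discordant += n_reads - majority
    return discordant / total if total else 0.0

# Hypothetical reads: one of six carries a random-mer that disagrees with its label's majority.
reads = [
    ("EXB1", "ACGTAC"), ("EXB1", "ACGTAC"), ("EXB1", "ACGTAC"), ("EXB1", "ACGTAT"),
    ("EXB2", "GGATCC"), ("EXB2", "GGATCC"),
]
fraction = randommer_error_fraction(reads)
print(f"{fraction:.3f}")
```

The majority random-mer stands in for the true sequence here, which is itself an assumption. It becomes less reliable when a label is supported by very few reads.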
Overview of EXB-based molecular barcoding
(a) Structure of the EXB adapter. The adapter consists of a paired-end Y-adapter structure followed by a 6 bp random nucleotide sequence and three rationally designed 6 bp barcode subunits separated by distinct scaffold sequences. Each 6 bp barcode subunit is drawn at random from the 64 possible sequences output by the linear generator matrix shown. The Tn5 transposase recognition sequence at the end of the adapter allows sequencing libraries to be generated via in vitro Tn5 transposition. (b) Edit (substitution) distance metrics for all possible 6 bp barcode pairs. Over 93% of pairwise comparisons between barcodes have an edit distance greater than 4. (c) Schematic of in vitro transposition of EXBs. Tn5 transposase loaded with EXB adapters is incubated with double-stranded cDNA. A gap-fill repair reaction then generates paired-end EXB sequencing libraries. After PCR, EXBs are read as inline barcodes, after which the insert sequence is read. (d) Single-end abundance of EXBs. Single-ended EXB identities were measured by pooling one million reads of each library.
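The Python sketch below illustrates the three ideas in the caption: building 64 subunit sequences from a linear generator matrix, measuring substitution distances, and correcting a read to its nearest valid subunit. The generator matrix is not given in the text, so the code substitutes a well-known stand-in, the hexacode, which is a [6,3] linear code over GF(4) with 4^3 = 64 codewords. The mapping of field elements to bases is also an assumption. The distance distribution of this stand-in need not match the figures quoted in the caption.

```python
from itertools import product

# GF(4) elements 0, 1, w, w^2 are encoded as 0..3; addition is XOR.
GF4_MUL = [
    [0, 0, 0, 0],
    [0, 1, 2, 3],
    [0, 2, 3, 1],
    [0, 3, 1, 2],
]
BASES = "ACGT"  # assumed mapping of field elements to nucleotides

# Stand-in generator matrix (the hexacode); NOT the matrix used in the study.
G = [
    [1, 0, 0, 1, 2, 2],
    [0, 1, 0, 2, 1, 2],
    [0, 0, 1, 2, 2, 1],
]

def encode(message):
    """Multiply a length-3 GF(4) message by G and return the 6 bp codeword."""
    word = [0] * 6
    for coeff, row in zip(message, G):
        for j, g in enumerate(row):
            word[j] ^= GF4_MUL[coeff][g]
    return "".join(BASES[x] for x in word)

CODEBOOK = sorted({encode(m) for m in product(range(4), repeat=3)})

def hamming(a, b):
    """Substitution distance between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))

MIN_DISTANCE = min(
    hamming(a, b) for i, a in enumerate(CODEBOOK) for b in CODEBOOK[i + 1:]
)

def correct(observed, max_errors=1):
    """Return the nearest codeword if it is within max_errors substitutions, else None."""
    best = min(CODEBOOK, key=lambda cw: hamming(cw, observed))
    return best if hamming(best, observed) <= max_errors else None

# Label space: three subunits per adapter, and (assumed) one adapter at each fragment end.
LABELS_PER_ADAPTER = len(CODEBOOK) ** 3
LABELS_PER_FRAGMENT = LABELS_PER_ADAPTER ** 2

print(len(CODEBOOK), MIN_DISTANCE, correct("CAACGT"), correct("CAACTT"))
print(f"{LABELS_PER_ADAPTER:,} per adapter, {LABELS_PER_FRAGMENT:,} per fragment")
```

With this stand-in code, any single substitution in a subunit maps back to a unique codeword, while a read two substitutions away from every codeword is rejected rather than guessed. If both adapter ends are combined, the label count is 64^6, about 69 billion. That is consistent with the "tens of billions" of labels described above, although pairing the two ends is an inference here, not a detail stated in the text.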
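The researchers describe this as the first study to use transposable molecular barcodes and to apply them to the study of random-mer molecular barcode errors. The extensive errors found in random-mer molecular barcodes may warrant the use of error-correcting barcodes for transcriptome analysis as input amounts decrease.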