A necessary pre-processing step in small RNA-seq data analysis is the removal of adapter sequences from the raw reads. While most adapter trimming tools require the adapter sequence as an essential input, adapter information is often incomplete or missing. This can affect the quantification of features and the reproducibility of the study, and might even lead to erroneous conclusions. University of Oslo researchers highlight the importance of specifying the adapter sequence by demonstrating the effect of using similar but different adapter sequences, and they identify additional potential sources of error in the adapter trimming step. Finally, they propose solutions by which users can ensure their small RNA-seq data is fully annotated with adapter information.
Use of an incorrect adapter sequence or trimming protocol can lead to incorrectly trimmed reads and miscounting of reads mapping to features. The selected datasets are samples containing six synthetic small RNAs, prepared with the NEBNext and CATS kits. After trimming, the Linux command grep -c was used for counting. (A) Top: Schematic of the different but highly similar adapter sequences used in (B–D). Bottom: Different versions of the CATS manual used for the trimming protocol applied in (D). Legend under (A): Fill colour corresponds to the six synthetic RNAs in the dataset; line colour corresponds to the two replicates for each sample. Sequences of the oligonucleotides are listed in Supplementary Table S5. (B) Choice of adapter sequence can have a major impact on downstream analysis. Left: Use of the correct adapter sequence (NEBNext_trim01) identifies 5 out of the 6 synthetic small RNAs present in the NGS dataset. Middle and right (NEBNext_trim02 and NEBNext_trim03): Using a highly similar adapter sequence that differs by one or two nucleotides has a drastic effect on mapped reads, with less than 1% of reads identified. (C) In some cases, detailed trimming instructions are required in addition to the adapter sequence. The trimming sets CATS_trim01 and CATS_trim02 were trimmed by specifying the correct adapter sequence, but few perfectly trimmed reads were detected. (D) The problem extends to incorrect application of the manufacturer’s protocol during read trimming. From left to right: trimming results after following the trimming instructions specified in the January 2017, March 2017 and September 2017 releases of the manual. The instructions in the latest version are distinct from those provided in the previous two versions, and this is reflected in the number of identified reads, with the latest protocol identifying notably fewer reads associated with the synthetic RNAs.
CATS_trim01 was trimmed using the same adapter sequence as CATS_trim05, demonstrating that for some kits, specifying the adapter alone is not sufficient to achieve efficient read trimming.
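
The effect described in (B) can be illustrated with a toy sketch in Python. This is not the researchers' pipeline: the sequences are invented for illustration, the function names `trim_exact` and `count_perfect` are hypothetical, and the sketch assumes exact adapter matching, whereas real trimming tools may tolerate some mismatches. Counting reads whose trimmed insert equals the expected sequence plays the role that grep -c plays in the caption above.

```python
from typing import Iterable, Optional

# All sequences below are invented for illustration; none come from the study.
SYNTHETIC_RNA = "ACGTTGCAAGGCTTACGGATCA"  # hypothetical synthetic small RNA
TRUE_ADAPTER = "CTGACCTAGGTCA"            # hypothetical 3' adapter actually ligated
NEAR_ADAPTER = "CTGACCTAGCTCA"            # same adapter with one nucleotide changed


def trim_exact(read: str, adapter: str) -> Optional[str]:
    """Return the insert before the first exact adapter match, or None if absent."""
    pos = read.find(adapter)
    return read[:pos] if pos != -1 else None


def count_perfect(reads: Iterable[str], adapter: str, target: str) -> int:
    """Count reads whose trimmed insert is exactly the target sequence."""
    return sum(1 for read in reads if trim_exact(read, adapter) == target)


# Three reads carrying the synthetic RNA plus adapter, and one unrelated read.
reads = [SYNTHETIC_RNA + TRUE_ADAPTER + "AAAA"] * 3
reads.append("GGGTTTCCCAAA" + TRUE_ADAPTER)

print(count_perfect(reads, TRUE_ADAPTER, SYNTHETIC_RNA))  # prints 3
print(count_perfect(reads, NEAR_ADAPTER, SYNTHETIC_RNA))  # prints 0
```

With the correct adapter, all three synthetic reads are recovered; with the near-identical adapter, none are, because no read contains an exact match. Under this exact-match assumption, a single-nucleotide difference in the specified adapter is enough to lose every read.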