Methods to read out naturally occurring or experimentally introduced nucleic acid modifications are emerging as powerful tools to study dynamic cellular processes. The recovery, quantification and interpretation of such events in high-throughput sequencing datasets demands specialized bioinformatics approaches.
Researchers at the Vienna BioCenter present Digital Unmasking of Nucleotide conversions in K-mers (DUNK), a data analysis pipeline enabling the quantification of nucleotide conversions in high-throughput sequencing datasets. They demonstrate using experimentally generated and simulated datasets that DUNK allows constant mapping rates irrespective of nucleotide-conversion rates, promotes the recovery of multimapping reads and employs Single Nucleotide Polymorphism (SNP) masking to uncouple true SNPs from nucleotide conversions to facilitate a robust and sensitive quantification of nucleotide-conversions. As a first application, the researchers implement this strategy as SLAM-DUNK for the analysis of SLAMseq profiles, in which 4-thiouridine-labeled transcripts are detected based on T > C conversions. SLAM-DUNK provides both raw counts of nucleotide-conversion containing reads as well as a base-content and read coverage normalized approach for estimating the fractions of labeled transcripts as readout.
Digital Unmasking of Nucleotide-conversions in k-mers
Legend: Possible base outcomes for a given nucleotide-conversion: match with reference (white), nucleotide-conversion scored as mismatch (red), nucleotide-conversion scored with nucleotide-conversion aware scoring (blue), low-quality nucleotide conversion (black) and filtered nucleotide-conversion (opaque) a Naïve nucleotide-conversion processing and quantification vs DUNK: The naïve read mapper (left) maps 11 reads (grey) to the reference genome and discards five reads (light grey), that comprise many converted nucleotides (red). The DUNK mapper (right) maps all 16 reads. b DUNK processes multi-mapping reads (R5, R6, R7, left) such that the ones (R3, R6) that can be unambiguously assigned to a 3′ interval are identified and assigned to that region, R5 and R7 cannot be assigned to a 3′ interval and will be deleted from downstream analyses. R2 is discarded due to general low alignment quality. c False-positive nucleotide conversions originating from Single-Nucleotide Polymorphisms are masked. d High-quality nucleotide-conversions are quantified normalizing for coverage and base content
Beyond providing a readily accessible tool for analyzing SLAMseq and related time-resolved RNA sequencing methods (TimeLapse-seq, TUC-seq), DUNK establishes a broadly applicable strategy for quantifying nucleotide conversions.
Availability – SLAM-DUNK is available from Bioconda, as Python package from PyPI, as Docker image from Docker hub and also from source (http://t-neumann.github.io/slamdunk) under the GNU AGPL license.