University of Montpellier researchers have developed a k-mer-based computational protocol, DE-kupl, for capturing local RNA variation in a set of RNA-seq libraries, independently of a reference genome or transcriptome. DE-kupl extracts all k-mers with differential abundance directly from the raw data files. This enables the retrieval of virtually all variation present in an RNA-seq data set. This variation is subsequently assigned to biological events or entities such as differential long non-coding RNAs, splice and polyadenylation variants, introns, repeats, editing or mutation events, and exogenous RNA. Applying DE-kupl to human RNA-seq data sets identified multiple types of novel events, reproducibly across independent RNA-seq experiments.
The DE-kupl pipeline for the discovery and analysis of differentially expressed k-mers
First, Jellyfish is applied to count k-mers in all libraries. The k-mer counts are then joined into a count matrix, and k-mers with low recurrence or matching the reference transcriptome are filtered out. Normalization factors are computed from the raw k-mer counts, and the differential expression procedure is applied. Finally, overlapping differentially expressed k-mers are extended into contigs, which are annotated based on their alignment to the reference and their overlap with annotated genes.
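The steps in this caption can be illustrated with a small, self-contained Python sketch. It is not the DE-kupl implementation: a plain in-memory counter stands in for Jellyfish, a log2 fold-change threshold stands in for the statistical differential expression test, reverse complements are ignored, and the annotation step is omitted. All function names, thresholds, and toy sequences below are hypothetical and chosen only for illustration.

```python
from collections import Counter
from math import log2

K = 4  # toy value for readability; real analyses use much longer k-mers

def count_kmers(reads, k):
    """Count every k-mer in a list of reads (in-memory stand-in for Jellyfish)."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts

def build_matrix(libraries, k):
    """Join per-library counts into {kmer: [count in each library]}."""
    per_lib = [count_kmers(reads, k) for reads in libraries]
    kmers = set().union(*per_lib)
    return {km: [c[km] for c in per_lib] for km in kmers}

def size_factors(matrix):
    """Assumed normalization: library total divided by the mean library total."""
    n_libs = len(next(iter(matrix.values())))
    totals = [sum(row[j] for row in matrix.values()) for j in range(n_libs)]
    mean_total = sum(totals) / n_libs
    return [t / mean_total for t in totals]

def filter_matrix(matrix, reference_kmers, min_count=2, min_libs=2):
    """Drop k-mers found in the reference or seen in too few libraries."""
    return {km: row for km, row in matrix.items()
            if km not in reference_kmers
            and sum(c >= min_count for c in row) >= min_libs}

def differential_kmers(matrix, factors, group_a, group_b, min_abs_lfc=1.0):
    """Keep k-mers whose normalized mean differs between conditions.
    A fold-change cutoff is a simplification, not a statistical test."""
    selected = []
    for km, row in matrix.items():
        norm = [c / f for c, f in zip(row, factors)]
        mean_a = sum(norm[j] for j in group_a) / len(group_a)
        mean_b = sum(norm[j] for j in group_b) / len(group_b)
        if abs(log2((mean_b + 1) / (mean_a + 1))) >= min_abs_lfc:
            selected.append(km)
    return selected

def extend_contigs(kmers):
    """Merge k-mers that overlap by k-1 bases along unambiguous paths."""
    kmers = set(kmers)
    succ = lambda km: [km[1:] + b for b in "ACGT" if km[1:] + b in kmers]
    pred = lambda km: [b + km[:-1] for b in "ACGT" if b + km[:-1] in kmers]

    def is_start(km):
        p = pred(km)
        return not (len(p) == 1 and len(succ(p[0])) == 1)

    contigs, used = [], set()
    starts = [km for km in sorted(kmers) if is_start(km)]
    for km in starts + sorted(kmers):  # the second list catches pure cycles
        if km in used:
            continue
        contig, cur = km, km
        used.add(km)
        while True:
            nxt = succ(cur)
            if len(nxt) != 1 or nxt[0] in used or len(pred(nxt[0])) != 1:
                break
            cur = nxt[0]
            used.add(cur)
            contig += cur[-1]
        contigs.append(contig)
    return contigs

# Toy data: two libraries per condition; condition B carries an unannotated sequence.
reference_kmers = set(count_kmers(["ACGTACGA"], K))
libraries = [
    ["ACGTACGA", "ACGTACGA"],              # condition A, replicate 1
    ["ACGTACGA", "ACGTACGA"],              # condition A, replicate 2
    ["ACGTACGA", "TTGCAGTC", "TTGCAGTC"],  # condition B, replicate 1
    ["ACGTACGA", "TTGCAGTC", "TTGCAGTC"],  # condition B, replicate 2
]

raw = build_matrix(libraries, K)
factors = size_factors(raw)  # computed from raw counts, before filtering
filtered = filter_matrix(raw, reference_kmers)
selected = differential_kmers(filtered, factors, group_a=[0, 1], group_b=[2, 3])
contigs = extend_contigs(selected)
print(contigs)  # ['TTGCAGTC']
```

In this toy run, the k-mers from the annotated transcript are masked by the reference filter, and the five k-mers specific to condition B are reassembled into the single sequence they came from.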