Alternative pre-mRNA splicing (AS) greatly diversifies metazoan transcriptomes and proteomes and is crucial for gene regulation. Current computational methods for analyzing AS from Illumina RNA-sequencing data rely on preannotated libraries of known spliced transcripts. This hinders AS analysis in poorly annotated genomes and can mask unknown AS patterns.
To address this critical bioinformatics problem, researchers developed a method called the junction usage model (JUM). JUM uses a bottom-up approach to identify, analyze, and quantitate global AS profiles without any prior transcriptome annotations. It accurately reports global AS changes in terms of the five conventional AS patterns plus an additional "composite" category made up of inseparable combinations of conventional patterns. JUM stringently classifies intron retention (IR), a difficult and disease-relevant pattern, and reduces the false positive rate of IR detection commonly seen in annotation-based methods to near-negligible levels. When analyzing AS in RNA samples derived from Drosophila heads, human tumors, and human cell lines bearing cancer-associated splicing factor mutations, JUM consistently identified approximately twice the number of novel AS events missed by other methods. Computational simulations showed that JUM achieves a 1.2- to 4.8-fold higher true positive rate at a fixed 5% false discovery rate cutoff. In summary, JUM provides a framework and improved method that removes the need for transcriptome annotations and enables the detection, analysis, and quantification of AS patterns in complex metazoan transcriptomes with superior accuracy.
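The simulation benchmark reports true positive rate at a fixed 5% false discovery rate cutoff. A minimal Python sketch of how such a metric can be computed follows. It assumes each simulated AS event has a p-value and a known ground-truth label. It also assumes Benjamini-Hochberg adjustment, because the text does not state which FDR procedure the benchmark used. The function names are hypothetical.

```python
def benjamini_hochberg(pvalues, alpha=0.05):
    """Return indices of hypotheses rejected by the Benjamini-Hochberg step-up procedure."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff_rank = 0
    # Find the largest rank whose p-value falls under its BH threshold.
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank / m * alpha:
            cutoff_rank = rank
    return set(order[:cutoff_rank])


def true_positive_rate(pvalues, is_true_event, alpha=0.05):
    """Fraction of simulated true AS changes that are called at the given FDR cutoff."""
    called = benjamini_hochberg(pvalues, alpha)
    positives = [i for i, truth in enumerate(is_true_event) if truth]
    if not positives:
        return 0.0
    return sum(1 for i in positives if i in called) / len(positives)
```

Dividing one method's true positive rate by another's at the same cutoff gives a fold difference of the kind quoted above.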
JUM exclusively uses sequence reads mapped to splice junctions and defines AS structures as the basic quantitation unit for differential AS analysis.
(A) JUM uses RNA-seq reads mapped to splice junctions for AS quantification. Green rectangles indicate exons and lines indicate introns. Green and blue short lines represent reads mapped to splice junctions connecting exons. Such reads are the most direct evidence for the existence and quantitative assessment of a given splice junction. JUM defines the start coordinate of a splice junction as the 5’ initiation site (5’IS) and the end coordinate as the 3’ ending site (3’ES). An “AS structure” is defined as a set of junctions that share the same 5’IS or the same 3’ES. Each splice junction in an AS structure is called a sub-AS-junction. (B-E) AS structures are the basic elements that make up all conventionally recognized AS patterns. (F) JUM models the reads that map to a sub-AS-junction with a negative binomial distribution to quantify the “usage” of each sub-AS-junction in an AS structure under one biological condition. (G) JUM fits two generalized linear models to evaluate the influence of a given biological condition on the usage of a specific sub-AS-junction in an AS structure.
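The Python below is a minimal sketch of the definitions in panels (A), (F), and (G). It groups junctions into AS structures, computes each sub-AS-junction's share of reads, and compares usage between two conditions. The junction tuple format and the function names (`build_as_structures`, `sub_junction_usage`, `usage_change_statistic`) are hypothetical and are not taken from JUM's own Perl and Bash scripts. In place of JUM's negative binomial generalized linear models, it substitutes a simpler multinomial likelihood-ratio (G) test on pooled counts. This keeps the idea of comparing a full model against a reduced one, but it ignores variability between biological replicates.

```python
import math
from collections import defaultdict

# Hypothetical junction key: (chromosome, start, end). Following the caption,
# the start coordinate is treated as the 5'IS and the end coordinate as the 3'ES.
Junction = tuple[str, int, int]


def build_as_structures(junction_counts: dict[Junction, int]) -> dict[tuple, list[Junction]]:
    """Group junctions that share a 5'IS or a 3'ES into AS structures.

    Assumption: a structure needs at least two junctions, since a lone
    junction offers no alternative choice.
    """
    groups: dict[tuple, list[Junction]] = defaultdict(list)
    for chrom, start, end in junction_counts:
        groups[(chrom, "5'IS", start)].append((chrom, start, end))
        groups[(chrom, "3'ES", end)].append((chrom, start, end))
    return {key: sorted(js) for key, js in groups.items() if len(js) >= 2}


def sub_junction_usage(counts: dict[Junction, int], junctions: list[Junction]) -> list[float]:
    """Fraction of an AS structure's junction reads carried by each sub-AS-junction."""
    total = sum(counts.get(j, 0) for j in junctions)
    return [counts.get(j, 0) / total if total else 0.0 for j in junctions]


def usage_change_statistic(counts_a, counts_b, junctions) -> float:
    """Likelihood-ratio (G) statistic for one AS structure across two conditions.

    Full model: each condition has its own usage fractions.
    Reduced model: both conditions share one set of usage fractions.
    Counts are treated as multinomial, a simplification of JUM's negative
    binomial models that ignores replicate-level dispersion.
    """
    table = [[c.get(j, 0) for j in junctions] for c in (counts_a, counts_b)]
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    g = 0.0
    for r, row in enumerate(table):
        for k, observed in enumerate(row):
            if observed > 0:  # zero cells contribute nothing to G
                expected = row_totals[r] * col_totals[k] / grand
                g += observed * math.log(observed / expected)
    return 2.0 * g
```

A larger statistic means sub-AS-junction usage differs more between the two conditions. Under the multinomial assumption, the statistic can be compared against a chi-square distribution whose degrees of freedom are one fewer than the number of sub-AS-junctions in the structure.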
Availability – A user-friendly version of the JUM package has been deposited on GitHub: https://github.com/qqwang-berkeley/JUM. The code consists of Perl and Bash shell scripts.