Current bioinformatics methods for detecting changes in gene isoform usage between phenotypes compare the expected relative isoform usage in each phenotype. These statistics model differences in isoform usage in normal tissues, where regulation of gene splicing is stable. Pathological conditions such as cancer can disrupt the regulation of splicing, which increases the heterogeneity of splice variant expression. Inferring events with such differential heterogeneity in gene isoform usage requires new statistical approaches.
Johns Hopkins University researchers introduce Splice Expression Variability Analysis (SEVA) to model increased heterogeneity of splice variant usage between conditions (e.g., tumor and normal samples). SEVA uses a rank-based multivariate statistic that compares the variability of junction expression profiles within one condition to the variability within another. The researchers use simulated data to show that SEVA is unique in modeling heterogeneity of gene isoform usage and to benchmark its performance against EBSeq, DiffSplice, and rMATS, which model differential isoform usage rather than heterogeneity. They confirm the accuracy of SEVA in identifying known splice variants in head and neck cancer and perform cross-study validation of novel splice variants. A novel comparison of splice variant heterogeneity between subtypes of head and neck cancer shows two things: an unanticipated similarity in the heterogeneity of gene isoform usage between HPV-positive and HPV-negative subtypes, and an anticipated increase in heterogeneity among HPV-negative samples with mutations in genes that regulate the splice variant machinery.
Overview of SEVA
(a) Relative junction expression quantifies the distribution of isoform usage of a gene. For simplicity of this example, we show a gene with three exons. The model is shown for two samples of gene isoform usage: one with higher relative expression of an isoform with all three exons (left) and another with higher relative expression of an isoform that skips the middle exon (right). The relative strength of junction expression in overlapping pairs (e.g., J1,2 with J1,3 or J2,3 with J1,3) corresponds to the relative proportion of isoform usage. (b) Example of gene model from (a) in multiple normal (N, left) and tumor (T, right) samples. Note that the normal samples have lower heterogeneity of gene isoform usage than the tumor samples. (c) To quantify isoform expression, SEVA compares the expression of all pairs of overlapping junctions (see a and d). A dissimilarity measure is obtained from the concordance of the comparisons of pairs of overlapping junctions in each pair of samples. This measure is applied to all pairs of samples from the same phenotype (see b) and then U-statistics theory is applied to these measures to compare the variation of gene isoform usage between the phenotypes. (d) Extension of (a) for a more complex gene splicing model.
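The idea in panels (a) to (c) can be illustrated with a minimal Python sketch. This is not the GSReg implementation. The function names, the junction labels, and the read counts are all made up for illustration, and the dissimilarity is assumed here to be the fraction of overlapping junction pairs whose ordering disagrees between two samples. The U-statistics-based significance test is omitted; the sketch only computes the mean pairwise dissimilarity within each phenotype.

```python
from itertools import combinations


def comparison_profile(expr, overlapping_pairs):
    """For each overlapping junction pair (i, j), record whether expr[i] < expr[j]."""
    return [expr[i] < expr[j] for i, j in overlapping_pairs]


def dissimilarity(sample_a, sample_b, overlapping_pairs):
    """Fraction of overlapping junction pairs ordered differently in the two samples."""
    profile_a = comparison_profile(sample_a, overlapping_pairs)
    profile_b = comparison_profile(sample_b, overlapping_pairs)
    disagreements = sum(x != y for x, y in zip(profile_a, profile_b))
    return disagreements / len(overlapping_pairs)


def mean_pairwise_dissimilarity(samples, overlapping_pairs):
    """Average dissimilarity over all pairs of samples from one phenotype."""
    sample_pairs = list(combinations(samples, 2))
    total = sum(dissimilarity(a, b, overlapping_pairs) for a, b in sample_pairs)
    return total / len(sample_pairs)


# Three-exon gene from panel (a): J12 and J23 each overlap the skipping junction J13.
OVERLAPPING = [("J12", "J13"), ("J23", "J13")]

# Hypothetical junction read counts, invented purely for this example.
normal = [
    {"J12": 50, "J23": 48, "J13": 5},
    {"J12": 60, "J23": 55, "J13": 8},
    {"J12": 40, "J23": 42, "J13": 3},
]
tumor = [
    {"J12": 50, "J23": 45, "J13": 6},
    {"J12": 5, "J23": 7, "J13": 60},
    {"J12": 30, "J23": 10, "J13": 20},
]

normal_var = mean_pairwise_dissimilarity(normal, OVERLAPPING)
tumor_var = mean_pairwise_dissimilarity(tumor, OVERLAPPING)
print(f"normal: {normal_var:.3f}, tumor: {tumor_var:.3f}")
```

In this toy data every normal sample orders the junction pairs the same way, so the within-normal dissimilarity is zero, while the tumor samples disagree with one another. Because only the ordering of junction pairs is used, the measure depends on ranks rather than absolute expression levels, consistent with the rank-based statistic described above.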
Availability – SEVA is implemented in the R/Bioconductor package GSReg. https://bioconductor.org/packages/release/bioc/html/GSReg.html