Outlier detection for improved differential splicing quantification from RNA-Seq experiments with replicates

A key component in many RNA-Seq based studies is contrasting multiple replicates from different experimental conditions. In this setup, replicates play a key role because they capture the biological variability inherent to the compared conditions, as well as experimental variability. However, what constitutes a “bad” replicate is not necessarily well defined. Consequently, researchers might discard valuable data, or downstream analysis may be hampered by failed experiments.

Here, University of Pennsylvania researchers develop a probability model that weights a given RNA-Seq sample by how well it represents an experimental condition when performing alternative splicing analysis. They demonstrate that this model detects outlier samples which are consistently and significantly different from other samples of the same condition. Moreover, they show that instead of discarding such samples, the proposed weighting scheme can be used to downweight samples and specific splicing variations suspected as outliers, gaining statistical power. These weights can then be used for differential splicing (DS) analysis, where the resulting algorithm offers a generalization of the MAJIQ algorithm. Using both synthetic and real-life data, the researchers perform an extensive evaluation of the improved MAJIQ algorithm in different scenarios involving perturbed samples, mislabeled samples, same-condition groups, and different levels of coverage, showing that it compares favorably to other tools. Overall, this work offers an outlier detection algorithm that can be combined with any splicing pipeline, a generalized and improved version of MAJIQ for differential splicing detection, and evaluation metrics with matching code and data for DS algorithms.
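To make the idea of downweighting rather than discarding a replicate concrete, here is a minimal Python sketch. It is not the probability model from the paper and not part of MAJIQ: the function names, the median-based deviation score, the `scale` parameter, and the example values are all assumptions chosen for illustration only.

```python
from statistics import median


def replicate_weights(psi_by_sample, scale=0.1):
    """Toy weights in (0, 1] for each replicate of one condition.

    psi_by_sample maps a sample name to a list of inclusion levels,
    one per splicing variation. A sample whose values sit far from the
    per-variation group median receives a smaller weight.
    NOTE: illustrative heuristic, not the model described in the paper.
    """
    samples = list(psi_by_sample.values())
    n_lsv = len(samples[0])
    medians = [median(s[i] for s in samples) for i in range(n_lsv)]
    weights = {}
    for name, psi in psi_by_sample.items():
        mean_dev = sum(abs(p - m) for p, m in zip(psi, medians)) / n_lsv
        weights[name] = 1.0 / (1.0 + mean_dev / scale)
    return weights


def weighted_mean_psi(psi_by_sample, weights, lsv_index):
    """Weighted group estimate for one splicing variation."""
    total = sum(weights[name] for name in psi_by_sample)
    weighted_sum = sum(
        weights[name] * psi[lsv_index] for name, psi in psi_by_sample.items()
    )
    return weighted_sum / total


# Hypothetical inclusion levels for four replicates and three variations;
# rep4 plays the role of a suspected outlier.
group = {
    "rep1": [0.80, 0.20, 0.50],
    "rep2": [0.82, 0.18, 0.52],
    "rep3": [0.78, 0.22, 0.48],
    "rep4": [0.30, 0.70, 0.90],
}
weights = replicate_weights(group)
unweighted = sum(psi[0] for psi in group.values()) / len(group)
weighted = weighted_mean_psi(group, weights, 0)
```

In this toy setup the suspected outlier still contributes to the group estimate, but with a much smaller weight than the three concordant replicates, so the weighted estimate stays close to the values they agree on.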

Illustration of local splicing variations (LSV)


(a) LSVs are splits in a gene splice graph where multiple edges spawn out from (single-source, 5’ split) or merge into (single-target, 3’ join) a reference exon. (b) The LSV definition captures classical binary splicing event types. (c) The LSV definition can capture complex variations that involve more than two junctions. (d) A sample from MAJIQ’s output for a single-target LSV involving exon 8 of the Clta gene in mouse cerebellum. MAJIQ quantifies the LSV shown on the left by the marginal inclusion level (ψ_j ∈ [0, 1]) of each of the possible colored junctions j, estimating a posterior distribution over those (right). Similarly, when comparing two conditions, a distribution over Δψ_j ∈ [−1, 1] is derived (not shown).
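The ranges quoted in the caption can be illustrated with a naive point estimate in Python. This is only a sketch: MAJIQ estimates posterior distributions over these quantities, whereas the code below simply normalizes read counts. The function names, junction labels, and read counts are hypothetical.

```python
def junction_psi(read_counts):
    """Naive inclusion level per junction of a single LSV.

    read_counts maps a junction label to the number of reads spanning it.
    Each value lies in [0, 1] and the values of one LSV sum to 1.
    NOTE: a plain ratio, not MAJIQ's posterior estimate.
    """
    total = sum(read_counts.values())
    if total == 0:
        raise ValueError("LSV has no junction-spanning reads")
    return {junction: count / total for junction, count in read_counts.items()}


def delta_psi(psi_a, psi_b):
    """Per-junction change between two conditions, each in [-1, 1]."""
    return {junction: psi_b[junction] - psi_a[junction] for junction in psi_a}


# Hypothetical read counts for one LSV with three junctions.
condition_a = junction_psi({"j1": 30, "j2": 10, "j3": 10})
condition_b = junction_psi({"j1": 10, "j2": 30, "j3": 10})
change = delta_psi(condition_a, condition_b)
```

Because the inclusion levels of one LSV sum to 1 in each condition, the per-junction changes sum to 0: a gain for one junction is offset by losses for the others.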

Availability: Software and data are accessible via majiq.biociphers.org/norton_et_al_2017/.

Norton S, Vaquero-Garcia J, Lahens NF, Grant GR, Barash Y. (2017) Outlier detection for improved differential splicing quantification from RNA-Seq experiments with replicates. Bioinformatics [Epub ahead of print].
