A key component of many RNA-Seq studies is the production of multiple replicates for each experimental condition. Such replicates allow researchers to capture the underlying biological variability and to control for experimental variability. However, during data production researchers often lack clear criteria for what constitutes a “bad” replicate that should be discarded. If data from failed replicates are published, downstream analyses by other groups using these data can be hampered.
University of Pennsylvania researchers have developed a probability model that weighs how well a given RNA-Seq experiment represents its experimental condition when performing alternative splicing analysis. Using both synthetic and real-life data, they demonstrated that the model detects outlier samples that are consistently and significantly different from other samples of the same condition. They also evaluated the algorithm extensively in scenarios involving perturbed samples, mislabeled samples, no-signal groups, and different levels of coverage, and showed that it compares favorably with current state-of-the-art tools.
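The article does not describe the model itself, so the following is only an intuition-building sketch and not the researchers' probability model. It assumes each replicate has already been summarized as PSI (percent spliced in) values for the same ordered set of splicing events, and it swaps in a simple robust heuristic: compare each replicate with the per-event median across the group, then shrink the weight of any replicate that deviates much more than is typical. The function name `replicate_weights` and the scaling rule are hypothetical.

```python
from statistics import median


def replicate_weights(psi: dict[str, list[float]]) -> dict[str, float]:
    """Heuristic replicate weights in (0, 1] (illustration only, not MAJIQ's model).

    psi maps a replicate name to its PSI values (0..1) for the same
    ordered list of splicing events.
    """
    names = list(psi)
    n_events = len(psi[names[0]])
    # Per-event consensus: the median PSI across all replicates of the condition
    consensus = [median(psi[r][i] for r in names) for i in range(n_events)]
    # Per-replicate deviation: mean absolute distance from the consensus
    dev = {
        r: sum(abs(psi[r][i] - consensus[i]) for i in range(n_events)) / n_events
        for r in names
    }
    # Typical deviation within the group; guard against an all-zero case
    scale = median(dev.values()) or 1e-9
    # Replicates at or below the typical deviation keep weight 1;
    # larger deviations shrink the weight in proportion
    return {r: min(1.0, scale / d) if d > 0 else 1.0 for r, d in dev.items()}


# Three concordant replicates and one that looks like an outlier (toy values)
weights = replicate_weights({
    "r1": [0.20, 0.50, 0.80],
    "r2": [0.22, 0.48, 0.79],
    "r3": [0.21, 0.52, 0.81],
    "r4": [0.90, 0.10, 0.20],
})
print(weights)
```

Here `r4` ends up with a weight far below the other three. Unlike a probabilistic treatment, this heuristic ignores read coverage, so a poorly covered event counts as much as a well-covered one.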
a. Local splicing variations (LSVs) can be single-source (5′ split) or single-target (3′ join). b. The LSV definition is sufficient to explain the classical binary splicing event types. c. The LSV definition also captures additional complexity observed in metazoan genomes, including non-classical binary events and complex events (three or more junctions). d. An example of a complex splicing event at the mouse Eif4g3 locus, augmented with de novo junction detection by MAJIQ and validated by RT-PCR. Junctions drawn in red come from the annotation and are supported by RNA-seq, whereas junctions drawn in green were detected from RNA-seq but are absent from the annotation.
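The single-source and single-target distinction in panels a to c can be illustrated on a toy splice graph. This is a simplified sketch under stated assumptions, not MAJIQ's implementation: exons are hypothetical string IDs, junctions are (source exon, target exon) pairs, splice-site coordinates and intron retention are ignored, and an exon is assumed to define an LSV only when at least two junctions leave it (single-source) or enter it (single-target). Following panel c, an LSV with two junctions is labeled binary and one with three or more is labeled complex.

```python
from collections import defaultdict


def build_lsvs(junctions):
    """Group splice-graph edges into single-source and single-target LSVs.

    junctions: iterable of (source_exon, target_exon) pairs (toy representation).
    Returns two dicts mapping a reference exon to its sorted junction list.
    """
    leaving = defaultdict(set)   # reference exon -> junctions leaving it (5' split)
    entering = defaultdict(set)  # reference exon -> junctions entering it (3' join)
    for src, tgt in junctions:
        leaving[src].add((src, tgt))
        entering[tgt].add((src, tgt))
    # Assumption: an exon defines an LSV only if it has a choice (>= 2 junctions)
    single_source = {e: sorted(j) for e, j in leaving.items() if len(j) >= 2}
    single_target = {e: sorted(j) for e, j in entering.items() if len(j) >= 2}
    return single_source, single_target


def event_type(lsv):
    """Label an LSV by junction count: 2 -> binary, 3 or more -> complex."""
    if len(lsv) < 2:
        raise ValueError("an LSV needs at least two junctions")
    return "binary" if len(lsv) == 2 else "complex"


# Toy graph: E1 can splice to E2, E3 or E4; E3 can be reached from E1 or E2
graph = [("E1", "E2"), ("E2", "E3"), ("E1", "E3"), ("E1", "E4")]
sources, targets = build_lsvs(graph)
for kind, lsvs in (("single-source", sources), ("single-target", targets)):
    for exon, lsv in lsvs.items():
        print(kind, exon, event_type(lsv), lsv)
# single-source E1 complex [('E1', 'E2'), ('E1', 'E3'), ('E1', 'E4')]
# single-target E3 binary [('E1', 'E3'), ('E2', 'E3')]
```

Note that the same junction (E1 to E3) belongs to both a single-source and a single-target LSV; viewing each event from a reference exon is what lets the definition cover classical binary events as well as complex ones.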
Availability – Program and code will be available at: http://majiq.biociphers.org/