rDiff

Deep transcriptome sequencing (RNA-Seq) has become a vital tool for studying the state of cells in the context of varying environments, genotypes and other factors. RNA-Seq profiling data enable identification of novel isoforms, quantification of known isoforms and detection of changes in transcriptional or RNA-processing activity. Existing approaches to detect differential isoform abundance between samples either require a complete isoform annotation or fall short in providing statistically robust and calibrated significance estimates.

Now, a team led by researchers from the Computational Biology Center, Sloan-Kettering Institute and the Friedrich Miescher Laboratory of the Max-Planck Society has developed a suite of statistical tests to address these open needs. The suite includes:

  • A statistical test called rDiff.parametric that detects differential splicing based on the gene annotation. This test models the read counts in a similar way to DESeq and can account for biological variance.
  • A statistical test called rDiff.nonparametric that tests for a difference in the read distribution at a genomic locus. It therefore needs only the gene start and stop to detect differential relative transcript abundance, and can be used when the splicing events are not yet annotated. This test is based on the maximum mean discrepancy (MMD) test and can also account for biological variance.
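To make the idea behind a maximum mean discrepancy test more concrete, here is a minimal sketch in plain Python. It is not the rDiff implementation: the function names, the Gaussian kernel, the bandwidth and the permutation scheme are all assumptions chosen for illustration, and the sketch ignores the biological-variance modelling that rDiff.nonparametric adds. It compares the read start positions observed at one locus in two samples and estimates a p-value by shuffling sample labels.

```python
import math
import random


def gaussian_kernel(x, y, bandwidth):
    """Gaussian (RBF) kernel between two read positions."""
    return math.exp(-((x - y) ** 2) / (2.0 * bandwidth ** 2))


def mmd_squared(xs, ys, bandwidth):
    """Biased estimate of the squared MMD between two 1-D samples."""
    def mean_kernel(a, b):
        total = sum(gaussian_kernel(u, v, bandwidth) for u in a for v in b)
        return total / (len(a) * len(b))

    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)


def mmd_permutation_test(xs, ys, bandwidth=50.0, n_permutations=200, seed=0):
    """Return (observed MMD^2, permutation p-value).

    The p-value is the fraction of label shufflings whose MMD^2 is at
    least as large as the observed one (with the usual +1 correction).
    """
    rng = random.Random(seed)
    observed = mmd_squared(xs, ys, bandwidth)
    pooled = list(xs) + list(ys)
    exceed = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        perm_x, perm_y = pooled[:len(xs)], pooled[len(xs):]
        if mmd_squared(perm_x, perm_y, bandwidth) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_permutations + 1)


# Hypothetical read start positions within one gene for two samples.
sample_a = list(range(100, 120))
sample_b = list(range(1000, 1020))
statistic, p_value = mmd_permutation_test(sample_a, sample_b)
```

Because the test only looks at where reads fall between the gene start and stop, it needs no isoform annotation, which is the property the bullet above highlights.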

The team demonstrates:

1) the statistical calibration of the proposed tests is better than that of CuffDiff (although not perfect).

2) rDiff.parametric outperforms MISO and CuffDiff on a realistic simulated dataset.

3) rDiff.nonparametric performs similarly to MISO and CuffDiff without using the gene annotation for testing.

4) the methods perform well on two real datasets (from A. thaliana and D. melanogaster).

Availability – The proposed toolkit is available from http://bioweb.me/rdiff and enables in-depth analyses of transcriptomes, with or without available isoform annotation.

  • Drewe P, Stegle O, Hartmann L, Kahles A, Bohnert R, Wachter A, Borgwardt K, Rätsch G. (2013) Accurate detection of differential RNA processing. Nucleic Acids Res [Epub ahead of print]. [article]

from figshare – by Eduardo Eyras, Gael P. Alamancos & Eneritz Agirre

Methods to Study Splicing from RNA-Seq

Graphical representation of methods to study splicing from RNA-Seq. Methods are divided according to whether they perform Mapping, Reconstruction of events or isoforms, Quantification of events and/or isoforms, and whether they can perform a Comparison between two or more conditions of event or isoform relative abundances, or of isoform expression. We only list the Mapping methods that are spliced-mappers or that use some heuristics to map to known exons and junctions. Methods for Reconstruction (blue), Quantification (green) and Comparison (red) are divided according to whether they work with isoforms (lighter color) or with events (darker color). Methods that work at both levels are overlapped by the two color tones. Some methods perform reconstruction and quantification and are grouped with those that only perform reconstruction. Some mapping methods also perform quantification and are repeated in two levels. Methods that require an annotation are indicated. Quantification methods that work with or without annotation are in different groups. Solid arrows connect Mapping methods to the tools in the other three levels. Dashed gray arrows indicate cases where a Comparison method can use the output from a Quantification method.

http://regulatorygenomics.upf.edu/Software/RNA-Seq_and_splicing/

Many RNA sequencing studies set out to predict mutations, splice junctions or fusion RNAs. Now a team of researchers in France has developed a method, CRAC, that integrates genomic locations and local coverage to enable such predictions to be made directly from RNA-Seq read analysis. A k-mer profiling approach detects candidate mutations, indels and splice or chimeric junctions in each single read. CRAC increases precision compared with existing tools, reaching 99.5% for splice junctions, without losing sensitivity. Importantly, CRAC predictions improve with read length. In cancer libraries, CRAC recovered 74% of validated fusion RNAs and predicted novel recurrent chimeric junctions.
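The k-mer profiling idea can be illustrated with a toy sketch. This is not CRAC's algorithm or API: the function names, the tiny genome string and the classification labels are hypothetical, and the sketch leaves out the local coverage (support) information that CRAC combines with genomic locations. It locates every k-mer of a read in a genome index and inspects how the read-to-genome offset behaves along the read.

```python
def build_kmer_index(genome, k):
    """Map every k-mer of the genome to the list of positions where it occurs."""
    index = {}
    for i in range(len(genome) - k + 1):
        index.setdefault(genome[i:i + k], []).append(i)
    return index


def kmer_location_profile(read, index, k):
    """For each k-mer of the read, its unique genome position, else None."""
    profile = []
    for i in range(len(read) - k + 1):
        hits = index.get(read[i:i + k], [])
        profile.append(hits[0] if len(hits) == 1 else None)
    return profile


def classify_read(profile):
    """Very coarse interpretation of a location profile (illustrative only)."""
    offsets = {pos - i for i, pos in enumerate(profile) if pos is not None}
    if not offsets:
        return "unmapped"
    if len(offsets) > 1:
        # Located k-mers on both sides map with different offsets.
        return "candidate indel or junction"
    if None in profile:
        # Same offset on both sides of a run of unlocated k-mers.
        return "candidate substitution"
    return "match"


# Toy genome in which every 5-mer is unique (an assumption of this sketch).
GENOME = "ACGTTGCAGGATCCTAAGCTCGTATGGCACTT"
K = 5
INDEX = build_kmer_index(GENOME, K)

exact_read = GENOME[2:14]
mutated_read = "GTTGCATGATCC"               # one base changed in exact_read
spliced_read = GENOME[0:7] + GENOME[20:27]  # two distant segments joined

labels = [
    classify_read(kmer_location_profile(r, INDEX, K))
    for r in (exact_read, mutated_read, spliced_read)
]
```

Longer reads contain more k-mers on either side of a break, which gives an intuition for why predictions of this kind can improve with read length.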

Availability – CRAC is available at http://crac.gforge.inria.fr.

  • Philippe N, Salson M, Commes T, Rivals E. (2013) CRAC: an integrated approach to the analysis of RNA-seq reads. Genome Biology 14, R30. [abstract]

Novel technologies brought in unprecedented amounts of high-throughput sequencing data along with great challenges in their analysis and interpretation. The percent-spliced-in (PSI, Ψ) metric estimates the incidence of single-exon skipping events and can be computed directly by counting reads that align to known or predicted splice junctions. However, the vast majority of human splicing events are more complex than single-exon skipping.
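As a small illustration of the read-counting view of Ψ for a single skipped exon, the sketch below uses one common convention: inclusion is the average of the read counts on the two junctions flanking the exon, and exclusion is the read count on the junction that skips it. The function name and this averaging convention are assumptions for illustration, not something the study prescribes.

```python
def percent_spliced_in(upstream_inclusion, downstream_inclusion, exclusion):
    """Estimate PSI for one skipped exon from junction read counts.

    upstream_inclusion / downstream_inclusion: reads on the two junctions
    that connect the exon to its neighbours.
    exclusion: reads on the junction that skips the exon.
    Returns None when no junction reads are available.
    """
    inclusion = (upstream_inclusion + downstream_inclusion) / 2.0
    total = inclusion + exclusion
    if total == 0:
        return None
    return inclusion / total


# 30 reads on each inclusion junction, 10 reads skipping the exon.
psi = percent_spliced_in(30, 30, 10)  # 0.75
```

This single number works for simple exon skipping, but it has no obvious counterpart for events in which one splice site pairs with several partners, which is the limitation the paragraph above points out.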

A team led by scientists at the Centre de Regulació Genòmica, Spain has now developed a framework that generalizes the Ψ metric to arbitrary classes of splicing events. They change the view from exon-centric to intron-centric and split the value of Ψ into two indices, ψ(5) and ψ(3), measuring the rate of splicing at the 5′- and 3′-end of the intron, respectively. The advantage of having two separate indices is that they deconvolute two distinct elementary acts of the splicing reaction. The completeness of splicing index (COSI) is decomposed in a similar way. This framework is implemented as bam2ssj, a BAM-file processing pipeline for strand-specific counting of reads that align to splice junctions or overlap with splice sites. It can be used as a consistent protocol for quantifying splice junctions from RNA-seq data since no such standard procedure currently exists.
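One way to read the intron-centric indices is sketched below. It assumes that, for an intron with donor site D and acceptor site A, ψ(5) is the share of reads spliced from D that go to A, and ψ(3) is the share of reads spliced into A that come from D. This is an interpretation for illustration; the function name and input layout are hypothetical and unrelated to the bam2ssj output format.

```python
from collections import defaultdict


def splicing_indices(junction_counts):
    """Compute (psi5, psi3) for every intron.

    junction_counts maps (donor, acceptor) coordinate pairs to the number
    of reads spanning that junction.
    """
    donor_totals = defaultdict(int)
    acceptor_totals = defaultdict(int)
    for (donor, acceptor), n in junction_counts.items():
        donor_totals[donor] += n
        acceptor_totals[acceptor] += n

    indices = {}
    for (donor, acceptor), n in junction_counts.items():
        d_total = donor_totals[donor]
        a_total = acceptor_totals[acceptor]
        psi5 = n / d_total if d_total else None
        psi3 = n / a_total if a_total else None
        indices[(donor, acceptor)] = (psi5, psi3)
    return indices


# Toy exon-skipping locus: donor 100 splices to acceptor 200 (inclusion)
# or to acceptor 300 (skipping); donor 250 splices to acceptor 300.
counts = {(100, 200): 30, (100, 300): 10, (250, 300): 30}
result = splicing_indices(counts)
# result[(100, 200)] == (0.75, 1.0)
```

Because each intron gets its own pair of values, the same computation applies unchanged to loci with many competing donors or acceptors.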

Availability – The C(++) code of bam2ssj is open-source and is available at https://github.com/pervouchine/bam2ssj (contact: dp@crg.eu).

Pervouchine DD, Knowles DG, Guigó R. (2012) Intron-Centric Estimation of Alternative Splicing from RNA-seq data. Bioinformatics [Epub ahead of print]. [article]

Paired-end whole transcriptome sequencing provides evidence for fusion transcripts. However, due to the repetitiveness of the transcriptome, many reads have multiple high-quality mappings. Previous methods to find gene fusions either ignored these reads or required additional longer single reads. This can obscure up to 30% of fusions and unnecessarily discards much of the data. We present a method for using paired-end reads to find fusion transcripts without requiring unique mappings or additional single read sequencing. Using simulated data and data from tumors and cell lines, we show that our method can find fusions with ambiguously mapping read pairs without generating numerous spurious fusions from the many mapping locations.

Availability: A C++ and Python implementation of the method demonstrated in this paper is available at http://exon.ucsd.edu/ShortFuse

Kinsella M, Harismendy O, Nakano M, Frazer KA, Bafna V. (2011) Sensitive gene fusion detection using ambiguously mapping RNA-Seq read pairs. Bioinformatics 27(8), 1068-75. [abstract]

JETTA detects alternatively spliced exons based on pre-calculated statistics of exons and junctions in RNA-Seq data sets. It does not support low-level RNA-Seq analysis such as base calling, mapping, alignment, transcript assembly, and expression calculation. For low-level RNA-Seq analysis, please refer to SeqMap, rSeq, and SpliceMap or other tools.

Input Data Files – JETTA requires the following files

These expression matrices can be calculated from any RNA-Seq platform, but Illumina RNA-Seq platforms (GA-II, GA-IIx and HiSeq2000) are suggested. Each expression file should contain samples from two conditions, with more than two samples for each condition.

Availability – JETTA source code and other downloads are available at: http://gluegrant1.stanford.edu/~junhee/JETTA/

SpliceGrapher

SpliceGrapher is a package for creating splice graphs from RNA-Seq data, guided by gene models and EST data (when available).

Features

  • Use your favorite read mapping and spliced alignment tools.
  • Accurate spliced-alignment filtering using SVM classifiers that recognize splice junction sequence features.
  • Generate statistics of alternative splicing.
  • Visualization of splice graphs, splice junctions, and read depth.
  • Use our pipeline or construct your own custom pipeline out of SpliceGrapher modules.

Results obtained using RNA-Seq experiments in Arabidopsis thaliana show that predictions made by the SpliceGrapher method are more consistent with current gene models than predictions made by TAU and Cufflinks. Furthermore, analysis of plant and human data indicates that the machine learning approach used by SpliceGrapher is useful for discriminating between real and spurious splice sites, and can improve the reliability of alternative splicing detection.

SpliceGrapher is available for download at http://SpliceGrapher.sf.net.

  • Rogers MF, Thomas J, Reddy AS, Ben-Hur A. (2012) SpliceGrapher: Detecting patterns of alternative splicing from RNA-seq data in the context of gene models and EST data. Genome Biol 13(1), R4. [abstract]

SpliceTrap is a method to quantify exon inclusion levels using paired-end RNA-seq data. Unlike other tools, which focus on full-length transcript isoforms, SpliceTrap approaches the expression-level estimation of each exon as an independent Bayesian inference problem. In addition, SpliceTrap can identify major classes of alternative splicing events under a single cellular condition, without requiring a background set of reads to estimate relative splicing changes.

Availability – SpliceTrap is available for download and installation at: http://rulai.cshl.edu/splicetrap/

  • Wu J, Akerman M, Sun S, McCombie WR, Krainer AR, Zhang MQ. (2011) SpliceTrap: a method to quantify alternative splicing under single cellular conditions. Bioinformatics [Epub ahead of print]. [abstract]
