7. GCC – Efficient Tool Deployment to the Galaxy Cloud: An RNA-Seq Workflow Case Study

Download the presentation here:  http://www.fml.tuebingen.mpg.de/raetsch/lectures/gcc.pdf

 

Genome and transcriptome sequencing are undergoing a challenging renewal with the advent of Next Generation Sequencing (NGS) technologies. Notably, the short mRNA sequences produced by RNA-Seq enhance transcriptome analysis and promise great opportunities for the discovery of new genes and the identification of alternative transcripts. One way to analyze these data is to align the reads against a reference genome. However, the sheer amount of NGS data requires highly efficient methods for accurate spliced alignment, a task made harder by the size and quality of the sequence reads.
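To make the idea of a spliced alignment concrete, here is a minimal Python sketch. It assumes the aligner reports alignments in SAM format, which the text above does not specify, where the CIGAR operation `N` marks a stretch of reference skipped by the read (an intron). The function name `splice_junctions` is hypothetical and is not taken from any of the tools discussed.

```python
import re

# CIGAR operations that consume reference bases, per the SAM specification.
REF_CONSUMING = set("MDN=X")


def splice_junctions(pos, cigar):
    """Return the introns implied by one alignment as (start, end) pairs.

    pos   -- 1-based leftmost reference position (the SAM POS field)
    cigar -- CIGAR string, for example "30M500N20M"
    Coordinates are 1-based and inclusive.
    """
    junctions = []
    ref = pos  # next reference position to be consumed
    for length, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar):
        length = int(length)
        if op == "N":
            # Skipped reference region: the read is split across two exons.
            junctions.append((ref, ref + length - 1))
        if op in REF_CONSUMING:
            ref += length
    return junctions


if __name__ == "__main__":
    # A 50-base read: 30 bases in one exon, a 500-base intron, 20 bases in the next exon.
    print(splice_junctions(100, "30M500N20M"))
```

An unspliced read such as `50M` yields an empty list. Real aligners additionally score the candidate junctions, for example with splice site predictions, which this sketch does not attempt.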

Technical Guides

Discussion Forums

  • The RNA-Seq Blog – A discussion forum for all things transcriptomic.
  • SEQanswers – The next-generation sequencing community – threads tagged with RNA-Seq.

Webinars

  • An Illumina-Demonstrated Method for Sequencing the Complete Transcriptome – This session will introduce an improved solution for the reduction of abundant transcripts in RNA-Seq experiments, based on an Illumina-optimized protocol utilizing duplex-specific nuclease (DSN) from Evrogen. Illumina scientists will provide a brief overview of DSN, describe the enhancements made to the DSN workflow to optimize its performance for Illumina RNA-Seq, and demonstrate its utility in a wide range of applications, including ncRNA discovery and FFPE transcriptome profiling.

RNA-Seq Data Analysis Tools

  • rQuant.web – is a web service that provides convenient access to tools for the quantitative analysis of RNA-Seq data. It determines the abundances of multiple transcripts per gene locus from RNA-Seq measurements. rQuant.web is available free of charge to all users as a tool in a Galaxy installation.
  • Scripture – is a method for transcriptome reconstruction that relies solely on RNA-Seq reads and an assembled genome to build a transcriptome ab initio.
  • Cufflinks – assembles transcripts, estimates their abundances, and tests for differential expression and regulation in RNA-Seq samples. It accepts aligned RNA-Seq reads and assembles the alignments into a parsimonious set of transcripts. Cufflinks then estimates the relative abundances of these transcripts based on how many reads support each one.
  • SpliceMap – SpliceMap is a de novo splice junction discovery tool. It offers high sensitivity and support for arbitrarily long RNA-seq read lengths.
  • TopHat – is a fast splice junction mapper for RNA-Seq reads. It aligns RNA-Seq reads to mammalian-sized genomes using the ultra high-throughput short read aligner Bowtie, and then analyzes the mapping results to identify splice junctions between exons.
  • PALMapper – a combination of the spliced alignment method QPALMA with the short read alignment tool GenomeMapper. The resulting method, called PALMapper, efficiently computes both spliced and unspliced alignments at high accuracy while taking advantage of base quality information and splice site predictions.
  • RNA-MATE – A recursive mapping strategy for high-throughput RNA-sequencing data.
  • ERANGE – Mapping and Quantifying Mammalian Transcriptomes by RNA-Seq
  • SeqMap – A Tool For Mapping Millions Of Short Sequences To The Genome.
  • Bioconductor – Bioconductor is an open source and open development software project for the analysis and comprehension of genomic data.
  • BWA – BWA is a fast, lightweight tool that aligns relatively short sequences (queries) to a sequence database (target), such as the human reference genome.
  • CisGenome – An integrated tool for tiling array, ChIP-seq, genome and cis-regulatory element analysis.
  • GenePattern – is a powerful genomic analysis platform that provides access to more than 100 tools for gene expression analysis, proteomics, SNP analysis and common data processing tasks. A web-based interface provides easy access to these tools and allows the creation of multi-step analysis pipelines that enable reproducible in silico research.
  • Galaxy – Mapping pipeline for Illumina, 454, and SOLiD sequencing data.
  • MAQ – stands for Mapping and Assembly with Quality. It builds assemblies by mapping short reads to reference sequences.
  • UCSC Genome Browser – This site contains the reference sequence and working draft assemblies for a large collection of genomes. It also provides portals to the ENCODE and Neandertal projects.
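Several of the tools above estimate transcript abundance from the number of reads supporting each transcript. As a rough illustration of why raw counts must be normalized, the sketch below computes RPKM (reads per kilobase of transcript per million mapped reads). It is a simplification under stated assumptions, not the method of Cufflinks or rQuant.web: each read is assumed to be already assigned to exactly one transcript, and the function name `rpkm` and the example transcripts are hypothetical.

```python
def rpkm(read_counts, lengths):
    """Reads per kilobase of transcript per million mapped reads.

    read_counts -- dict mapping transcript name to number of assigned reads
    lengths     -- dict mapping transcript name to transcript length in bases
    """
    total = sum(read_counts.values())
    if total == 0:
        return {name: 0.0 for name in read_counts}
    return {
        # count / (length / 1e3) / (total / 1e6) == count * 1e9 / (length * total)
        name: count * 1e9 / (lengths[name] * total)
        for name, count in read_counts.items()
    }


if __name__ == "__main__":
    counts = {"tx1": 500, "tx2": 1500}     # hypothetical read counts
    lengths = {"tx1": 1000, "tx2": 3000}   # hypothetical transcript lengths (bases)
    for name, value in rpkm(counts, lengths).items():
        print(name, value)
```

In this example tx2 collects three times as many reads as tx1 but is also three times as long, so both end up with the same normalized value. Assigning reads that are shared between alternative transcripts of one gene is the harder part of the problem, and it is the part that dedicated quantification tools address.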
