The winner of the Big Science Challenge, a contest convened last year by Cycle Computing to provide $10,000 in cloud computing resources for groundbreaking biomedical research, has successfully completed the first phase of its project while logging more than 115 compute years on the Amazon Cloud.

Victor Ruotti and colleagues from the Morgridge Institute for Research at the University of Wisconsin claimed the top prize in the challenge. The intensive computing for Ruotti’s experiment, a pairwise comparison of RNA-Seq signatures for 124 stem cell lines, was performed over a week using high-memory instances with 8 gigabytes (GB) of memory per core. About 1.6 million jobs were scheduled using Condor, although Cycle Computing CEO Jason Stowe says other schedulers such as GridEngine could also be used. Spot instance availability varied over time, peaking at 8,000 concurrent cores with an average of 5,000 cores running.

The run produced 7 to 8 terabytes (TB) of BAM files.

“The goal of the Big Science Challenge was to help people think bigger than they normally would, to do things that would be impossible on a local cluster,” said Cycle Computing CEO Jason Stowe.

(read more at Bio-IT World…)
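The figures above can be cross-checked with some back-of-the-envelope arithmetic. The sketch below is our own illustration, not Cycle Computing's, and it assumes that a "compute year" means one core busy for a year (365.25 days) and that "a week" means exactly seven days. Because the article's numbers are rounded, the estimates should not be expected to agree exactly.

```python
"""Back-of-the-envelope check of the Big Science Challenge figures.

Assumptions (ours, not the article's): one "compute year" is one core
busy for 365.25 days, and the run lasted exactly 7 days.
"""
from math import comb

CELL_LINES = 124          # stem cell lines compared pairwise
JOBS = 1_600_000          # jobs scheduled with Condor
COMPUTE_YEARS = 115       # reported total ("more than 115")
AVG_CORES = 5_000         # average cores running
PEAK_CORES = 8_000        # maximum concurrent cores
GB_PER_CORE = 8           # memory per core
RUN_DAYS = 7              # assumed length of "a week"
DAYS_PER_YEAR = 365.25


def pairwise_comparisons(n: int) -> int:
    """Number of unordered pairs among n samples: n * (n - 1) / 2."""
    return comb(n, 2)


def core_years(cores: float, days: float) -> float:
    """Core-years delivered by `cores` cores running for `days` days."""
    return cores * days / DAYS_PER_YEAR


def core_hours_per_job(total_core_years: float, jobs: int) -> float:
    """Average core-hours per job, given a total in core-years."""
    return total_core_years * DAYS_PER_YEAR * 24 / jobs


if __name__ == "__main__":
    print("pairwise comparisons:", pairwise_comparisons(CELL_LINES))
    print(f"core-years at the average rate: {core_years(AVG_CORES, RUN_DAYS):.1f}")
    print(f"core-hours per job: {core_hours_per_job(COMPUTE_YEARS, JOBS):.2f}")
    print("peak aggregate memory (GB):", PEAK_CORES * GB_PER_CORE)
```

Run as a script, it reports 7626 pairwise comparisons, about 95.8 core-years at the stated average of 5,000 cores, roughly 0.63 core-hours (about 38 minutes) per job, and 64000 GB of aggregate memory at peak. Reaching 115 compute years in seven days would take an average of about 6,000 cores, so the run length and averages in the article are best read as approximate.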

Researchers at National Taiwan Ocean University and National Tsing Hua University, Taiwan, proposed a workflow that integrates annotations from KEGG biological pathways and Gene Ontology (GO) associations to handle multiple RNA-seq datasets. The system starts by mapping short reads onto reference genes, then normalizes read coverage to evaluate and compare expression levels within various gene clusters. Expression levels are indicated by color shading and displayed graphically on temporal pathway diagrams. Representative GO terms associated with each differentially expressed gene cluster are also displayed visually as a GO tag cloud. Three public RNA-Seq datasets were used to demonstrate that the workflow can provide effective and efficient differential gene expression analysis, either for cross-strain comparisons or for an identical sample sequenced at different time points. Read more
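The post does not specify which normalization the workflow applies to read coverage. As one widely used example (our choice, not necessarily the authors'), the sketch below computes RPKM, reads per kilobase of gene per million mapped reads, which corrects raw counts for both gene length and sequencing depth, plus a log2 fold change for comparing two samples. Gene names and counts in the usage example are invented.

```python
import math


def rpkm(counts: dict[str, int], lengths_bp: dict[str, int]) -> dict[str, float]:
    """Reads per kilobase per million mapped reads for each gene.

    RPKM = reads * 1e9 / (gene length in bp * total mapped reads).
    Here the library size is approximated by the sum of the counts given;
    a real pipeline would use the total number of mapped reads.
    """
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no reads to normalize")
    return {gene: n * 1e9 / (lengths_bp[gene] * total) for gene, n in counts.items()}


def log2_fold_change(before: float, after: float, pseudocount: float = 1.0) -> float:
    """log2 ratio of two expression values; the pseudocount avoids log(0)."""
    return math.log2((after + pseudocount) / (before + pseudocount))


if __name__ == "__main__":
    # Invented example: geneB has twice the reads of geneA but is twice as long.
    sample = rpkm({"geneA": 100, "geneB": 200}, {"geneA": 1_000, "geneB": 2_000})
    print(sample)
    print(log2_fold_change(sample["geneA"], 2 * sample["geneA"]))
```

In the example, geneA and geneB receive the same RPKM, because geneB's extra reads are explained entirely by its greater length.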

RNA-Seq is becoming the tool of choice for gene expression studies, as it can investigate phenomena beyond the reach of traditional microarrays, such as novel transcripts and isoforms, alternative splice sites, and allele-specific expression. However, this increased power comes with complexity that is orders of magnitude higher in terms of bioinformatics, data storage, and processing.

Prognosys Biosciences announced Voila!™, a new cloud-based data analysis service for next-generation sequencing data. Voila! will be available initially for RNA sequencing projects that utilize data from Illumina HiSeq and GAIIx next-generation sequencing instruments.

(Read the press release…)

Golden Helix and Expression Analysis announced they will develop a cloud-based analytic solution to increase adoption of RNA sequencing. Bioinformatic processing will be performed in a service-based cloud compute environment. The offering will address the obstacles posed by sequence data by providing cloud-based and integrated desktop analysis tools that are scalable, affordable, and simplified.

(Read the press release…)

Appistry, Inc. announced the release of a series of advanced RNA-Seq solutions for the rapid analysis of sequencing data generated by this emerging technology. The solutions, based on TopHat, TopHat-Fusion and MapSplice, leverage the Ayrris/BIO(TM) high-performance computing platform to foster personalized medicine approaches, enabling researchers to process and analyze large volumes of data in a fraction of the time currently required by conventional gene expression profiling technologies. The RNA-Seq solutions were developed by the Appistry Life Sciences Group, which was recently established to conceptualize and deliver technologies for next-generation sequencing.

(Read the press release…)

Galaxy Community Conference (GCC): Efficient Tool Deployment to the Galaxy Cloud: An RNA-Seq Workflow Case Study

Download the presentation here: http://www.fml.tuebingen.mpg.de/raetsch/lectures/gcc.pdf

Researchers at Translational Oncology (TRON) at the Medical University of Mainz and Ingenuity® Systems, the provider of IPA® software for RNA-Seq data analysis, have released an early version of a Galaxy plugin for IPA. The plugin, still in development, will enable researchers to move processed RNA-Seq and re-sequencing datasets directly from Galaxy into IPA for more efficient and impactful biological interpretation of the data.

Galaxy is a free, open-source, web-based platform for performing integrative genomic analysis. (see related post)

IPA is commercially available software that helps researchers understand biology at multiple levels by integrating data from a variety of experimental platforms. (see related post)

GenePattern is a powerful genomic analysis platform that provides access to more than 100 tools for gene expression analysis, proteomics, SNP analysis, and common data processing tasks.

GenePattern offers a suite of tools supporting a wide variety of RNA-seq analyses, including short-read mapping, splice junction identification, transcript and isoform detection, quantitation, and differential expression. The modules have been adapted from widely used tools. GenePattern also provides pipelines that run a number of multi-step RNA-seq analyses automatically. Read more

Incoming search terms:

  • tophat fusion genepattern
  • galaxy gene pattern
  • genepatter paper
  • genepattern
  • genepattern de novo rna seq
  • genepattern ngs
  • genepattern rna-seq

The University of California, Santa Cruz (UCSC) Genome Browser is an up-to-date source for genome sequence data from a variety of vertebrate and invertebrate species and major model organisms, integrated with a large collection of aligned annotations. The Browser is a graphical viewer optimized to support fast interactive performance and is an open-source, web-based tool suite built on top of a MySQL database for rapid visualization, examination, and querying of the data at many levels.

The Genome Browser Database, browsing tools, downloadable data files, and documentation can all be found on the UCSC Genome Bioinformatics website.

  • Kent WJ, Sugnet CW, Furey TS, Roskin KM, Pringle TH, Zahler AM, Haussler D. (2002) The human genome browser at UCSC. Genome Res 12(6), 996-1006. [abstract]
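One practical detail when moving between downloadable UCSC data files and the Browser itself: BED-style tables store 0-based, half-open start coordinates, while the Browser's position box uses 1-based, inclusive coordinates. The sketch below converts a BED interval to a Browser position string and builds an hgTracks link using the `db` and `position` URL parameters; the helper names and the default assembly `hg19` are illustrative choices of ours.

```python
from urllib.parse import urlencode

# Public Genome Browser track-display CGI.
BROWSER = "https://genome.ucsc.edu/cgi-bin/hgTracks"


def bed_to_position(chrom: str, start: int, end: int) -> str:
    """Convert a 0-based, half-open BED interval to a 1-based position string."""
    if not 0 <= start < end:
        raise ValueError("expected 0 <= start < end")
    # BED start 0 is the Browser's base 1; the BED end is already the last base.
    return f"{chrom}:{start + 1}-{end}"


def browser_url(chrom: str, start: int, end: int, db: str = "hg19") -> str:
    """Link that opens the Genome Browser on the given BED interval."""
    query = urlencode({"db": db, "position": bed_to_position(chrom, start, end)})
    return f"{BROWSER}?{query}"


if __name__ == "__main__":
    print(bed_to_position("chr1", 0, 100))
    print(browser_url("chr1", 0, 100))
```

For example, the BED interval `chr1 0 100` corresponds to the Browser position `chr1:1-100`; `urlencode` percent-encodes the colon in the resulting link.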

The Galaxy Team announced yesterday that the first free public resource for RNA-seq analysis is now available through the Galaxy public server at http://usegalaxy.org .

Galaxy now supports both TopHat and Cufflinks, and also provides useful utilities for manipulating and visualizing GTF files, a common output of a TopHat-Cufflinks pipeline.
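GTF is a tab-separated text format with nine columns (seqname, source, feature, start, end, score, strand, frame, attributes), where the last column holds semicolon-separated `key "value"` pairs such as `gene_id` and `transcript_id`. For readers who want to inspect Cufflinks output outside Galaxy, the sketch below is a minimal standard-library parser. It is our illustration, not part of Galaxy; it skips some corner cases of the format (for example, attribute values containing semicolons), and the sample line is invented.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class GtfRecord:
    """One GTF line; start and end are 1-based and inclusive."""
    seqname: str
    source: str
    feature: str
    start: int
    end: int
    score: str
    strand: str
    frame: str
    attributes: Dict[str, str] = field(default_factory=dict)


def parse_attributes(text: str) -> Dict[str, str]:
    """Parse 'key "value"; key2 "value2";' into a dict.

    Simplification: values containing ';' are not supported.
    """
    attrs: Dict[str, str] = {}
    for chunk in text.split(";"):
        chunk = chunk.strip()
        if not chunk:
            continue
        key, _, value = chunk.partition(" ")
        attrs[key] = value.strip().strip('"')
    return attrs


def parse_gtf_line(line: str) -> Optional[GtfRecord]:
    """Return a record for a data line, or None for blank and comment lines."""
    if not line.strip() or line.startswith("#"):
        return None
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(cols)}")
    return GtfRecord(
        seqname=cols[0], source=cols[1], feature=cols[2],
        start=int(cols[3]), end=int(cols[4]),
        score=cols[5], strand=cols[6], frame=cols[7],
        attributes=parse_attributes(cols[8]),
    )


if __name__ == "__main__":
    # Invented Cufflinks-style transcript line.
    line = ('chr1\tCufflinks\ttranscript\t11874\t14409\t1000\t+\t.\t'
            'gene_id "CUFF.1"; transcript_id "CUFF.1.1"; FPKM "2.5";')
    rec = parse_gtf_line(line)
    print(rec.feature, rec.start, rec.end, rec.attributes["transcript_id"])
```

Attribute values are kept as strings, so a field such as `FPKM` needs an explicit `float(...)` before any arithmetic.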

Here is an exercise for learning how to use Galaxy for RNA-seq analysis.

Galaxy is an open and free web-based platform for performing accessible, reproducible, and transparent NGS analyses. Users can start using Galaxy by going to http://usegalaxy.org ; alternatively, Galaxy can be downloaded and run on any *NIX machine: http://bitbucket.org/galaxy/galaxy-c…wiki/GetGalaxy or run on cloud computing resources such as Amazon: http://usegalaxy.org/cloud

From GenomeWeb – By Matthew Dublin

Using a grant from Amazon Web Services and the National Institutes of Health, researchers at the Johns Hopkins Bloomberg School of Public Health have developed a cloud-based RNA sequencing data analysis program called Myrna. The software calculates differential gene expression in large RNA-seq datasets using Bowtie, an ultrafast, memory-efficient short-read aligner, for alignment and R/Bioconductor for statistical calculations. These tools are combined in an automatic, parallel pipeline that runs either in the cloud using Amazon Elastic MapReduce or on a local Hadoop cluster. Read more
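Myrna's actual pipeline is more involved, but the MapReduce pattern that lets it run on Elastic MapReduce or Hadoop can be shown in miniature. In the toy sketch below (our simplification, not Myrna's code), a map step emits a ((sample, gene), 1) pair for each aligned read, a shuffle groups the pairs by key, and a reduce step sums them into per-gene read counts that a statistics step, R/Bioconductor in Myrna's case, could then test for differential expression. The read-to-gene assignments stand in for Bowtie output and are invented.

```python
from collections import defaultdict
from itertools import chain
from typing import Dict, Iterable, Iterator, List, Tuple

Key = Tuple[str, str]  # (sample, gene)


def map_reads(sample: str, genes_hit: Iterable[str]) -> Iterator[Tuple[Key, int]]:
    """Map step: emit ((sample, gene), 1) for every read assigned to a gene."""
    for gene in genes_hit:
        yield (sample, gene), 1


def shuffle(pairs: Iterable[Tuple[Key, int]]) -> Dict[Key, List[int]]:
    """Group values by key, as the framework does between map and reduce."""
    groups: Dict[Key, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(groups: Dict[Key, List[int]]) -> Dict[Key, int]:
    """Reduce step: sum the per-read values into a count per (sample, gene)."""
    return {key: sum(values) for key, values in groups.items()}


def count_reads(assignments: Dict[str, List[str]]) -> Dict[Key, int]:
    """Run map, shuffle and reduce over every sample's read-to-gene assignments."""
    mapped = chain.from_iterable(
        map_reads(sample, genes) for sample, genes in assignments.items()
    )
    return reduce_counts(shuffle(mapped))


if __name__ == "__main__":
    # Invented assignments of aligned reads to genes for two samples.
    reads = {
        "control": ["GENE1", "GENE1", "GENE2"],
        "treated": ["GENE1", "GENE2", "GENE2", "GENE2"],
    }
    for (sample, gene), n in sorted(count_reads(reads).items()):
        print(sample, gene, n)
```

Here everything runs in one process; on Hadoop the framework would run the map and reduce phases in parallel across machines and perform the shuffle over the network.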
