Statistical analysis of RNA-seq data from next-generation sequencing experiments of cancer.

Institution: Queen’s University Belfast

Dept/School/Faculty: School of Medicine, Dentistry & Biomedical Sciences

PhD Supervisor: Dr F Emmert-Streib

Co-Supervisor: Prof M Salto-Tellez

Application Deadline: 25 January 2013

Genomics data generated by next-generation sequencing (NGS) technologies represent the most recent type of high-throughput data. However, due to the novelty of NGS data, our ability to analyse either DNA-seq or RNA-seq data is severely limited. This hampers the biological usage of such data, because without statistically sound methods that allow a robust extraction of information from these large data sets, their biological merit is very limited.

Next-generation sequencing (NGS) has been the driving force in molecular biology and biomedicine. The technique arose after completion of the Human Genome Project and is now used to tackle the questions that remained once the human genetic information had been identified, in particular how specific genes are regulated and how important their transcription is. The method in which sequencing and gene expression come together is RNA-Seq. Like all methods, it has benefits and shortcomings, but in combination with quantitative PCR (qPCR) and gene-specific sequencing it is a powerful tool that can and will alter how patient samples are analysed. Possible pitfalls and solutions are provided to tackle problems not only for RNA-Seq but also for copy number variation (CNV) analysis, which mostly employs qPCR. Since oncology is a major field of interest and, with its multitude of starting materials, a good example research area, it has been chosen as the focus of this methodological overview.

(read more…)

  • Loewe RP. (2012) Combinational Usage of Next Generation Sequencing and qPCR for the analysis of tumour samples. Methods [Epub ahead of print]. [abstract]

by Jeffrey M. Perkel at The Scientist

September was a monumental month for genome aficionados. The National Human Genome Research Institute (NHGRI)–funded Encyclopedia of DNA Elements (ENCODE) Project released 30 papers in the pages of Nature, Genome Biology, Genome Research, plus another nine in Science, Cell, and the Journal of Biological Chemistry detailing functional features across the human genome. In all, ENCODE researchers performed nearly 1,650 experiments on 147 cell lines assessing transcription, transcription factor binding, chromatin topology, histone modifications, DNA methylation, and more.

The term that encompasses such myriad functional elements is epigenomics, and researchers are now well aware of the importance of such features in development and disease. So much so, in fact, that in 2008, five years after NHGRI launched ENCODE, the NIH funded a second large-scale mapping project. The NIH Roadmap Epigenomics Program had compiled some 61 “complete” epigenomes (genome-wide epigenetic profiles of a variety of cell types) as of May 2012, with more scheduled for inclusion in the project’s upcoming release number 8 of the Human Epigenome Atlas.

There’s a lot researchers can do with these data sets. In an early demonstration, The University of Washington’s John Stamatoyannopoulos, a member of both the ENCODE and Roadmap consortia, and colleagues mined these data to address the puzzling fact that the vast majority of trait- and disease-associated sequence variants (SNPs) identified in genome-wide scans lie outside of any protein-coding sequence. By correlating those variant positions against accessible chromatin regions identified in the two epigenomics projects, Stamatoyannopoulos and his team found these variants often overlap with regulatory elements. They then identified the genes upon which those regulatory elements might act—some located hundreds of thousands of bases away (Science, 337:1190-95, 2012).

Both projects have made their data freely available to the research community, many of whom may want to see what these data sets have to say about their own particular gene, tissue, or pathway of interest. Yet for many researchers, handling, parsing, and visualizing so much information can be intimidating. The ENCODE data set alone weighs in at 15 terabytes.

The best advice, says John Satterlee, a Health Scientist Administrator at the National Institute on Drug Abuse and a co-coordinator of the NIH Roadmap Epigenomics Program, is just to jump in and see what’s there. “It’s not like you’re wasting reagents—this is just an in silico experiment,” he says.

We asked Satterlee and fellow experts to show us how to make use of these visualization tools. Here is what they said…

(read more…)

Job ID

107527BR

Posting Title

Global Lead – NGS Core

Division

Novartis Institutes for BioMedical Research (NIBR)

Business Unit

Research Cambridge

Country

USA

Work Location

Cambridge, MA

Company/Legal Entity

USA Novartis Institutes for BioMedical Research, Inc., Cambridge, MA

Functional Area

Research

Job Type

Full Time

Employment Type

Regular

Job Description

About Novartis Institutes for Biomedical Research:
The Novartis Institutes for BioMedical Research is the global pharmaceutical research organization for Novartis. The NIBR research network is comprised of more than 6,000 scientists, physicians, and business professionals working together across 10 locations around the world to discover innovative medicines that treat diseases with high unmet medical need.

(find out more or apply here…)

This was originally posted on the SEQanswers forum by bodhisattvax

I am looking to build a ‘standard’ RNA-seq data analysis pipeline for analysing differential gene and possibly transcript expression.

I am aware that there are a variety of tools out there for the various steps (alignment, counting, differential expression), each with their respective pros and cons, cheerleaders and dissers.

So I have created a (short) survey which I think could be useful to all of us, to try and see if we are moving towards some consensus about the preferred methodology for each of the steps.

The survey is at
http://www.surveymonkey.com/s/72953N9
I would be very grateful if you could fill it out; it should only take a few minutes of your time.

You may prefer to respond within this thread itself but being an optimistic soul, I’m hoping that I get so many responses that I will need to use the results analysis tools on survey monkey!

Of course, I will make the results available either here or on request.

Thanks in advance.

We asked: Which candidate would you vote for in the upcoming U.S. presidential election?

Blog Poll Results

We closed this poll just before the actual election results were revealed so as not to influence our own poll. Looks like Barack Obama wins again, and this time by a wide margin! (n=97)

Check out the new poll in the left-hand sidebar and cast your vote today.

Genome-wide profiling of alternative splicing is not new. Before the advent of RNA-Seq technologies, genome-wide profiling of RNA splicing in biological samples relied on exon arrays, splice junction arrays, and genome-wide tiling arrays. Use of these technologies to profile known splicing events in various biological contexts has already revealed the importance of splicing in cancer research. A recent review of genome-wide profiling of splicing in cancer using various microarray platforms suggests that splicing in cancer is prevalent and regulated, and that novel therapeutic strategies are emerging.

The success of microarrays in profiling known splicing in cancer can be extended to identifying tumor-specific splicing events in RNA-Seq reads using virtual microarray experiments. In such an experiment, short RNA reads from RNA-Seq can be considered the virtual equivalent of cellular RNA, in silico mapping of reads the virtual equivalent of hybridization, and the sequences of exon-exon junction probes the virtual equivalent of the microarray platform. Hence, a non-redundant reference database of known splice junctions can be used to map RNA reads directly, detecting known splice events and measuring their expression levels. Although such an approach is limited to detection, augmenting the database with predicted junctions also brings discovery into the approach.
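The virtual microarray idea can be illustrated with a small Python sketch. It is only a toy under stated assumptions, not the authors' pipeline: the function names, the `flank` and `min_overhang` parameters, and the exact-substring matching (standing in for a real read aligner) are all hypothetical.

```python
def build_junction_probes(junctions, flank=8):
    """Build exon-exon junction probes (hypothetical helper).

    junctions maps a junction name to (upstream_exon_seq, downstream_exon_seq).
    Each probe joins the last `flank` bases of the upstream exon to the first
    `flank` bases of the downstream exon. Returns name -> (probe, junction_pos).
    """
    probes = {}
    for name, (upstream, downstream) in junctions.items():
        left = upstream[-flank:]
        probes[name] = (left + downstream[:flank], len(left))
    return probes


def count_junction_reads(reads, probes, min_overhang=3):
    """Count reads that 'hybridize' to a probe (assumption: exact matches only).

    A read counts for a junction when it occurs inside the probe and covers at
    least `min_overhang` bases on each side of the junction position.
    """
    counts = {name: 0 for name in probes}
    for read in reads:
        for name, (probe, junction_pos) in probes.items():
            start = probe.find(read)  # first exact occurrence, or -1
            if start == -1:
                continue
            end = start + len(read)
            if junction_pos - start >= min_overhang and end - junction_pos >= min_overhang:
                counts[name] += 1
    return counts


junctions = {
    "e1-e2": ("AAAACCCCGGGG", "TTTTACGTACGT"),  # known junction
    "e1-e3": ("AAAACCCCGGGG", "CATGCATGCATG"),  # predicted (exon-skipping) junction
}
probes = build_junction_probes(junctions)
reads = ["GGGGTTTT", "GGGCATGC", "CCCCGGGG"]
counts = count_junction_reads(reads, probes)
```

The third read lies entirely within the upstream exon, so it supports neither junction. Adding predicted junctions such as `e1-e3` to the probe set is what turns pure detection into discovery in this scheme.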

Here, researchers at the Institute of Bioinformatics and Applied Biotechnology, Bangalore, India have profiled fewer than a million known plus predicted splice events to identify tumor-specific splicing in prostate tumor, using an RNA-Seq dataset of matched tumor-normal samples from ten individuals downloaded from an NCBI public repository.

  • Srinivasan S, Patil AH, Verma M, Bingham JL, Srivatsan R. (2012) Genome-wide Profiling of RNA splicing in prostate tumor from RNA-seq data using virtual microarrays. J of Clin Bioinform [Epub ahead of print]. [article]

RNA-seq from Isolation to Analysis.
Speaker: Fabio Raffaldi, Sr FAS Ion Torrent / SOLiD + RNA ESPOC.
Kelli Bramlett – Life/Ambion R&D.
Tom Bittick – Life/Ambion Product Management.

RNA-SEQ Workshop - Life Technologies

from yjhua2110 at seqanswers.com

We have constructed expression profiles of long noncoding RNAs (lncRNAs, lincRNAs) and protein-coding genes (mRNAs) from RNA-Seq data across 22 normal tissues (Human BodyMap 2.0 data from Illumina) generated by Cabili et al. (Cabili et al. 2011, Genes Dev., 25, 1915-1927.). We hope it will help your research.

(1) Users can find tissue-specific lncRNAs and mRNAs, and the expression pattern of each gene, by viewing the heatmap we constructed. (2) Move the mouse cursor over the heatmap to see details, or click an lncRNA or mRNA name to open its detail page. (3) Click a heatmap column title (e.g. gene symbol, lncRNA name, nearest gene, or a tissue such as liver or lung) to sort the whole heatmap.

Examples:
(a) access lncRNA expression profiles.

(b) access protein-coding expression profiles:

Novel technologies brought in unprecedented amounts of high-throughput sequencing data along with great challenges in their analysis and interpretation. The percent-spliced-in (PSI, Ψ) metric estimates the incidence of single-exon skipping events and can be computed directly by counting reads that align to known or predicted splice junctions. However, the vast majority of human splicing events are more complex than single-exon skipping.

A team led by scientists at the Centre de Regulació Genòmica, Spain has now developed a framework that generalizes the Ψ metric to arbitrary classes of splicing events. They change the view from exon-centric to intron-centric and split the value of Ψ into two indices, ψ(5) and ψ(3), measuring the rate of splicing at the 5′- and 3′-end of the intron, respectively. The advantage of having two separate indices is that they deconvolute two distinct elementary acts of the splicing reaction. The completeness of splicing index (COSI) is decomposed in a similar way. This framework is implemented as bam2ssj, a BAM-file processing pipeline for strand-specific counting of reads that align to splice junctions or overlap with splice sites. It can be used as a consistent protocol for quantifying splice junctions from RNA-seq data since no such standard procedure currently exists.
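To make the intron-centric view concrete, here is a minimal Python sketch. It assumes one plausible reading of the description above: ψ(5) is the share of spliced reads leaving an intron's donor (5′) site that use this intron, and ψ(3) is the share of spliced reads arriving at its acceptor (3′) site that use this intron. The function name and input layout are hypothetical, and the exact normalization used by bam2ssj may differ.

```python
from collections import defaultdict


def intron_centric_psi(junction_counts):
    """Compute (psi5, psi3) per intron from splice-junction read counts.

    junction_counts maps (donor_pos, acceptor_pos) -> number of reads spanning
    that junction. Assumed definitions (not taken from the bam2ssj source):
      psi5 = n(D, A) / sum over A' of n(D, A')
      psi3 = n(D, A) / sum over D' of n(D', A)
    """
    donor_total = defaultdict(int)
    acceptor_total = defaultdict(int)
    for (donor, acceptor), n in junction_counts.items():
        donor_total[donor] += n
        acceptor_total[acceptor] += n

    psi = {}
    for (donor, acceptor), n in junction_counts.items():
        if n == 0:
            continue  # no evidence for this intron; leave it out
        psi[(donor, acceptor)] = (n / donor_total[donor], n / acceptor_total[acceptor])
    return psi


# Toy exon-skipping event: the middle exon spans 200..300.
counts = {
    (100, 200): 30,  # upstream exon -> middle exon
    (300, 400): 30,  # middle exon -> downstream exon
    (100, 400): 10,  # skipping junction
}
psi = intron_centric_psi(counts)
```

In this toy case the skipping intron gets ψ(5) = ψ(3) = 0.25, while the two inclusion introns each get 0.75 at the splice site they share with the skipping intron and 1.0 at the other. Nothing in the calculation depends on the event being a single-exon skip, which is the point of the generalization.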

AVAILABILITY: The C(++) code of bam2ssj is open-source and is available at https://github.com/pervouchine/bam2ssj
CONTACT: dp@crg.eu

Pervouchine DD, Knowles DG, Guigó R. (2012) Intron-Centric Estimation of Alternative Splicing from RNA-seq data. Bioinformatics [Epub ahead of print]. [article]

Scientists at the University of California, Berkeley have developed eXpress, a software package for efficient probabilistic assignment of ambiguously mapping sequenced fragments. eXpress uses a streaming algorithm with linear run time and constant memory use. It can determine abundances of sequenced molecules in real time and can be applied to ChIP-seq, metagenomics and other large-scale sequencing data. They demonstrate its use on RNA-seq data and show that eXpress achieves greater efficiency than other quantification methods.
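The general idea of streaming assignment can be sketched in a few lines of Python. This is not the eXpress algorithm itself, only a simplified online update under stated assumptions: the function name and the `prior` parameter are hypothetical, and target lengths, sequencing bias, and error models are all ignored. Each fragment is seen once, its unit of mass is split across compatible targets in proportion to the current estimates, and memory grows with the number of targets rather than the number of fragments.

```python
def stream_assign(fragments, targets, prior=1.0):
    """Single-pass probabilistic assignment of ambiguous fragments (toy sketch).

    fragments: iterable of tuples, each listing the targets a fragment maps to.
    targets:   all target names.
    prior:     pseudo-mass given to every target before any data is seen.
    Returns estimated relative abundances that sum to 1.
    """
    mass = {t: prior for t in targets}
    for compatible in fragments:
        total = sum(mass[t] for t in compatible)
        # Split one unit of mass according to the current estimates.
        weights = {t: mass[t] / total for t in compatible}
        for t, w in weights.items():
            mass[t] += w
    grand_total = sum(mass.values())
    return {t: m / grand_total for t, m in mass.items()}


fragments = [("A",), ("A",), ("A", "B"), ("B",)]
abundance = stream_assign(iter(fragments), ["A", "B"])
```

Because `fragments` can be any iterator, the same function could consume alignments as they are produced, which is what makes real-time estimation possible in principle.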

The eXpress software is freely available as Supplementary Software and at http://bio.math.berkeley.edu/eXpress/.

  • Roberts A, Pachter L. (2012) Streaming fragment assignment for real-time analysis of sequencing experiments. Nat Methods [Epub ahead of print]. [abstract]

X-Gen Congress

3rd Annual RNA-Seq Conference Announcement and Early Bird Savings

Presented by CHI’s X-Gen Congress
3rd Annual RNA-Seq: Differential Expression in Depth
March 18 – 20, 2013 | San Diego, CA
http://www.xgencongress.com/RNA-Seq

RNA-Seq is perhaps the most complex NGS application. The range, depth, and complexity of a human transcriptome are far from fully characterized. RNA transcripts, by nature, are moving targets, making their characterization and quantification difficult. A single RNA-Seq experiment can provide relatively unbiased sequence information for analysis of gene expression, novel transcripts, novel isoforms, alternative splice sites, allele-specific expression, cSNPs, and rare transcripts, depending on read depth.

Join us March 18-20 in San Diego, California at CHI’s Third Annual RNA-Seq: Differential Expression in Depth conference to discuss NGS technical improvements providing new insights into our active genome.

Register by Friday, November 16 and Save up to $450! http://www.xgencongress.com/RNA-Seq

from the Broad Institute and the Hebrew University of Jerusalem

A primary use of RNA-Seq is to identify transcribed regions of a genome, and to reconstruct the structures of transcripts including alternatively spliced variants. Current state-of-the-art methods for genome-based transcript reconstruction involve aligning RNA-Seq reads to the genome using spliced (intron-aware) aligners, and then assembling the alignments to reconstruct transcript structures (e.g., Cufflinks, Scripture). We refer to this as the align-reads then assemble-alignments approach. Trinity supports an alternative, hybrid approach to genome-based transcript reconstruction that uses a combination of RNA-Seq alignments to a genome coupled with RNA-Seq read de novo assembly and transcript alignment assembly. This alternative approach involves four major steps: align-reads, assemble-reads, align-transcripts, then assemble-transcript_alignments. Specifically, the process involves:

  • align-reads: GSNAP is used to align reads to the genome sequence. Reads are then partitioned into read-covered regions of the genome.
  • assemble-reads: Trinity is used to assemble the RNA-Seq reads in each partition. This can be done in a massively parallel manner, typically requiring little RAM as compared to whole de novo RNA-Seq assemblies, and can be executed using standard hardware.
  • align-transcripts: The Trinity-assembled transcripts are aligned back to the genome using GMAP, as part of the PASA software pipeline.
  • assemble-transcript_alignments: The transcript alignments are assembled by PASA into complete transcript structures, resolving alternatively spliced transcript structures.
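The partitioning in the align-reads step can be pictured as merging overlapping alignment intervals into read-covered regions, each of which then becomes an independent assembly job. The Python sketch below is only an illustration of that idea, not the Trinity or GSNAP implementation: the function name, the `max_gap` parameter, and the half-open `(chrom, start, end)` tuples are assumptions, and spliced alignments are treated as simple intervals.

```python
def read_covered_regions(alignments, max_gap=0):
    """Merge read alignments into read-covered regions (hypothetical helper).

    alignments: iterable of (chrom, start, end) tuples, half-open coordinates.
    max_gap:    alignments separated by at most this many bases are merged.
    Returns a sorted list of (chrom, start, end) regions.
    """
    regions = []
    for chrom, start, end in sorted(alignments):
        if regions and regions[-1][0] == chrom and start <= regions[-1][2] + max_gap:
            # Overlaps (or nearly touches) the previous region: extend it.
            last_chrom, last_start, last_end = regions[-1]
            regions[-1] = (last_chrom, last_start, max(last_end, end))
        else:
            regions.append((chrom, start, end))
    return regions


alignments = [
    ("chr1", 100, 150),
    ("chr1", 140, 200),
    ("chr1", 500, 550),
    ("chr2", 120, 170),
]
regions = read_covered_regions(alignments)
```

Each resulting region holds only the reads that fall within it, which is why the per-partition assemblies need little memory and can run in parallel.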

We’ve found this system to be highly effective for annotation of diverse eukaryotic genomes, from the compact genomes of microbial eukaryotes to the more expansive genomes of plants and vertebrates. The resulting transcript structures are provided in popular file formats for downstream analysis, including visualization (e.g., bed for IGV), expression analysis (gtf for Tuxedo), or coding gene identification (gff3 for EVidenceModeler, gtf for TransDecoder).

(read more…)
