Graphite Web is a novel web tool for pathway analysis and network visualization of gene expression data from both microarray and RNA-seq experiments. Several pathway analysis methods have been proposed, in univariate as well as global and multivariate settings, to tackle the complexity and interpretation of expression results. These methods can be further divided into ‘topological’ and ‘non-topological’ methods according to whether they gain power from pathway topology. Biological pathways are, in fact, not only gene lists but can be represented as networks in which genes and their connections are nodes and edges, respectively. To this day, the most widely used approaches are non-topological and univariate, although they miss the relationships among genes. In contrast, topological and multivariate approaches are more powerful but difficult to use for researchers without bioinformatics skills.

Here, researchers from the University of Padova, Italy, present Graphite Web, the first public web server for pathway analysis of gene expression data that combines topological and multivariate pathway analyses with an efficient system of interactive network visualizations for easy interpretation of results. Specifically, Graphite Web implements five different gene set analyses on three model organisms and two pathway databases.

Availability – Graphite Web is freely available at http://graphiteweb.bio.unipd.it/.

Sales G, Calura E, Martini P, Romualdi C. (2013) Graphite Web: web tool for gene set analysis exploiting pathway topology. Nucleic Acids Res [Epub ahead of print]. [article]

With the availability of gene expression data from RNA-seq, powerful statistical approaches for grouping similar gene expression profiles across different environments have become increasingly important. A team led by researchers at Penn State University describes and assesses a computational model for clustering genes into distinct groups based on their pattern of expression in response to a changing environment. The model capitalizes on the Poisson distribution to capture the count property of RNA-seq data. A two-stage hierarchical expectation-maximization (EM) algorithm is implemented to estimate the optimal number of groups and the mean expression of each group across two environments. A procedure is formulated to test whether and how a given group shows a plastic response to environmental changes. The impact of gene-environment interactions on the phenotypic plasticity of the organism can also be visualized and characterized. The model was used to analyse an RNA-seq dataset from two breast cancer cell lines that respond differently to an anti-cancer drug, identifying genes associated with the resistance and sensitivity of the cell lines. The team also performed simulation studies to validate the statistical behaviour of the model. The model provides a useful tool for clustering RNA-seq gene expression data, facilitating the understanding of gene functions and networks.
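To make the clustering idea concrete, here is a minimal sketch of a plain EM fit for a mixture of Poisson distributions over two environments. It is not the authors' two-stage hierarchical algorithm: the number of groups is fixed in advance, and the function name `poisson_mixture_em`, the starting means, and the toy counts are all assumptions made for illustration.

```python
import math

def poisson_mixture_em(counts, init_means, n_iter=50):
    """Fit a Poisson mixture by EM (illustrative sketch, fixed number of groups).

    counts:     list of (count_env1, count_env2) tuples, one per gene
    init_means: list of (mean_env1, mean_env2) starting values, one per group
    """
    k = len(init_means)
    weights = [1.0 / k] * k
    means = [tuple(float(m) for m in pair) for pair in init_means]
    resp = []
    for _ in range(n_iter):
        # E-step: posterior probability of each group for each gene.
        # The log(y!) term is identical across groups, so it is omitted.
        resp = []
        for y in counts:
            log_p = [
                math.log(weights[c])
                + sum(yj * math.log(lam) - lam for yj, lam in zip(y, means[c]))
                for c in range(k)
            ]
            top = max(log_p)
            p = [math.exp(v - top) for v in log_p]
            total = sum(p)
            resp.append([v / total for v in p])
        # M-step: update mixing weights and per-environment group means.
        for c in range(k):
            mass = sum(r[c] for r in resp)
            weights[c] = max(mass / len(counts), 1e-12)
            if mass > 0:
                means[c] = tuple(
                    max(sum(r[c] * y[j] for r, y in zip(resp, counts)) / mass, 1e-9)
                    for j in range(len(means[c]))
                )
    labels = [max(range(k), key=lambda c: r[c]) for r in resp]
    return weights, means, labels

# Hypothetical toy data: three genes high in environment 2, three high in environment 1.
toy_counts = [(5, 50), (6, 48), (4, 55), (40, 5), (42, 6), (38, 4)]
weights, means, labels = poisson_mixture_em(toy_counts, [(10, 40), (30, 10)])
print(labels)
print(means)
```

A group whose two fitted means differ strongly would be a candidate for a plastic response; the formal test described in the paper is not reproduced here.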

  • Wang N, Wang Y, Hao H, Wang L, Wang Z, Wang J, Wu R. (2013) A bi-Poisson model for clustering gene expression profiles by RNA-seq. Brief Bioinform [Epub ahead of print]. [abstract]

The power of deep sequencing technology to reliably detect single RNA reads leads to a paradoxical problem of high sensitivity. In hybridization- or PCR-based methods for RNA quantification, the concern is low sensitivity, i.e., the problem that the signal from truly expressed genes might not be distinguishable from noise. In contrast, the problem with RNA-seq is that it is not clear whether very low read counts come from lowly expressed genes or merely from transcriptional noise. The frequency distribution of read counts does not show a clear separation into two classes of genes, which makes the decision whether a gene is to be considered expressed or not seemingly arbitrary.

Here, researchers from Yale University address this problem by suggesting a statistical model that treats the number of transcripts detected in an RNA-Seq study as a mixture of two distributions: an exponential distribution for transcripts from inactive genes and a negative binomial distribution for transcripts from actively transcribed genes. They apply this model to a number of RNA-Seq data sets and find that the model fits the data very well. The calculated criterion for distinguishing between expressed and non-expressed genes is remarkably consistent among data sets, suggesting that genes with more than two transcripts per million (TPM) are highly likely to be actively transcribed. The regression model correctly identifies the class of genes that are not actively expressed and thus provides an operational criterion for classifying genes into expressed and non-expressed sets, facilitating the interpretation of RNA-Seq data.
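As a rough illustration of how the two-TPM criterion could be applied, the sketch below converts read counts to TPM and flags genes above the threshold. It does not fit the exponential and negative binomial mixture itself; the helper names `tpm` and `call_expressed` and the example numbers are assumptions, and only the threshold of two TPM comes from the summary above.

```python
def tpm(read_counts, gene_lengths_bp):
    """Convert read counts to transcripts per million (TPM).

    Each count is divided by gene length, then the length-normalized
    rates are rescaled so that they sum to one million.
    """
    rates = [count / length for count, length in zip(read_counts, gene_lengths_bp)]
    total = sum(rates)
    if total == 0:
        return [0.0 for _ in rates]
    return [rate / total * 1e6 for rate in rates]

def call_expressed(tpm_values, threshold=2.0):
    """Flag genes whose TPM exceeds the threshold (two TPM, per the summary)."""
    return [value > threshold for value in tpm_values]

# Hypothetical counts and gene lengths for three genes.
values = tpm([1, 250, 999749], [1000, 1000, 1000])
print(call_expressed(values))
```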

  •  Wagner GP, Kin K, Lynch VJ. (2013) A model based criterion for gene expression calls using RNA-seq data. Theory Biosci [Epub ahead of print]. [abstract]

The expression levels of bacterial genes can be measured directly using next-generation sequencing (NGS) methods, offering much greater sensitivity and accuracy than earlier, microarray-based methods. Most bioinformatics software for estimating levels of gene expression from NGS data has been designed for eukaryotic genomes, with algorithms focusing particularly on detection of splicing patterns. These methods do not perform well on bacterial genomes.

Here, researchers at Johns Hopkins University School of Medicine describe the first software system designed explicitly for quantifying the degree of gene expression in bacteria and other prokaryotes. EDGE-pro (Estimated Degree of Gene Expression in PROkaryotes) processes the raw data from an RNA-seq experiment on a bacterial or archaeal species and produces estimates of the expression levels for each gene in these gene-dense genomes.

Availability – The EDGE-pro tool is implemented as a pipeline of C++ and Perl programs and is freely available as open-source code at http://www.genomics.jhu.edu/software/EDGE/index.shtml.

  • Magoc T, Wood D, Salzberg SL. (2013) EDGE-pro: Estimated Degree of Gene Expression in Prokaryotic Genomes. Evol Bioinform Online 9, 127-36. [article]

Salamanders such as the axolotl can fully regenerate a limb upon amputation, making them the vertebrate champions of regeneration. On the other hand, humans and other mammals possess a very limited ability to regenerate limb structures. Learning about the genes, gene networks, and pathways activated in the salamander during limb regeneration will provide cues to improving the regenerative response in mammals. Elucidating these genes, networks, and pathways is difficult, however, because the axolotl does not yet have its genome sequenced and because it has diverged evolutionarily from species with a sequenced genome.

Here, a team led by researchers at the Morgridge Institute for Research produces a set of gene transcripts for the axolotl via RNA sequencing (RNA-seq) and provides information on the nature of the genes activated during regeneration. To determine the identity of these axolotl genes, the team uses comparative transcriptomics techniques to match the axolotl transcript data to the well-annotated human gene set. Supporting previous studies, they find upregulation of many genes previously found to be involved in limb development and regeneration. In addition, they find a burst of cancer-related genes during the first phase of regeneration and identify a set of genes not previously associated with the regeneration process.

  • Stewart R, Rascón CA, Tian S, Nie J, Barry C, et al. (2013) Comparative RNA-seq Analysis in the Unsequenced Axolotl: The Oncogene Burst Highlights Early Gene Expression in the Blastema. PLoS Comput Biol 9(3), e1002936. [article]

The authors (Peter Combs & Michael Eisen) have released a pre-print of their new paper, “Sequencing mRNA from cryo-sliced Drosophila embryos to determine genome-wide spatial patterns of gene expression” for an open peer review process.

Readers are encouraged to make comments on the paper here: http://www.michaeleisen.org/blog/?p=1304

Sequencing mRNA from cryo-sliced Drosophila embryos to determine genome-wide spatial patterns of gene expression

Complex spatial and temporal patterns of gene expression underlie embryo differentiation, yet methods do not yet exist for the efficient genome-wide determination of spatial patterns of gene expression. In situ imaging of transcripts and proteins is the gold-standard, but is difficult and time consuming to apply to an entire genome, even when highly automated. Sequencing, in contrast, is fast and genome-wide, but generally applied to homogenized tissues, thereby discarding spatial information. At some point, these methods will converge, and we will be able to sequence RNAs in situ, simultaneously determining their identity and location. As a step along this path, we developed methods to cryosection individual blastoderm stage Drosophila melanogaster embryos along the anterior-posterior axis and sequence the mRNA isolated from each 60 micron slice. The spatial patterns of gene expression we infer closely match patterns determined by in situ hybridization and microscopy, where such data exist, and thus we conclude that we have generated the first genome-wide map of spatial patterns in the Drosophila embryo. We identify numerous genes with spatial patterns that have not yet been screened in the several ongoing systematic in situ based projects, the majority of which are localized to the posterior end of the embryo, likely in the pole cells. This simple experiment demonstrates the potential for combining careful anatomical dissection with high-throughput sequencing to obtain spatially resolved gene expression on a genome-wide scale.

http://arxiv.org/abs/1302.4693

The cost of RNA-Seq has been decreasing over the last few years. Despite this, experiments with four or fewer biological replicates are still quite common. With so little replication, estimating the variance of gene expression estimates becomes both a challenging and an interesting problem. However, the wealth of microarray and other gene expression data readily accessible in public repositories can be leveraged to improve variance estimation.

A team led by researchers at the University of Sydney, Australia have developed a novel approach called Tshrink+ for inferring differential gene expression through improved modelling of the gene-wise variances. Existing methods share information between genes of similar average expression by shrinking, or moderating, the gene-wise variances to a fitted common variance. They have been able to achieve improved estimation of the common variance by using gene-wise sample variances from external experiments, as well as gene length.

Using biological data, the team shows that utilising additional external information can improve the modelling of the common variance and hence the calling of differentially expressed genes. These sources of additional information include gene length and gene-wise sample variances from other RNA-Seq and microarray datasets, of both related and seemingly unrelated tissue types. The results of this are promising, with their differential expression test, Tshrink+, performing favourably when compared to existing methods such as DESeq and edgeR when considering both gene ranking and sensitivity. These improved variance models could easily be implemented in both DESeq and edgeR and highlight the need for a database that offers a profile of gene variances over a range of tissue types and organisms.
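The idea of moderating a gene-wise variance toward a common variance can be sketched with a simple weighted average. This is a generic shrinkage formula, not the Tshrink+ estimator itself; the function names, the prior degrees of freedom, and the example values are assumptions, and in the approach described above the common variance would be informed by external datasets and gene length.

```python
from statistics import variance

def moderated_variance(sample_var, df, prior_var, prior_df):
    """Shrink a gene-wise sample variance toward a common (prior) variance.

    The result is a weighted average in which each variance is weighted by
    its degrees of freedom: a larger prior_df means stronger shrinkage.
    """
    return (prior_df * prior_var + df * sample_var) / (prior_df + df)

def moderated_variance_from_replicates(values, prior_var, prior_df):
    """Convenience wrapper: compute the sample variance from replicates first."""
    return moderated_variance(variance(values), len(values) - 1, prior_var, prior_df)

# Hypothetical gene with three replicates; prior_var stands in for a fitted
# common variance (for example, one estimated from external experiments).
print(moderated_variance_from_replicates([1.0, 2.0, 3.0], prior_var=3.0, prior_df=2))
```

With few replicates the sample variance is noisy, so the prior term dominates; as the number of replicates grows, the estimate moves back toward the gene's own variance.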

Availability: This method is implemented in the R package sydSeq available on http://www.maths.usyd.edu.au/u/jeany/software.htm

  • Patrick E, Buckley M, Lin DM, Yang YH. (2013) Improved moderation for gene-wise variance estimation in RNA-Seq via the exploitation of external information. BMC Genomics Suppl 1, S9. [article]

We asked, “What is your favorite tool for analyzing RNA-Seq expression data at the transcript-level?”

Looks like good ol’ Cufflinks wins in a landslide.

Poll Results - Transcript Level Expression

Total votes = 78

Thanks for participating! Check out the latest poll in the left-hand sidebar and cast your vote today.

Identification of bimodally expressed genes is an important task, since genes with bimodal expression play important roles in cell differentiation, signaling, and disease progression. Several useful algorithms have been developed to identify bimodal genes from microarray data. Currently, no method can deal with data from next generation sequencing, which is emerging as a replacement technology for microarrays.

A team led by scientists at M. D. Anderson Cancer Center has developed SIBER (Systematic Identification of Bimodally Expressed genes using RNAseq data) for effectively identifying bimodally expressed genes from next generation RNAseq data. They evaluate several candidate methods for modeling RNAseq count data and compare their performance in identifying bimodal genes through both simulation and real data analysis. They show that the lognormal mixture model performs best in terms of power and robustness under various scenarios. The scientists also compare their method with alternative approaches, including PACK and COPA. The method is robust, powerful, invariant to shifting and scaling, has no blind spots, and has a sample-size-free interpretation.

Availability: The R package SIBER is available at the web site http://bioinformatics.mdanderson.org/main/OOMPA:Overview

Contact: kcoombes@mdanderson.org

Tong P, Chen Y, Su X, Coombes KR. (2013) SIBER: Systematic Identification of Bimodally Expressed Genes Using RNAseq Data. Bioinformatics [Epub ahead of print]. [abstract]

 

Quantitative real-time PCR is a powerful tool for quantifying gene expression, but correct normalization of expression levels requires identification of control genes that have stable expression across tissues and life stages.

Researchers from the Institute for Sustainable Agriculture, Spain, have evaluated the suitability of six candidate housekeeping genes across key life stages of Striga hermonthica (a root parasitic weed that attacks many of the staple crops in Africa, India and Southeast Asia), from seed conditioning to flower initiation, using qRT-PCR and high-throughput cDNA sequencing. Based on gene expression analysis by qRT-PCR and RNA-Seq across heterogeneous Striga life stages, they determined that the combination of three genes, UBQ1, PP2A and TUB1, provides the best normalization for gene expression throughout the parasitic life cycle. The housekeeping genes characterized here provide robust standards that will facilitate powerful descriptions of parasite gene expression patterns.
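Normalizing against several reference genes can be illustrated with the common delta-Ct calculation. The sketch below is an assumption about how such a normalization might be computed, not the authors' procedure: it assumes perfect amplification efficiency (a doubling per cycle), and the Ct values and the target name `GENE_X` are hypothetical. Only the reference gene names come from the study.

```python
from statistics import mean

# Reference genes reported in the study summary above.
REFERENCE_GENES = ("UBQ1", "PP2A", "TUB1")

def relative_expression(ct_values, target, references=REFERENCE_GENES):
    """Relative expression of a target gene by the delta-Ct method.

    Averaging the reference Ct values corresponds to taking the geometric
    mean of the reference quantities, since Ct is on a log2 scale.
    Assumes 100% amplification efficiency for every gene.
    """
    reference_ct = mean(ct_values[gene] for gene in references)
    delta_ct = ct_values[target] - reference_ct
    return 2.0 ** (-delta_ct)

# Hypothetical Ct values for one sample.
sample_ct = {"UBQ1": 20.0, "PP2A": 22.0, "TUB1": 21.0, "GENE_X": 23.0}
print(relative_expression(sample_ct, "GENE_X"))
```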

  • Fernández-Aparicio M, Huang K, Wafula EK, Honaas LA, Wickett NJ, Timko MP, Depamphilis CW, Yoder JI, Westwood JH. (2012) Application of qRT-PCR and RNA-Seq analysis for the identification of housekeeping genes useful for normalization of gene expression values during Striga hermonthica development. Mol Biol Rep [Epub ahead of print]. [abstract]

The accurate quantification of gene expression levels is crucial for transcriptome study. Microarrays have been used as a main platform for simultaneously interrogating thousands of genes in the past decade, and recently RNA-Seq has emerged as a promising alternative. The gene expression measurements obtained by microarray and RNA-Seq are, however, subject to various measurement errors. A third platform, qRT-PCR, is acknowledged to provide more accurate quantification of gene expression levels than microarray and RNA-Seq, but it has limited throughput capacity. In this article, we propose to use a system of functional measurement error models to model gene expression measurements and calibrate the microarray and RNA-Seq platforms with qRT-PCR. Based on the system, a two-step approach was developed to estimate the biases and error variance components of the three platforms and calculate calibrated estimates of gene expression levels. The estimated biases and variance components shed light on the relative strengths and weaknesses of the three platforms, and the calibrated estimates provide a more accurate and consistent quantification of gene expression levels. Theoretical and simulation studies were conducted to establish the properties of those estimates. The system was applied to analyze two gene expression data sets from the Microarray Quality Control (MAQC) and Sequencing Quality Control (SEQC) projects.

  • Zhaonan Sun, Thomas Kuczek, Yu Zhu. (2012) Statistical Calibration of qRT-PCR, Microarray and RNA-Seq Gene Expression Data with Measurement Error Models. Cornell Univ Lib arXiv:1212.6690. [article]

Development of post-GWAS (genome-wide association study) methods is greatly needed for characterizing the function of trait-associated SNPs. Strategies integrating various biological data sets with GWAS results will provide insights into the mechanistic role of associated SNPs.

Here, researchers at the University of California, Berkeley present a method that integrates RNA sequencing (RNA-seq) and allele-specific expression data with GWAS data to further characterize SNPs associated with follicular lymphoma (FL). They investigated the influence on gene expression of three established FL-associated loci (rs10484561, rs2647012, and rs6457327) by measuring their correlation with human-leukocyte-antigen (HLA) expression levels obtained from publicly available RNA-seq expression data sets from lymphoblastoid cell lines. Their results suggest that SNPs linked to the protective variant rs2647012 exert their effect by a cis-regulatory mechanism involving modulation of HLA-DQB1 expression. In contrast, no effect on HLA expression was observed for the colocalized risk variant rs10484561. The application of integrative methods, such as those presented here, to other post-GWAS investigations will help identify causal disease variants and enhance our understanding of biological disease mechanisms.
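Measuring the correlation between a SNP and expression can be sketched as a Pearson correlation between allele dosage (0, 1 or 2 copies of an allele per individual) and an expression value. The coding scheme, the `pearson` helper, and the numbers below are assumptions for illustration; the study's actual statistics and data are not reproduced here.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)

# Hypothetical data: allele dosage per individual and expression of one gene.
dosage = [0, 0, 1, 1, 2, 2]
expression = [5.1, 4.8, 6.9, 7.2, 9.0, 9.3]
print(pearson(dosage, expression))
```

A correlation near zero would correspond to the "no effect on expression" outcome described for the risk variant, whereas a strong positive or negative value would point toward a possible cis-regulatory effect worth following up.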

  • Conde L, Bracci PM, Richardson R, Montgomery SB, Skibola CF. (2012) Integrating GWAS and Expression Data for Functional Characterization of Disease-Associated SNPs: An Application to Follicular Lymphoma. Am J Hum Genet [Epub ahead of print]. [abstract]

by Monica Heger at Genomeweb

Building on a method that combines patch-clamp with single-cell transcriptome sequencing, researchers from the University of Southern California in Los Angeles are applying it to study gene expression variation in neurons.

The team was one of three to recently receive funding from the National Institutes of Health’s Common Fund, through its Single Cell Analysis Program. The other two groups include a team led by the University of San Diego, which is using RNA-seq to create a 3D transcriptional map of the human brain, and a team led by the University of Pennsylvania to use single-cell transcriptome sequencing and functional genomics technology to study transcriptome variability in heart and brain cells.
