May
21
Voom! Precision weights unlock linear model analysis tools for RNA-Seq read counts
Filed Under Analysis Pipelines, Expression and Quantification
Voom: variance modelling at the observation-level
In the past few years, RNA-seq has emerged as a revolutionary new technology for expression profiling. RNA-seq expression data consists of read counts, and many recent publications have argued therefore that RNA-seq data should be analysed by statistical methods designed specifically for counts. Yet all the statistical methods developed for RNA-seq counts rely on approximations of various kinds.
This article revisits the idea of applying normal-based microarray-like statistical methods to RNA-seq read counts, with the idea that it is more important to model the mean-variance relationship correctly than it is to specify the exact probabilistic distribution of the counts. Log-counts per million are used as expression values. The voom method estimates the mean-variance relationship robustly and generates a precision weight for each individual normalized observation. The normalized log-counts per million and associated precision weights are then entered into the limma analysis pipeline, or indeed into any statistical pipeline for microarray data that is precision weight aware. This opens access for RNA-seq analysts to a large body of methodology developed for microarrays, allowing RNA-seq and microarray data to be analysed in closely comparable ways. The performance of voom and related limma-based pipelines is compared to that of edgeR, DESeq, baySeq, TSPM, PoissonSeq, and DSS. Simulation studies show that voom out-performs previous RNA-seq methods even when the data is generated according to the assumptions of the earlier methods. This is especially true when the sequence depths vary between RNA samples. Several data sets are also analysed to demonstrate how voom can handle heterogeneous data and complex experiments as well as facilitating pathway analysis and gene set testing methods.
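As a rough illustration of the two quantities described above, the sketch below computes log-counts per million and then uses per-observation precision weights in a weighted average. It is a minimal Python sketch, not the limma implementation. The offsets 0.5 and 1 are assumptions taken from the commonly cited voom definition rather than from this post, and the function names are hypothetical. The fit of the mean-variance trend that voom uses to predict each variance is omitted, so the standard deviations are simply passed in as made-up numbers.

```python
import math

def log_cpm(counts, lib_sizes, count_offset=0.5, lib_offset=1.0):
    """Return log2 counts-per-million for a genes x samples count table.

    The offsets keep the logarithm finite for zero counts (assumed values).
    """
    return [
        [math.log2((c + count_offset) / (n + lib_offset) * 1e6)
         for c, n in zip(row, lib_sizes)]
        for row in counts
    ]

def precision_weights(predicted_sd):
    """Convert predicted standard deviations into inverse-variance weights."""
    return [[1.0 / (sd * sd) for sd in row] for row in predicted_sd]

def weighted_mean(values, weights):
    """Precision-weighted average: noisier observations count for less."""
    return sum(w * v for v, w in zip(values, weights)) / sum(weights)

counts = [[0, 10], [250, 480]]      # toy data: 2 genes x 2 samples
lib_sizes = [999_999, 1_999_999]    # toy sequencing depths
expr = log_cpm(counts, lib_sizes)
weights = precision_weights([[2.0, 1.0], [0.5, 0.5]])  # hypothetical SDs
gene_means = [weighted_mean(e, w) for e, w in zip(expr, weights)]
```

In a real analysis the weights would accompany the log-CPM values into a weighted linear model fit for each gene; the weighted mean here only stands in for that step.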
May
20
A bi-Poisson model for clustering gene expression profiles by RNA-seq
Filed Under Analysis Pipelines, Expression and Quantification
With the availability of gene expression data by RNA-seq, powerful statistical approaches for grouping similar gene expression profiles across different environments have become increasingly important. A team led by researchers at Penn State University describes and assesses a computational model for clustering genes into distinct groups based on the pattern of gene expression in response to a changing environment. The model capitalizes on the Poisson distribution to capture the count property of RNA-seq data. A two-stage hierarchical expectation-maximization (EM) algorithm is implemented to estimate an optimal number of groups and mean expression amounts of each group across two environments. A procedure is formulated to test whether and how a given group shows a plastic response to environmental changes. The impact of gene-environment interactions on the phenotypic plasticity of the organism can also be visualized and characterized. The model was used to analyse an RNA-seq dataset measured from two cell lines of breast cancer that respond differently to an anti-cancer drug, from which genes associated with the resistance and sensitivity of the cell lines are identified. They performed simulation studies to validate the statistical behaviour of the model. The model provides a useful tool for clustering gene expression data by RNA-seq, facilitating understanding of gene functions and networks.
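The clustering idea can be sketched with a plain EM loop over a mixture of Poisson pairs, one count per environment. This is a simplified illustration and not the authors' two-stage hierarchical algorithm: the number of groups is fixed by the caller instead of being estimated, the two environments are treated as independent within a group, and all names and toy counts are hypothetical.

```python
import math

def poisson_logpmf(x, lam):
    """log P(X = x) for a Poisson distribution with mean lam."""
    return x * math.log(lam) - lam - math.lgamma(x + 1)

def em_poisson_clusters(data, init_means, n_iter=50, floor=1e-6):
    """Cluster (count_env1, count_env2) pairs with a Poisson mixture.

    init_means holds one (mean_env1, mean_env2) starting guess per group.
    Returns mixing weights, group means and per-gene responsibilities.
    """
    k = len(init_means)
    means = [tuple(max(m, floor) for m in pair) for pair in init_means]
    weights = [1.0 / k] * k
    resp = []
    for _ in range(n_iter):
        # E-step: posterior group membership of every gene (log-sum-exp).
        resp = []
        for x1, x2 in data:
            logs = [math.log(weights[j])
                    + poisson_logpmf(x1, means[j][0])
                    + poisson_logpmf(x2, means[j][1]) for j in range(k)]
            top = max(logs)
            probs = [math.exp(v - top) for v in logs]
            total = sum(probs)
            resp.append([p / total for p in probs])
        # M-step: responsibility-weighted proportions and means.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), floor)
            weights[j] = max(nj / len(data), floor)
            m1 = sum(r[j] * x1 for r, (x1, _) in zip(resp, data)) / nj
            m2 = sum(r[j] * x2 for r, (_, x2) in zip(resp, data)) / nj
            means[j] = (max(m1, floor), max(m2, floor))
    return weights, means, resp

# Toy genes: four rise in environment 2, four fall (made-up counts).
genes = [(2, 20), (3, 18), (1, 22), (2, 19), (30, 3), (28, 2), (33, 4), (31, 3)]
weights, means, resp = em_poisson_clusters(genes, init_means=[(2, 20), (31, 3)])
labels = [max(range(len(r)), key=r.__getitem__) for r in resp]
```

A group whose two fitted means differ markedly, as both do here, is the kind of group the post describes as showing a plastic response to the environment.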

- Wang N, Wang Y, Hao H, Wang L, Wang Z, Wang J, Wu R. (2013) A bi-Poisson model for clustering gene expression profiles by RNA-seq. Brief Bioinform [Epub ahead of print]. [abstract]
May
16
ReXpress – for updating abundance estimates from RNA-Seq experiments upon re-annotation
Filed Under Expression and Quantification, Other Tools
The estimation of isoform abundances from RNA-Seq data requires a time-intensive step of mapping reads to either an assembled, or previously annotated transcriptome, followed by an optimization procedure for deconvolution of multi-mapping reads. These procedures are essential for downstream analysis such as differential expression. In cases where it is desirable to adjust the underlying annotation, for example upon the discovery of novel isoforms or errors in existing annotations, current pipelines must be rerun from scratch. This makes it difficult to update abundance estimates after re-annotation, or to explore the effect of changes in the transcriptome on analyses.
Researchers at UC Berkeley have developed a novel, efficient algorithm for updating abundance estimates from RNA-Seq experiments upon re-annotation that does not require re-analysis of the entire dataset. Their approach is based on a fast partitioning algorithm for identifying transcripts whose abundances may depend on the added or deleted isoforms, and on a fast follow-up approach to re-estimating abundances for all transcripts. They demonstrate the effectiveness of their methods by showing how to synchronize RNA-Seq abundance estimates with the daily RefSeq incremental updates. Thus, they provide a practical approach to maintaining relevant databases of RNA-Seq derived abundance estimates even as annotations are being constantly revised.

Availability – ReXpress is freely available, together with source code, at http://bio.math.berkeley.edu/ReXpress/
Contact: lpachter@math.berkeley.edu
- Roberts A, Schaeffer L, Pachter L. (2013) Updating RNA-Seq analyses after re-annotation. Bioinformatics [Epub ahead of print]. [abstract]
Apr
26
Exponential-negative-Binomial model – for gene expression calls using RNA-Seq data
Filed Under Expression and Quantification
The power of deep sequencing technology to reliably detect single RNA reads leads to a paradoxical problem of high sensitivity. In hybridization or PCR based methods for RNA quantification, the concern is low sensitivity, i.e., the problem that the signal from truly expressed genes might not be distinguishable from noise. In contrast, the problem with RNA-seq is that it is not clear whether very low read counts come from lowly expressed genes or merely from transcriptional noise. The frequency distribution of read counts does not show a clear separation into two classes of genes, which makes the decision whether a gene is to be considered expressed or not seemingly arbitrary.
Here, researchers from Yale University address this problem by suggesting a statistical model that considers the number of transcripts detected in an RNA-Seq study as a mixture of two distributions: an exponential distribution for transcripts from inactive genes and a negative binomial distribution for actively transcribed genes. They apply this model to a number of RNA-Seq data sets and find that the model fits the data very well. The calculated criterion for distinguishing between expressed and non-expressed genes is remarkably consistent among data sets, suggesting that genes with more than two transcripts per million transcripts (TPM) are highly likely to be actively transcribed. The regression model correctly identifies the class of genes that are not actively expressed and thus provides an operational criterion for classifying genes into expressed and non-expressed sets, facilitating the interpretation of RNA-Seq data.
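To show how a two-component mixture turns into an expression call, the sketch below computes the posterior probability that an observed count came from the active component. It is an illustration under stated assumptions rather than the published model: a geometric distribution is used as a discrete stand-in for the exponential component, and the mixing proportion and distribution parameters are hypothetical values, not estimates from any data set.

```python
import math

def geometric_logpmf(x, q):
    """log P(X = x) for a geometric distribution on 0, 1, 2, ... (success prob q)."""
    return math.log(q) + x * math.log(1.0 - q)

def negbin_logpmf(x, r, p):
    """log P(X = x) for a negative binomial with size r and success prob p."""
    return (math.lgamma(x + r) - math.lgamma(r) - math.lgamma(x + 1)
            + r * math.log(p) + x * math.log(1.0 - p))

def prob_active(x, pi_inactive, q, r, p):
    """Posterior probability that count x came from the active component."""
    log_inactive = math.log(pi_inactive) + geometric_logpmf(x, q)
    log_active = math.log(1.0 - pi_inactive) + negbin_logpmf(x, r, p)
    top = max(log_inactive, log_active)
    num = math.exp(log_active - top)
    return num / (num + math.exp(log_inactive - top))

# Hypothetical parameters: half the genes inactive, noise mean near 1,
# active genes with a mean count near 38.
params = dict(pi_inactive=0.5, q=0.5, r=2.0, p=0.05)
posteriors = [prob_active(x, **params) for x in (0, 1, 2, 5, 20)]
```

With fitted parameters, the count at which this posterior crosses a chosen cut-off plays the role of the operational threshold discussed above.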
- Wagner GP, Kin K, Lynch VJ. (2013) A model based criterion for gene expression calls using RNA-seq data. Theory Biosci [Epub ahead of print]. [abstract]
Apr
24
ShortStack – comprehensive de novo annotation and quantification of small RNA genes
Filed Under Annotation, Expression and Quantification
Small RNA sequencing allows genome-wide discovery, categorization, and quantification of genes producing regulatory small RNAs. Many tools have been described for annotation and quantification of microRNA loci (MIRNAs) from small RNA-seq data. However, in many organisms and tissue types, MIRNA genes comprise only a small fraction of all small RNA-producing genes.
ShortStack is a stand-alone application that analyzes reference-aligned small RNA-seq data and performs comprehensive de novo annotation and quantification of the inferred small RNA genes. ShortStack’s output reports multiple parameters of direct relevance to small RNA gene annotation, including RNA size distributions, repetitiveness, strandedness, hairpin-association, MIRNA annotation, and phasing. In this study, ShortStack is demonstrated to perform accurate annotations and useful descriptions of diverse small RNA genes from four plants (Arabidopsis, tomato, rice, and maize) and three animals (Drosophila, mice, and humans). ShortStack efficiently processes very large small RNA-Seq data sets using modest computational resources, and its performance compares favorably to previously described tools. Annotation of MIRNA loci by ShortStack is highly specific in both plants and animals.
Availability: ShortStack is freely available under a GNU General Public License – ShortStack – Axtell Lab @ Penn State
Axtell MJ. (2013) ShortStack: Comprehensive annotation and quantification of small RNA genes. RNA [Epub ahead of print]. [abstract]
Apr
24
sSeq – Shrinkage estimation of dispersion in Negative Binomial models for RNA-Seq experiments with small sample size
Filed Under Expression and Quantification
RNA-Seq experiments produce digital counts of reads that are affected by both biological and technical variation. To distinguish the systematic changes in expression between conditions from noise, the counts are frequently modeled by the Negative Binomial distribution. However, in experiments with small sample size, the per-gene estimates of the dispersion parameter are unreliable.
Researchers at the European Molecular Biology Laboratory, Germany and Purdue University have developed a simple and effective approach for estimating the dispersions. First, they obtain the initial estimates for each gene using the method of moments. Second, the estimates are regularized, i.e. shrunk towards a common value that minimizes the average squared difference between the initial estimates and the shrinkage estimates. The approach does not require extra modeling assumptions, is easy to compute and is compatible with the exact test of differential expression.
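The two steps can be sketched as follows. This is a minimal Python illustration and not the sSeq package: it assumes the usual negative binomial relation var = mu + phi * mu^2, ignores library-size normalization, takes the mean of the initial estimates as a stand-in target, and uses a hypothetical fixed shrinkage weight `delta` in place of the data-driven choice described above.

```python
from statistics import mean, variance

def mom_dispersion(counts):
    """Method-of-moments dispersion for one gene, assuming var = mu + phi * mu**2."""
    mu = mean(counts)
    if mu == 0:
        return 0.0
    # Negative estimates (variance below the mean) are truncated at zero.
    return max(0.0, (variance(counts) - mu) / (mu * mu))

def shrink(initial, target, delta):
    """Pull each initial estimate toward the common target by weight delta in [0, 1]."""
    return [delta * target + (1.0 - delta) * phi for phi in initial]

# Toy replicate counts for three genes within one condition (made-up numbers).
replicates = {"geneA": [10, 10, 10], "geneB": [5, 15], "geneC": [40, 60, 80]}
initial = [mom_dispersion(c) for c in replicates.values()]
target = mean(initial)                 # stand-in for the common value
shrunk = shrink(initial, target, delta=0.5)
```

With only two or three replicates the initial estimates swing widely from gene to gene, which is why pulling them toward a shared value helps in small experiments.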
They evaluated the proposed approach using 10 simulated and experimental datasets and compared its performance with that of currently popular packages edgeR, DESeq, baySeq, BBSeq and SAMseq. For these datasets, sSeq performed favorably for experiments with small sample size in sensitivity, specificity and computational time.

Availability: http://www.stat.purdue.edu/~ovitek/Software.html and Bioconductor.
Contact: ovitek@purdue.edu
- Yu D, Huber W, Vitek O. (2013) Shrinkage estimation of dispersion in Negative Binomial models for RNA-seq experiments with small sample size. Bioinformatics [Epub ahead of print] [article]
Apr
12
Probe region expression estimation for RNA-seq data for improved microarray comparability
Filed Under Expression and Quantification
Rapidly growing public gene expression databases contain a wealth of data for building an unprecedentedly detailed picture of human biology and disease. This data comes from many diverse measurement platforms, which makes integrating it all difficult. In this paper, researchers from the University of Helsinki, Finland and Stockholm University, Sweden propose a new method for processing RNA-sequencing data that yields gene expression estimates that are much more similar to corresponding estimates from microarray data, hence greatly improving cross-platform comparability. The method, called PREBS, is based on estimating the expression only from microarray probe regions, and processing these estimates with the microarray summarisation algorithm RMA. This allows new ways of using RNA-sequencing data, such as expression estimation for microarray probe sets. Gene signatures defined based on PREBS expression measures of RNA-sequencing data are much more accurate for retrieval of similar microarray samples from a database.

Availability: http://www.bioconductor.org/packages/2.12/bioc/html/prebs.html
Uziela K, Honkela A. (2013) Probe region expression estimation for RNA-seq data for improved microarray comparability. arXiv:1304.1698 [q-bio.GN]. [article]
Mar
29
EDGE-pro – Estimated Degree of Gene Expression in PROkaryotes
Filed Under Expression and Quantification
The expression levels of bacterial genes can be measured directly using next-generation sequencing (NGS) methods, offering much greater sensitivity and accuracy than earlier, microarray-based methods. Most bioinformatics software for estimating levels of gene expression from NGS data has been designed for eukaryotic genomes, with algorithms focusing particularly on detection of splicing patterns. These methods do not perform well on bacterial genomes.
Here, researchers at Johns Hopkins University School of Medicine describe the first software system designed explicitly for quantifying the degree of gene expression in bacteria and other prokaryotes. EDGE-pro (Estimated Degree of Gene Expression in PROkaryotes) processes the raw data from an RNA-seq experiment on a bacterial or archaeal species and produces estimates of the expression levels for each gene in these gene-dense genomes.
Availability – The EDGE-pro tool is implemented as a pipeline of C++ and Perl programs and is freely available as open-source code at http://www.genomics.jhu.edu/software/EDGE/index.shtml.
- Magoc T, Wood D, Salzberg SL. (2013) EDGE-pro: Estimated Degree of Gene Expression in Prokaryotic Genomes. Evol Bioinform Online 9, 127-36. [article]
Mar
27
Differential expression analysis for paired RNA-seq data
Filed Under Expression and Quantification
RNA-Seq technology measures the transcript abundance by generating sequence reads and counting their frequencies across different biological conditions. To identify differentially expressed genes between two conditions, it is important to consider the experimental design as well as the distributional property of the data. In many RNA-Seq studies, the expression data are obtained as multiple pairs, e.g., pre- vs. post-treatment samples from the same individual. The authors seek to incorporate this paired structure into the analysis.
Now, a team led by researchers at Yale University have developed a Bayesian hierarchical mixture model for RNA-Seq data to separately account for the variability within and between individuals from a paired data structure. The method assumes a Poisson distribution for the data, mixed with a gamma distribution to account for variability between pairs. The effect of differential expression is modeled by a two-component mixture model. The performance of this approach is examined using simulated and real data.
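The effect of mixing a Poisson distribution with a gamma distribution can be seen in a small simulation: when the Poisson rate itself varies according to a gamma distribution, the counts become overdispersed, with variance mean + shape * scale^2 rather than the mean alone. The sketch below is a generic Poisson-gamma illustration, not the authors' hierarchical model. The shape and scale values, the seed and the function names are hypothetical, and the Poisson sampler is a simple textbook method suited only to modest rates.

```python
import math
import random

def sample_poisson(lam, rng):
    """Knuth's multiplication method; adequate for small to moderate rates."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def sample_poisson_gamma(shape, scale, n, seed=0):
    """Draw n counts whose Poisson rate is gamma distributed (one rate per draw)."""
    rng = random.Random(seed)
    return [sample_poisson(rng.gammavariate(shape, scale), rng) for _ in range(n)]

def mean_and_variance(xs):
    """Sample mean and unbiased sample variance."""
    m = sum(xs) / len(xs)
    v = sum((x - m) ** 2 for x in xs) / (len(xs) - 1)
    return m, v

# Hypothetical setting: expected mean 2 * 5 = 10, expected variance 10 + 2 * 25 = 60.
counts = sample_poisson_gamma(shape=2.0, scale=5.0, n=5000)
m, v = mean_and_variance(counts)
```

A pure Poisson model would force the variance to equal the mean, so the gamma layer is what lets such a model absorb the extra spread between pairs.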
In this setting, the proposed model provides higher sensitivity than existing methods to detect differential expression. Application to real RNA-Seq data demonstrates the usefulness of this method for detecting expression alteration for genes with low average expression levels or shorter transcript length.
Availability: The method was implemented in R and is available at http://bioinformatics.med.yale.edu
- Chung LM, Ferguson JP, Zheng W, Qian F, Bruno V, Montgomery RR, Zhao H. (2013) Differential expression analysis for paired RNA-seq data. BMC Bioinformatics 14, 110. [abstract]
Mar
20
Comparison of methods for differential expression analysis of RNA-Seq data
Filed Under Data Analysis, Expression and Quantification, Publications
Finding genes that are differentially expressed between conditions is an integral part of understanding the molecular basis of phenotypic variation. In the past decades, DNA microarrays have been used extensively to quantify the abundance of mRNA corresponding to different genes, and more recently high-throughput sequencing of cDNA (RNA-Seq) has emerged as a powerful competitor. As the cost of sequencing decreases, it is conceivable that the use of RNA-Seq for differential expression analysis will increase rapidly. To exploit the possibilities and address the challenges posed by this relatively new type of data, a number of software packages have been developed especially for differential expression analysis of RNA-Seq data.
Scientists at the Swiss Institute of Bioinformatics have conducted an extensive comparison of eleven methods for differential expression analysis of RNA-Seq data. All methods are freely available within the R framework and take as input a matrix of counts, i.e. the number of reads mapping to each genomic feature of interest in each of a number of samples. They evaluated the methods based on both simulated data and real RNA-Seq data.
They found that very small sample sizes, which are still common in RNA-Seq experiments, impose problems for all evaluated methods and any results obtained under such conditions should be interpreted with caution. For larger sample sizes, the methods combining a variance-stabilizing transformation with the ‘limma’ method for differential expression analysis perform well under many different conditions, as does the nonparametric SAMseq method.

Soneson C, Delorenzi M. (2013) A comparison of methods for differential expression analysis of RNA-Seq data. BMC Bioinformatics 14(1), 91. [article]
Mar
18
Methods to study Event/Isoform Expression and Alternative Splicing from RNA-Seq
Filed Under Analysis Pipelines, Expression and Quantification, Other Tools, Pathway Analysis, Splicing and Junction Mapping, Transcriptome Assembly Tools, Unspliced Mapping Tools
The RegulatoryGenomics website posts and updates a comprehensive list of tools for RNA-Seq analysis.
This is their current version.
Spliced-mappers

| Method | Reference | Web-site |
|---|---|---|
| TopHat | (Trapnell et al. 2009) | |
| MapSplice | (Wang et al. 2010) | |
| SpliceMap | (Au et al. 2010) | |
| HMMSplicer | (Dimon et al. 2010) | |
| TrueSight | (Li et al. 2012b) | |
| SOAPsplice | (Huang et al. 2011) | |
| PASSion | (Zhang et al. 2012) | |
| PALMapper | (Jean et al. 2010) | |
| SplitSeek | (Ameur et al. 2010) | |
| Supersplat | (Bryant et al. 2010) | |
| SeqSaw | (Wang et al. 2011) | http://bioinfo.au.tsinghua.edu.cn/software/seqsaw |
| MapNext | (Bao et al. 2009) | |
| STAR | (Dobin et al. 2012) | |
| GSNAP | (Wu et al. 2010) | |
| QPALMA | (De Bona et al. 2008) | |
| OSA | (Hu et al. 2012) | |
Mar
6
Canonical correlation analysis (CCA) for RNA-Seq co-expression networks
Filed Under Expression and Quantification
Digital transcriptome analysis by next-generation sequencing reveals a substantial number of mRNA variants. Variation in gene expression underlies many biological processes and holds a key to unravelling the mechanisms of common diseases. However, the current methods for construction of co-expression networks using overall gene expression were originally designed for microarray expression data, and they overlook a large number of variations in gene expression.
To use information on exon-level, genomic position-level and allele-specific expression, researchers at Fudan University, China have developed novel component-based methods, single and bivariate canonical correlation analysis, for construction of co-expression networks with RNA-Seq data. To evaluate the performance of these methods for co-expression network inference with RNA-Seq data, they are applied to lung squamous cell cancer expression data from the TCGA database and to the researchers' own bipolar disorder and schizophrenia RNA-Seq study. The preliminary results demonstrate that the co-expression networks constructed by canonical correlation analysis and RNA-Seq data provide rich genetic and molecular information to gain insight into biological processes and disease mechanisms. These new methods substantially outperform the current statistical methods for co-expression network construction with microarray expression data or RNA-Seq data based on overall gene expression levels.
Availability: A program for implementing the developed CCA for co-expression network construction can be downloaded from bioconductor (http://www.bioconductor.org/) and at http://www.sph.uth.tmc.edu/hgc/faculty/xiong/index.htm
- Hong S, Chen X, Jin L, Xiong M. (2013) Canonical correlation analysis for RNA-seq co-expression networks. Nucleic Acids Res [Epub ahead of print]. [article]
Feb
22
EBSeq – An empirical Bayes hierarchical model for inference in RNA-Seq experiments
Filed Under Data Analysis, Expression and Quantification
Messenger RNA expression is important in normal development and differentiation, as well as in manifestation of disease. RNA-Seq experiments allow for the identification of differentially expressed (DE) genes and their corresponding isoforms on a genome-wide scale. However, statistical methods are required to ensure that accurate identifications are made. A number of methods exist for identifying DE genes, but far fewer are available for identifying DE isoforms. When isoform DE is of interest, investigators often apply gene-level (count-based) methods directly to estimates of isoform counts. Doing so is not recommended. In short, estimating isoform expression is relatively straightforward for some groups of isoforms, but more challenging for others. This results in estimation uncertainty that varies across isoform groups. Count-based methods were not designed to accommodate this varying uncertainty and consequently application of them for isoform inference results in reduced power for some classes of isoforms and increased false discoveries for others.
Taking advantage of the merits of empirical Bayesian methods, researchers at the University of Wisconsin have developed EBSeq for identifying DE isoforms in an RNA-Seq experiment comparing two or more biological conditions. Results demonstrate substantially improved power and performance of EBSeq for identifying DE isoforms. EBSeq also proves to be a robust approach for identifying DE genes.
Availability: An R package containing examples and sample data sets is available at http://www.biostat.wisc.edu/~kendzior/EBSEQ/
- Leng N, Dawson JA, Thomson JA, Ruotti V, Rissman AI, Smits BMG, Haag JD, Gould MN, Stewart RM, Kendziorski C. (2013) EBSeq: An empirical Bayes hierarchical model for inference in RNA-seq experiments. Bioinformatics [Epub ahead of print]. [abstract]