May
21
Voom! Precision weights unlock linear model analysis tools for RNA-Seq read counts
Filed Under Analysis Pipelines, Expression and Quantification | Leave a Comment
Voom: variance modelling at the observation-level
In the past few years, RNA-seq has emerged as a revolutionary new technology for expression profiling. RNA-seq expression data consist of read counts, and many recent publications have therefore argued that RNA-seq data should be analysed by statistical methods designed specifically for counts. Yet all the statistical methods developed for RNA-seq counts rely on approximations of various kinds.
This article revisits the idea of applying normal-based, microarray-like statistical methods to RNA-seq read counts, on the premise that it is more important to model the mean-variance relationship correctly than to specify the exact probability distribution of the counts. Log-counts per million are used as expression values. The voom method estimates the mean-variance relationship robustly and generates a precision weight for each individual normalized observation. The normalized log-counts per million and their associated precision weights are then entered into the limma analysis pipeline, or indeed into any statistical pipeline for microarray data that is precision-weight aware. This gives RNA-seq analysts access to a large body of methodology developed for microarrays, allowing RNA-seq and microarray data to be analysed in closely comparable ways. The performance of voom and related limma-based pipelines is compared with that of edgeR, DESeq, baySeq, TSPM, PoissonSeq and DSS. Simulation studies show that voom outperforms previous RNA-seq methods even when the data are generated according to the assumptions of the earlier methods. This is especially true when sequencing depths vary between RNA samples. Several data sets are also analysed to demonstrate how voom can handle heterogeneous data and complex experiments, and how it facilitates pathway analysis and gene set testing.
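The voom recipe summarized above can be sketched in a few lines of Python. This is a minimal sketch, not the limma implementation. It uses the log-counts-per-million transform from the voom paper, log2((r + 0.5) / (R + 1) × 10^6), but it assumes no design matrix and replaces voom's lowess trend with a quadratic fit from NumPy's `polyfit`. The helper names `log_cpm` and `voom_like_weights` are hypothetical.

```python
import numpy as np

def log_cpm(counts, lib_sizes):
    """voom-style log2 counts per million: log2((r + 0.5) / (R + 1) * 1e6)."""
    return np.log2((counts + 0.5) / (lib_sizes[np.newaxis, :] + 1.0) * 1e6)

def voom_like_weights(counts, degree=2, min_sqrt_sd=1e-3):
    """Simplified observation-level precision weights (genes x samples).

    Assumptions of this sketch: no design matrix (each gene's fitted value is
    its mean log-CPM across all samples) and a polynomial trend in place of
    the lowess curve that voom fits.
    """
    counts = np.asarray(counts, dtype=float)
    lib_sizes = counts.sum(axis=0)                      # reads per sample
    y = log_cpm(counts, lib_sizes)                      # genes x samples
    gene_mean = y.mean(axis=1)
    gene_sd = y.std(axis=1, ddof=1)
    # Average log2 count: mean log-CPM rescaled to the geometric-mean library size
    log2_mean_lib = np.mean(np.log2(lib_sizes + 1.0))
    avg_log_count = gene_mean + log2_mean_lib - np.log2(1e6)
    # Mean-variance trend: sqrt(sd) as a smooth function of average log count
    coef = np.polyfit(avg_log_count, np.sqrt(gene_sd), deg=degree)
    # Fitted log2 count of every observation, using its own sample's depth
    fitted_log_count = (gene_mean[:, None]
                        + np.log2(lib_sizes + 1.0)[None, :] - np.log2(1e6))
    pred_sqrt_sd = np.clip(np.polyval(coef, fitted_log_count), min_sqrt_sd, None)
    weights = 1.0 / pred_sqrt_sd ** 4                   # 1 / predicted variance
    return y, weights
```

In practice, analysts would call the `voom()` function from the limma Bioconductor package in R. That function estimates the trend from residual standard deviations of a linear model fit and returns weights that `lmFit()` then uses.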
May
20
A bi-Poisson model for clustering gene expression profiles by RNA-seq
Filed Under Analysis Pipelines, Expression and Quantification | Leave a Comment
With the availability of gene expression data from RNA-seq, powerful statistical approaches for grouping similar gene expression profiles across different environments have become increasingly important. A team led by researchers at Penn State University describes and assesses a computational model that clusters genes into distinct groups based on how their expression responds to a changing environment. The model uses the Poisson distribution to capture the count nature of RNA-seq data. A two-stage hierarchical expectation-maximization (EM) algorithm is implemented to estimate the optimal number of groups and the mean expression of each group in each of two environments. A procedure is formulated to test whether, and how, a given group shows a plastic response to environmental change. The impact of gene-environment interactions on the phenotypic plasticity of the organism can also be visualized and characterized. The model was used to analyse an RNA-seq dataset from two breast cancer cell lines that respond differently to an anti-cancer drug, and it identified genes associated with the resistance and sensitivity of the cell lines. Simulation studies were performed to validate the statistical behaviour of the model. The model provides a useful tool for clustering RNA-seq gene expression data, facilitating the understanding of gene functions and networks.
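To make the clustering idea concrete, here is a minimal expectation-maximization sketch for a mixture of gene clusters. Each cluster has its own Poisson mean in each of two environments. It is a simplified, single-stage illustration under stated assumptions: the number of clusters K is fixed, and counts in the two environments are treated as independent Poisson draws. It is not the authors' two-stage hierarchical algorithm. The function name and the simulated data are hypothetical.

```python
import numpy as np

def poisson_mixture_em(x, k, n_iter=200):
    """EM for K gene clusters, each with its own Poisson mean in each of two
    environments (columns of x). A single-stage sketch, not the authors'
    two-stage hierarchical EM."""
    x = np.asarray(x, dtype=float)                      # genes x 2 environments
    # Deterministic start: genes at evenly spaced ranks of total expression
    order = np.argsort(x.sum(axis=1))
    lam = x[order[np.linspace(0, len(x) - 1, k).astype(int)]] + 0.5
    pi = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: Poisson log-likelihoods; the log(x!) term is the same for
        # every cluster, so it cancels when normalising responsibilities
        log_post = np.log(pi) + x @ np.log(lam).T - lam.sum(axis=1)
        log_post -= log_post.max(axis=1, keepdims=True)
        resp = np.exp(log_post)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: responsibility-weighted means and mixing proportions
        nk = resp.sum(axis=0) + 1e-12
        lam = (resp.T @ x) / nk[:, None] + 1e-9
        pi = nk / nk.sum()
    return lam, pi, resp.argmax(axis=1)

rng = np.random.default_rng(1)
genes_a = rng.poisson([5, 40], size=(30, 2))   # hypothetical: up in environment 2
genes_b = rng.poisson([80, 80], size=(30, 2))  # hypothetical: no plastic response
lam, pi, labels = poisson_mixture_em(np.vstack([genes_a, genes_b]), k=2)
```

Comparing a cluster's two fitted means gives only a crude view of its plastic response. The published model instead formulates a formal test for it.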

- Wang N, Wang Y, Hao H, Wang L, Wang Z, Wang J, Wu R. (2013) A bi-Poisson model for clustering gene expression profiles by RNA-seq. Brief Bioinform [Epub ahead of print]. [abstract]
May
20
Optimizing de novo assembly of short-read RNA-seq data for phylogenomics
Filed Under Analysis Pipelines, Other Tools | Leave a Comment
RNA-seq has shown huge potential for phylogenomic inference in non-model organisms. However, errors, incompleteness, and redundant transcripts assembled for each gene in de novo assemblies of short reads introduce noise into analyses and a large amount of missing data into the aligned matrix. To address these problems, the authors compared de novo assemblies of paired-end 90 bp RNA-seq reads produced with Oases, Trinity, Trans-ABySS and SOAPdenovo-Trans against transcripts from the genome annotation of the model plant Ricinus communis. In doing so, they evaluated strategies for optimizing total gene coverage while minimizing assembly chimeras and redundancy.
Researchers at the University of Michigan found that the frequency and structure of chimeras vary dramatically among different software packages. The differences were largely due to the number of trans-self chimeras that contain repeats in the opposite direction. More than half of the total chimeras in Oases and Trinity were trans-self chimeras. Within each package, they found a trade-off between maximizing reference coverage and minimizing redundancy and chimera rate.
In order to reduce redundancy, they investigated three methods: Read more
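Trans-self chimeras, as described above, join a sequence to a copy of itself in the opposite orientation. One hypothetical way to flag them is to count how many k-mers in a transcript also occur in that same transcript as their reverse complement. This sketch assumes plain A/C/G/T sequences. The heuristic, its k = 15 default and its function names are illustrative only and are not the method used in the study.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Reverse complement of an A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def inverted_repeat_fraction(seq, k=15):
    """Fraction of k-mers in seq whose reverse complement also occurs in seq.
    Values near 1 suggest a segment joined to its own reverse complement
    (the trans-self chimera pattern); hypothetical heuristic."""
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    if not kmers:
        return 0.0
    kmer_set = set(kmers)
    hits = 0
    for km in kmers:
        rc = reverse_complement(km)
        # skip palindromic k-mers, which trivially match themselves (even k only)
        if rc != km and rc in kmer_set:
            hits += 1
    return hits / len(kmers)
```

A random transcript scores close to 0, whereas a sequence concatenated with its own reverse complement scores close to 1.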
May
13
FlyBase RNA-Seq RPKM data calculations available for bulk download
Filed Under Analysis Pipelines | Leave a Comment
from flybase.org
FlyBase is extending its initial gene-level analyses of high-throughput RNA-seq data from modENCODE and other sources. The algorithm for calculating RPKM (reads per kilobase per million mapped reads) has been refined, additional datasets have been analyzed, and the resulting data are now available for bulk download.
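RPKM normalizes a gene's read count twice: once by the length of its exon model in kilobases, and once by sequencing depth in millions of mapped reads. A minimal sketch of the standard formula follows. FlyBase's refined algorithm works from coverage data and may differ in detail. The function name is hypothetical.

```python
def rpkm(gene_reads, exon_length_bp, total_mapped_reads):
    """Reads per kilobase of exon model per million mapped reads."""
    per_kb = gene_reads / (exon_length_bp / 1_000)       # length normalisation
    return per_kb / (total_mapped_reads / 1_000_000)     # depth normalisation

# 1,000 reads on a 2 kb exon model in a library of 10 million mapped reads
print(rpkm(1000, 2000, 10_000_000))  # 50.0
```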
To summarize this type of data at the gene level, a single expression value must first be determined for each gene in each RNA-seq sample. RNA-seq coverage data are intersected with FlyBase exons, based on the gene model annotations of the current release, to calculate a single value reflecting the average coverage per kb for each gene. Each gene data point is then classified into one of eight expression-level bins, and the graphical and text summaries are produced from the binned values. A more detailed explanation can be found at FBrf0221009.
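Two steps in the procedure above are easy to illustrate. The first is computing the length of a gene's exon model when annotated exons overlap. The second is assigning each gene value to one of eight bins. The sketch below assumes half-open genomic coordinates. Its bin cut-offs are hypothetical placeholders, not the thresholds FlyBase uses (see FBrf0221009 for those).

```python
import bisect

def merged_exon_length(exons):
    """Total bp covered by the union of half-open (start, end) exon intervals,
    so bases shared by overlapping exons are counted once."""
    total, cur_start, cur_end = 0, None, None
    for start, end in sorted(exons):
        if cur_end is not None and start <= cur_end:
            cur_end = max(cur_end, end)          # overlaps or abuts: extend block
        else:
            if cur_end is not None:
                total += cur_end - cur_start     # close the previous block
            cur_start, cur_end = start, end
    if cur_end is not None:
        total += cur_end - cur_start
    return total

# Hypothetical cut-offs (7 values -> 8 bins); not FlyBase's published thresholds
BIN_CUTS = [1, 3, 10, 25, 50, 100, 1000]

def expression_bin(value):
    """Index 0..7 of the expression-level bin that value falls into."""
    return bisect.bisect_right(BIN_CUTS, value)
```

For example, exons (0, 100), (50, 150) and (200, 300) give a merged length of 250 bp rather than 300 bp.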
Bulk data files can be accessed from the Precomputed Data Files page (menu: Files → Current Release). Look in the Genes section; the item line is ‘RNA-Seq RPKM values’.
Simple and combinatorial queries of RPKM expression data can be conducted using the ‘RNA-Seq Search’ option under the ‘Expression’ tab of the Quick Search tool.
Apr
16
Next-generation RNA-sequencing (RNA-Seq) is rapidly outcompeting microarrays as the technology of choice for whole-transcriptome studies. However, the bioinformatics skills required for RNA-Seq data analysis often pose a significant hurdle for many biologists. Here, researchers from Utrecht University, the Netherlands, put forward the concepts and considerations that are critical for RNA-Seq data analysis and provide a generic tutorial with example data that outlines the whole pipeline from next-generation sequencing output to quantification of differential gene expression.
- Van Verk MC, Hickman R, Pieterse CM, Van Wees SC. (2013) RNA-Seq: revelation of the messengers. Trends Plant Sci 18(4), 175-9. [article]
Apr
3
Predicting long non-coding RNAs with RNA-Seq
Filed Under Analysis Pipelines, Other Tools | Leave a Comment
The advent of next-generation sequencing technologies, and of RNA sequencing (RNA-Seq) in particular, has expanded our knowledge of the transcriptional capacity of human and other animal genomes. Recent RNA-Seq studies have revealed that transcription is widespread across the mammalian genome, resulting in a large increase in the number of putative transcripts both within and between known protein-coding genes. Long transcripts that appear to lack protein-coding potential (long non-coding RNAs, lncRNAs) have been the focus of much recent research, in part because of observations that their expression is restricted to particular cell types and developmental time points. A variety of approaches are currently available for identifying lncRNAs, including RNA polymerase II occupancy, chromatin state maps and deep RNA sequencing, the last of which is the focus of the review. In addition, numerous analytical methods are available for mapping reads and assembling transcript models that predict the presence and structure of lncRNAs from RNA-Seq data. The authors review current methods for identifying lncRNAs using large-scale data from RNA-Seq experiments and highlight the analytical considerations required when undertaking such projects.
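Whether a candidate transcript lies within or between known protein-coding genes is usually decided by interval overlap. The sketch below is a minimal, hypothetical classifier. It assumes half-open coordinates and a flat list of coding-gene intervals. The `Interval` type and the three category labels are illustrative. Real pipelines typically use dedicated interval tools and finer categories, such as intronic transcripts.

```python
from dataclasses import dataclass

@dataclass
class Interval:
    chrom: str
    start: int    # 0-based, half-open
    end: int
    strand: str   # "+" or "-"

def overlaps(a, b):
    """True if two intervals share at least one base on the same chromosome."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end

def classify_transcript(tx, coding_genes):
    """Hypothetical labels for a candidate transcript relative to coding genes."""
    hits = [g for g in coding_genes if overlaps(tx, g)]
    if not hits:
        return "intergenic"
    if all(g.strand != tx.strand for g in hits):
        return "antisense"
    return "overlaps_coding_same_strand"
```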
- Ilott NE, Ponting CP. (2013) Predicting long non-coding RNAs using RNA sequencing. Methods [Epub ahead of print]. [abstract]
Mar
25
Interpretation, Stratification and Evidence for Sequence Variants Affecting mRNA Splicing in Complete Human Genome Sequences
Filed Under Analysis Pipelines, Splicing and Junction Mapping | Leave a Comment
Information theory-based methods have been shown to be sensitive and specific for predicting and quantifying the effects of non-coding mutations in Mendelian diseases. Researchers at the University of Western Ontario, Canada, have developed the Shannon pipeline software for genome-scale mutation analysis and provide evidence that it predicts variants affecting mRNA splicing. The individual information contents (in bits) of reference and variant splice sites are compared, and significant differences are annotated and prioritized. The software has been implemented for the CLC-Bio Genomics platform. Annotation indicates the context of novel mutations, as well as of common and rare SNPs with splicing effects. Potential natural and cryptic mRNA splicing variants are identified, and null mutations are distinguished from leaky mutations. Mutations and rare SNPs were predicted in the genomes of three cancer cell lines (U2OS, U251 and A431), and these predictions were supported by expression analyses. After filtering, the software predicts tractable numbers of potentially deleterious variants, suitable for further laboratory investigation. In these cell lines, novel functional variants comprised 6-17 inactivating mutations, 1-5 leaky mutations and 6-13 cryptic splicing mutations. Predicted effects were validated by RNA-seq analysis of the same three cancer cell lines and by expression microarray analysis of SNPs in HapMap cell lines.
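The comparison of individual information contents can be illustrated with a small weight matrix. In information-theory models of binding sites, each position contributes 2 + log2 f(b, l) bits, where f(b, l) is the frequency of base b at position l among known sites. A variant's effect is the change ΔR_i between the variant and reference sequences. This sketch is simplified in three ways. It replaces the small-sample correction with a pseudocount (an assumption). It uses a toy set of donor-like sites. It is not the Shannon pipeline's code, and the function names are hypothetical.

```python
import math

BASES = "ACGT"

def weight_matrix(sites, pseudocount=0.5):
    """Per-position weights 2 + log2 f(b, l) from equal-length aligned sites.
    The pseudocount stands in for a small-sample correction (assumption)."""
    matrix = []
    for pos in range(len(sites[0])):
        counts = {b: pseudocount for b in BASES}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        matrix.append({b: 2 + math.log2(counts[b] / total) for b in BASES})
    return matrix

def individual_information(seq, matrix):
    """R_i in bits: sum of the position weights of the bases in seq."""
    return sum(col[base] for col, base in zip(matrix, seq))

sites = ["CAGGTAAGT", "AAGGTGAGT", "CAGGTAAGA", "GAGGTAAGT"]  # toy donor sites
m = weight_matrix(sites)
ref, alt = "CAGGTAAGT", "CAGATAAGT"          # G>A at the donor GT dinucleotide
delta = individual_information(alt, m) - individual_information(ref, m)
```

A negative ΔR_i, as here, indicates that the variant weakens the site.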

- Shirley BC, Mucaki EJ, Whitehead T, Costea PI, Akan P, Rogan PK. (2013) Interpretation, Stratification and Evidence for Sequence Variants Affecting mRNA Splicing in Complete Human Genome Sequences. Genomics Proteomics Bioinformatics [Epub ahead of print]. [abstract]
Mar
18
Methods to study Event/Isoform Expression and Alternative Splicing from RNA-Seq
Filed Under Analysis Pipelines, Expression and Quantification, Other Tools, Pathway Analysis, Splicing and Junction Mapping, Transcriptome Assembly Tools, Unspliced Mapping Tools | Leave a Comment
The RegulatoryGenomics website posts and updates a comprehensive list of tools for RNA-Seq analysis.
This is their current version.
Spliced-mappers

| Method | Reference | Web-site |
|---|---|---|
| TopHat | (Trapnell et al. 2009) | |
| MapSplice | (Wang et al. 2010) | |
| SpliceMap | (Au et al. 2010) | |
| HMMSplicer | (Dimon et al. 2010) | |
| TrueSight | (Li et al. 2012b) | |
| SOAPsplice | (Huang et al. 2011) | |
| PASSion | (Zhang et al. 2012) | |
| PALMapper | (Jean et al. 2010) | |
| SplitSeek | (Ameur et al. 2010) | |
| Supersplat | (Bryant et al. 2010) | |
| SeqSaw | (Wang et al. 2011) | http://bioinfo.au.tsinghua.edu.cn/software/seqsaw |
| MapNext | (Bao et al. 2009) | |
| STAR | (Dobin et al. 2012) | |
| GSNAP | (Wu et al. 2010) | |
| QPALMA | (De Bona et al. 2008) | |
| OSA | (Hu et al. 2012) | |

Read more
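All of the spliced mappers above solve the same core problem: placing a read that spans an exon-exon junction. Below is a deliberately naive Python sketch of the split-read idea. It assumes an exact-matching read on the forward strand and canonical GT...AG introns. None of the listed tools works this way internally, since they rely on genome indexes and tolerate mismatches. The function and its parameters are hypothetical.

```python
def find_junction(read, genome, min_anchor=8, max_intron=10_000):
    """Naive split-read search for one exact-matching spliced read.

    Tries every split point: the prefix must match the genome exactly, and the
    suffix must match downstream, with a GT...AG intron between the two parts.
    Returns (first intron base, last intron base) as 0-based genome
    coordinates, or None if no canonical junction explains the read.
    """
    for split in range(min_anchor, len(read) - min_anchor + 1):
        prefix, suffix = read[:split], read[split:]
        p = genome.find(prefix)
        while p != -1:
            donor = p + split                          # first intron base
            if genome.startswith("GT", donor):
                limit = donor + max_intron + len(suffix)
                s = genome.find(suffix, donor + 4, limit)  # intron >= 4 bp
                while s != -1:
                    if genome[s - 2:s] == "AG":        # acceptor dinucleotide
                        return donor, s - 1
                    s = genome.find(suffix, s + 1, limit)
            p = genome.find(prefix, p + 1)
    return None
```

The exhaustive scan over split points makes the cost grow with both read and genome length. That cost is why real spliced aligners index the genome first.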
Mar
14
Tutorial – RNA-Seq output to quantification of differential gene expression
Filed Under Analysis Pipelines, Data Analysis | Leave a Comment
Next-generation RNA-sequencing (RNA-Seq) is rapidly outcompeting microarrays as the technology of choice for whole-transcriptome studies. However, the bioinformatics skills required for RNA-Seq data analysis often pose a significant hurdle for many biologists. Here, researchers at Utrecht University, the Netherlands, put forward the concepts and considerations that are critical for RNA-Seq data analysis and provide a generic tutorial with example data that outlines the whole pipeline from next-generation sequencing output to quantification of differential gene expression.
- Van Verk MC, Hickman R, Pieterse CM, Van Wees SC. (2013) RNA-Seq: revelation of the messengers. Trends Plant Sci [Epub ahead of print]. [abstract]
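Two core steps of such a pipeline can be sketched in Python: counting aligned reads per gene, and comparing normalized counts between conditions. The sketch makes several assumptions. Reads are already aligned and reduced to start positions on one chromosome. Gene intervals are half-open and do not overlap. Library size is taken as the total of counted reads. The function names are hypothetical. A real analysis would use a dedicated counting tool and a statistical test across biological replicates rather than a raw fold change.

```python
import bisect
import math

def count_reads_per_gene(read_starts, genes):
    """Count aligned read start positions inside each gene interval.
    genes maps name -> (start, end), half-open; intervals assumed not to overlap."""
    positions = sorted(read_starts)
    return {name: bisect.bisect_left(positions, end) - bisect.bisect_left(positions, start)
            for name, (start, end) in genes.items()}

def log2_fold_change(counts_a, counts_b, pseudocount=0.5):
    """log2 of (B / A) after scaling each sample by its total counted reads."""
    total_a = sum(counts_a.values())
    total_b = sum(counts_b.values())
    return {gene: math.log2(((counts_b[gene] + pseudocount) / total_b)
                            / ((counts_a[gene] + pseudocount) / total_a))
            for gene in counts_a}

genes = {"g1": (0, 100), "g2": (200, 300)}             # hypothetical annotation
a = count_reads_per_gene([10, 20, 250, 500], genes)    # condition A
b = count_reads_per_gene([10, 210, 220, 230], genes)   # condition B
lfc = log2_fold_change(a, b)
```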
Mar
13
RIPSeeker – a statistical package for identifying protein-associated transcripts from RIP-seq experiments
Filed Under Analysis Pipelines, Other Tools | Leave a Comment
RIP-seq has recently been developed to discover, genome-wide, the RNA transcripts that interact with a protein or protein complex. RIP-seq is similar to both RNA-seq and ChIP-seq but presents unique properties and challenges. Currently, no statistical tool is dedicated to RIP-seq analysis. Now, researchers at the University of Toronto, Canada, have developed RIPSeeker, a free, open-source Bioconductor/R package for de novo RIP peak prediction based on a hidden Markov model (HMM).
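To illustrate what HMM-based peak calling means, here is a minimal two-state Viterbi decoder over binned read counts that labels each bin as background or enriched. The Poisson emissions, fixed means and transition probability are assumptions chosen for illustration. RIPSeeker's own model and its parameter estimation differ. The function names are hypothetical.

```python
import math

def poisson_logpmf(k, lam):
    """log P(K = k) for a Poisson distribution with mean lam."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)

def viterbi_two_state(counts, lam_bg=2.0, lam_peak=20.0, p_stay=0.95):
    """Most likely state path (0 = background, 1 = enriched) for binned counts."""
    lams = (lam_bg, lam_peak)
    log_stay, log_switch = math.log(p_stay), math.log(1.0 - p_stay)
    log_trans = [[log_stay, log_switch], [log_switch, log_stay]]
    # Equal prior probability of starting in either state
    score = [math.log(0.5) + poisson_logpmf(counts[0], lam) for lam in lams]
    backptrs = []
    for c in counts[1:]:
        new_score, ptr = [], []
        for s in (0, 1):
            # best previous state r for arriving in state s
            r = max((0, 1), key=lambda prev: score[prev] + log_trans[prev][s])
            ptr.append(r)
            new_score.append(score[r] + log_trans[r][s] + poisson_logpmf(c, lams[s]))
        score = new_score
        backptrs.append(ptr)
    # Trace back from the highest-scoring final state
    state = 0 if score[0] >= score[1] else 1
    path = [state]
    for ptr in reversed(backptrs):
        state = ptr[state]
        path.append(state)
    return path[::-1]

bins = [1, 2, 0, 3, 2, 25, 30, 22, 18, 27, 1, 2, 3, 0, 1]  # hypothetical counts
path = viterbi_two_state(bins)
```

The transition probability acts as a smoothing penalty: a single high bin surrounded by background must outweigh two state switches before it is called enriched.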
To demonstrate the utility of the package, the researchers applied RIPSeeker and six other published programs to three independent RIP-seq datasets and two PAR-CLIP datasets corresponding to six distinct RNA-binding proteins. Based on receiver operating characteristic (ROC) curves, RIPSeeker demonstrates superior sensitivity and specificity in discriminating high-confidence peaks that are consistently agreed on by a majority of the comparison methods. It dominated 9 of the 12 evaluations, averaging 80% area under the curve. The RIPSeeker peaks are further supported by their significant enrichment for biologically meaningful genomic elements and published sequence motifs, and by their association with canonical transcripts known to interact with the proteins examined. Although RIPSeeker is specifically tailored for RIP-seq data analysis, it also provides a suite of bioinformatics tools integrated within a self-contained software package, covering everything from post-alignment processing to visualization and annotation.
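The area under the ROC curve quoted above has a simple interpretation: it is the probability that a randomly chosen true peak receives a higher score than a randomly chosen non-peak. A small sketch of that Mann-Whitney formulation, with ties counted as one half, follows. It is a generic illustration, not the evaluation code used in the study, and the function name is hypothetical.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney statistic: the chance a
    random positive outscores a random negative, with ties counted as 1/2."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))

print(roc_auc([0.9, 0.2, 0.6, 0.1], [1, 1, 0, 0]))  # 0.75
```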
Availability – RIPSeeker is freely available at http://www.bioconductor.org/packages/2.12/bioc/html/RIPSeeker.html
- Li Y, Zhao DY, Greenblatt JF, Zhang Z. (2013) RIPSeeker: a statistical package for identifying protein-associated transcripts from RIP-seq experiments. Nucleic Acids Res [Epub ahead of print]. [article]
Feb
19
lncRScan – Prediction of novel long non-coding RNAs based on RNA-Seq data
Filed Under Analysis Pipelines, Other Tools | Leave a Comment
The study of long non-coding RNAs (lncRNAs) has been advanced by high-throughput RNA sequencing (RNA-Seq). However, identifying lncRNAs from RNA-Seq data is still not trivial, and uncovering their functions remains a challenge.
Now, a team led by researchers at China University of Mining and Technology has developed a computational pipeline for detecting novel lncRNAs from RNA-Seq data. First, genome-guided transcriptome reconstruction is used to generate an initial set of assembled transcripts. Possible partial transcripts and artefacts are then filtered out according to their quantified expression levels. Finally, novel lncRNAs are detected by further filtering out known transcripts and those with high protein-coding potential, using a newly developed program called lncRScan.
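The filtering cascade described above can be sketched as a chain of simple tests. In this hypothetical version, coding potential is approximated by the longest ATG-to-stop open reading frame. The 200 nt minimum follows the common length definition of lncRNAs, while the expression and ORF cut-offs are placeholders. lncRScan's actual coding-potential model is not reproduced here, and the function names are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def longest_orf_length(seq):
    """Length in nt (stop codon included) of the longest ATG-initiated ORF in
    the three forward frames; ORFs that run off the end are ignored."""
    seq = seq.upper()
    best = 0
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                best = max(best, i + 3 - start)
                start = None
    return best

def is_candidate_lncrna(tx_id, seq, fpkm, known_ids,
                        min_length=200, min_fpkm=1.0, max_orf_nt=300):
    """Hypothetical filter chain: expression, length, novelty, coding potential."""
    return (fpkm >= min_fpkm
            and len(seq) >= min_length
            and tx_id not in known_ids
            and longest_orf_length(seq) < max_orf_nt)
```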
They applied their pipeline to a mouse Klf1 knockout dataset and used differential expression analysis to discuss the plausible functions of the novel lncRNAs they detected. The team identified 308 novel lncRNA candidates, which have shorter transcripts, fewer exons and shorter putative open reading frames than known protein-coding transcripts. Of these lncRNAs, 52 large intergenic ncRNAs (lincRNAs) show lower expression levels than the protein-coding transcripts, and 13 lncRNAs are significantly differentially expressed between the wild-type and Klf1 knockout conditions.
Their method can predict a set of novel lncRNAs from RNA-Seq data. Some of these lncRNAs were differentially expressed between the wild-type and Klf1 knockout strains, suggesting that they should be given high priority in further functional studies.
- Sun L, Zhang Z, Bailey TL, Perkins AC, Tallack MR, Xu Z, Liu H. (2012) Prediction of novel long non-coding RNAs based on RNA-Seq data of mouse Klf1 knockout study. BMC Bioinformatics [Epub ahead of print]. [abstract]
Feb
15
iSeeRNA – identification of long intergenic non-coding RNA transcripts from RNA-Seq data
Filed Under Analysis Pipelines, Other Tools | Leave a Comment
Long intergenic non-coding RNAs (lincRNAs) are emerging as a novel class of non-coding RNAs and potent gene regulators. High-throughput RNA sequencing combined with de novo assembly promises large-scale discovery of novel transcripts. However, identifying lincRNAs among thousands of assembled transcripts is still challenging because they are difficult to separate from protein-coding transcripts (PCTs).
A team of scientists at The Chinese University of Hong Kong has developed iSeeRNA, a support vector machine (SVM)-based classifier for identifying lincRNAs. In the authors' comparisons, iSeeRNA performed better than other software.
iSeeRNA achieves high prediction accuracy and runs several orders of magnitude faster than similar programs. It can be integrated into transcriptome data analysis pipelines or run as a web server, making it a valuable tool for lincRNA studies.
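iSeeRNA's core is a support vector machine. As a stand-in, the sketch below trains a linear SVM from scratch by sub-gradient descent on the hinge loss. It uses two hypothetical per-transcript features, for instance an ORF-based score and a conservation score already scaled to the range 0 to 1. The features, data and training settings are assumptions for illustration; they are not iSeeRNA's feature set or model.

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, lr=0.1, n_iter=500):
    """Linear SVM via full-batch sub-gradient descent on the L2-regularised
    hinge loss. y must be +1 / -1. Returns a predict(X_new) function."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    mu, sd = X.mean(axis=0), X.std(axis=0) + 1e-12
    Z = (X - mu) / sd                                   # standardise features
    w, b = np.zeros(Z.shape[1]), 0.0
    n = len(y)
    for _ in range(n_iter):
        active = y * (Z @ w + b) < 1                    # points violating the margin
        grad_w = lam * w - (y[active][:, None] * Z[active]).sum(axis=0) / n
        grad_b = -y[active].sum() / n
        w -= lr * grad_w
        b -= lr * grad_b

    def predict(X_new):
        Zn = (np.asarray(X_new, dtype=float) - mu) / sd
        return np.where(Zn @ w + b >= 0, 1, -1)

    return predict

# Hypothetical features: [ORF score, conservation score]; +1 = coding, -1 = lincRNA
X = [[0.80, 0.90], [0.70, 0.85], [0.90, 0.80],
     [0.10, 0.20], [0.15, 0.10], [0.20, 0.25]]
y = [1, 1, 1, -1, -1, -1]
predict = train_linear_svm(X, y)
```

Standardizing the features first keeps the decision boundary near the origin, so the bias term needs little adjustment. A kernel SVM from an established library would be the usual choice for real data.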

Availability – iSeeRNA is available as a user-friendly web server with free accessibility at http://www.myogenesisdb.org/iSeeRNA
- Sun K, Chen X, Jiang P, Song X, Wang H, Sun H. (2013) iSeeRNA: identification of long intergenic non-coding RNA transcripts from transcriptome sequencing data. BMC Genomics 14(supp 2). [article]
Feb
13
Softberry Releases 80 Free Bioinformatics Programs for Immediate Download by Academic Users
Filed Under Analysis Pipelines, Data Analysis, Press Release | Leave a Comment
MOUNT KISCO, N.Y.–(BUSINESS WIRE)–Softberry, Inc. announces the release of a comprehensive set of biomedical research-oriented software applications that academic users can install and run locally on a limited basis.
The programs, already cited in thousands of scientific publications, are available for Linux and Mac OS platforms and focus primarily on genomic and proteomic research. They include tools for analysis of next generation sequencing data: Accurate spliced alignment of RNA-Seq data to a reference genome (ReadsMap), de novo assembly of transcriptome reads into RNA transcripts (TransSeq), genome assembly (OligoZip) and a software package for SNP analysis (SNP-Toolbox). Read more