RNA-seq has shown huge potential for phylogenomic inference in non-model organisms. However, errors, incompleteness, and redundancy among the transcripts assembled for each gene in de novo assembly of short reads introduce noise into analyses and a large amount of missing data into the aligned matrix. To address these problems, the authors compared de novo assemblies of paired-end 90 bp RNA-seq reads produced by Oases, Trinity, Trans-ABySS and SOAPdenovo-Trans with transcripts from the genome annotation of the model plant Ricinus communis. In doing so, they evaluated strategies for optimizing total gene coverage and minimizing assembly chimeras and redundancy.

Researchers at the University of Michigan found that the frequency and structure of chimeras vary dramatically among different software packages. The differences were largely due to the number of trans-self chimeras that contain repeats in the opposite direction. More than half of the total chimeras in Oases and Trinity were trans-self chimeras. Within each package, they found a trade-off between maximizing reference coverage and minimizing redundancy and chimera rate.

To reduce redundancy, they investigated three methods.


The estimation of isoform abundances from RNA-Seq data requires a time-intensive step of mapping reads to either an assembled or a previously annotated transcriptome, followed by an optimization procedure for deconvolution of multi-mapping reads. These procedures are essential for downstream analyses such as differential expression. When the underlying annotation needs to be adjusted, for example upon the discovery of novel isoforms or of errors in existing annotations, current pipelines must be rerun from scratch. This makes it difficult to update abundance estimates after re-annotation, or to explore the effect of changes in the transcriptome on analyses.

Researchers at UC Berkeley have developed a novel, efficient algorithm for updating abundance estimates from RNA-Seq experiments upon re-annotation that does not require re-analysis of the entire dataset. Their approach is based on a fast partitioning algorithm for identifying transcripts whose abundances may depend on the added or deleted isoforms, and on a fast follow-up approach to re-estimating abundances for all transcripts. They demonstrate the effectiveness of their method by showing how to synchronize RNA-Seq abundance estimates with the daily RefSeq incremental updates. Thus, they provide a practical approach to maintaining relevant databases of RNA-Seq derived abundance estimates even as annotations are being constantly revised.
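The partitioning idea can be illustrated with a small sketch. This is not the ReXpress implementation; it only assumes that two transcripts can influence each other's abundance estimates when they are linked, directly or transitively, by shared multi-mapping reads. The function name `affected_transcripts` and the input layout (read ID mapped to the transcripts it aligns to) are hypothetical.

```python
def affected_transcripts(read_alignments, changed):
    """Return every transcript connected to a changed transcript
    through a chain of shared (multi-mapping) reads.

    read_alignments: dict mapping read ID -> iterable of transcript IDs
    changed: iterable of added or deleted transcript IDs
    NOTE: illustrative sketch only, not the ReXpress algorithm itself.
    """
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    # Transcripts hit by the same read end up in the same component.
    for targets in read_alignments.values():
        targets = list(targets)
        for t in targets:
            find(t)
        for other in targets[1:]:
            union(targets[0], other)

    roots = {find(t) for t in changed}
    return {t for t in list(parent) if find(t) in roots}


reads = {
    "r1": ["A", "B"],
    "r2": ["B", "C"],
    "r3": ["D"],
    "r4": ["D", "E"],
}
print(sorted(affected_transcripts(reads, ["C"])))
```

Only the component containing the changed isoform would need re-estimation under this assumption; transcripts D and E are untouched by a change to C.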


Availability – ReXpress is freely available, together with source code, at http://bio.math.berkeley.edu/ReXpress/

Contact: lpachter@math.berkeley.edu

  • Roberts A, Schaeffer L, Pachter L. (2013) Updating RNA-Seq analyses after re-annotation. Bioinformatics [Epub ahead of print]. [abstract]


MicroRNAs (miRNAs) are a class of small RNAs that post-transcriptionally regulate gene expression in animals and plants. The recent rapid advances in miRNA biology, including high-throughput sequencing of small RNA libraries, inspired the development of a bioinformatics tool, miRAuto, which computationally predicts putative miRNAs in model plant genomes. Furthermore, miRAuto enables users to identify miRNAs in non-model plant species whose genomes have yet to be fully sequenced. miRAuto analyzes the expression of the 5′-end position of mapped small RNAs in reference sequences to keep mRNA fragments from being included as candidate miRNAs.
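One way to picture a 5′-end check is sketched below. It rests on the assumption, not stated in the post, that reads from a genuine miRNA mostly share one 5′ start position while degradation fragments start at scattered positions. The function name and the idea of reporting a single dominant fraction are hypothetical and do not describe miRAuto's actual criteria.

```python
from collections import Counter


def five_prime_homogeneity(read_starts):
    """Return (dominant 5' start position, fraction of reads starting there).

    read_starts: list of 5'-end coordinates of small RNA reads mapped
    to one candidate locus. Hypothetical helper for illustration only.
    """
    if not read_starts:
        raise ValueError("no reads mapped to this locus")
    counts = Counter(read_starts)
    position, n = counts.most_common(1)[0]
    return position, n / len(read_starts)


# Eight of ten reads begin at position 100: a sharply defined 5' end.
starts = [100] * 8 + [101, 103]
print(five_prime_homogeneity(starts))
```

A locus with a low dominant fraction could then be flagged for closer inspection; any threshold would need to be chosen from real data.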

Researchers at Seoul National University validated the utility of miRAuto on a small RNA dataset, and the results were compared to other publicly available miRNA prediction programs. In conclusion, miRAuto is a fully automated user-friendly tool for predicting miRNAs from small RNA sequencing data in both model and non-model plant species.


Availability – miRAuto is available at http://nature.snu.ac.kr/software/miRAuto.htm

  • Lee J, Kim DI, Park JH, Choi IY, Shin C. (2013) miRAuto: An automated user-friendly MicroRNA prediction tool utilizing plant small RNA sequencing data. Mol Cells 35(4), 342-7. [abstract]


from Biostars by Botond Sipos

What is the rlsim package?

The rlsim package is a collection of tools for simulating RNA-seq library construction, aiming to reproduce the most important factors which are known to introduce significant biases in the currently used protocols: hexamer priming, PCR amplification and size selection. It allows for a systematic exploration of the effects of the individual biasing factors and their interactions on downstream applications by simulating data under a variety of parameter sets.

The implicit simulation model implemented in the main tool (rlsim) is inspired by actual library preparation protocols. It is more general than the models used by the bias correction methods, so it allows for a fair assessment of their performance.
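To make two of the biasing factors concrete, here is a toy sketch of GC-dependent PCR amplification and size selection. It is not the rlsim model: the per-cycle efficiency function, the Gaussian size-selection weight and all parameter values are assumptions chosen for illustration, and hexamer priming is left out entirely.

```python
import math


def gc_content(seq):
    """Fraction of G or C bases in a fragment."""
    seq = seq.upper()
    return sum(base in "GC" for base in seq) / len(seq)


def pcr_expected_copies(seq, cycles, efficiency):
    """Expected copies after `cycles` PCR rounds when each molecule is
    duplicated with probability efficiency(gc) per cycle (toy model)."""
    eff = efficiency(gc_content(seq))
    if not 0.0 <= eff <= 1.0:
        raise ValueError("efficiency must lie in [0, 1]")
    return (1.0 + eff) ** cycles


def size_selection_weight(length, mean, sd):
    """Unnormalised Gaussian weight for keeping a fragment of this length."""
    return math.exp(-0.5 * ((length - mean) / sd) ** 2)


def toy_efficiency(gc):
    # Hypothetical efficiency curve: best at 50% GC, worse at the extremes.
    return 1.0 - abs(gc - 0.5)


print(pcr_expected_copies("ACGT", 10, toy_efficiency))
print(pcr_expected_copies("GGCC", 2, toy_efficiency))
print(size_selection_weight(220, mean=200, sd=20))
```

Varying one factor at a time in a sketch like this mirrors the kind of systematic exploration the package is described as supporting.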

Although the simulation model was kept as simple as possible in order to aid usability, it still has too many parameters to be inferred from data produced by standard RNA-seq experiments. However, simulating datasets with properties similar to specific datasets is often useful. To address this, the package provides a tool (effest) implementing simple approaches for estimating the parameters which can be recovered from standard RNA-seq data (GC-dependent amplification efficiencies, fragment size distribution, relative expression levels).

The latest release and the package source are available from the rlsim GitHub repository: https://github.com/sbotond/rlsim

Citing the rlsim package

An associated manuscript is in preparation; in the meantime, the package should be cited as:

Botond Sipos, Tim Massingham and Nick Goldman (2013): rlsim – a package for simulating RNA-seq library preparation with parameter estimation [http://bit.ly/rlsim-doc].

Getting more help

Please consult the package documentation for more help on the tools and the technical background. Also feel free to ask questions on BioStar; I will monitor the rlsim tag.


The advent of next-generation sequencing technologies, and in particular RNA sequencing (RNA-Seq), has expanded our knowledge of the transcriptional capacity of human and other animal genomes. In particular, recent RNA-Seq studies have revealed that transcription is widespread across the mammalian genome, resulting in a large increase in the number of putative transcripts both within and between known protein-coding genes. Long transcripts that appear to lack protein-coding potential (long non-coding RNAs, lncRNAs) have been the focus of much recent research, in part owing to observations that their expression is restricted to particular cell types and developmental time points. A variety of sequencing protocols are currently available for identifying lncRNAs, including RNA polymerase II occupancy, chromatin state maps and, the focus of this review, deep RNA sequencing. In addition, there are numerous analytical methods available for mapping reads and assembling transcript models that predict the presence and structure of lncRNAs from RNA-Seq data. Here the authors review current methods for identifying lncRNAs using large-scale sequencing data from RNA-Seq experiments and highlight analytical considerations that are required when undertaking such projects.


  • Ilott NE, Ponting CP. (2013) Predicting long non-coding RNAs using RNA sequencing. Methods [Epub ahead of print]. [abstract]


In silico searches for microRNAs (miRNAs) have been driven by methods that compile structural features of the miRNA precursor hairpin and, to some degree, combine these with analysis of RNA-seq read profiles, in which a miRNA typically leaves the Drosha/Dicer fingerprint of one or two blocks of ~22 nt reads corresponding to the mature and star miRNA.

In complement to the previous methods, researchers at the University of Copenhagen, Denmark present a study in which they systematically exploit these patterns of read profiles. They created databases of 2,540 miRNA read profiles using short RNA-seq data from miRBase and 4,795 read profiles from ENCODE (after preprocessing). Of the 4,795 ENCODE profiles, 1,361 are annotated as noncoding RNAs (ncRNAs), of which 285 are further annotated as miRNAs. Using deepBlockAlign (dba), they aligned the ENCODE ncRNA profiles against the miRBase profiles (cleaned for “self-matches”) and were able to separate ENCODE miRNAs from the other ncRNAs with a Matthews correlation coefficient of 0.8 and an area under the curve of 0.93. Using the derived dba score cut-off, they predict 523 novel miRNA candidates. Further analysis reveals that these are located in genomic regions with (UCSC) MAF block fragmentation and poor sequence conservation, which in part might explain why they have been overlooked in previous efforts.
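For readers unfamiliar with the Matthews correlation coefficient (MCC) quoted above, the following sketch shows how it is computed once a score cut-off has been fixed. The scores, labels and cut-off are made-up toy values, not data from the study, and the helper names are hypothetical.

```python
import math


def confusion_at_cutoff(scores, labels, cutoff):
    """Count TP, FP, TN, FN when scores >= cutoff are called positive.

    labels use 1 for a true miRNA and 0 for any other ncRNA.
    """
    tp = fp = tn = fn = 0
    for score, label in zip(scores, labels):
        predicted = score >= cutoff
        if predicted and label == 1:
            tp += 1
        elif predicted and label == 0:
            fp += 1
        elif not predicted and label == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def matthews_corrcoef(tp, fp, tn, fn):
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # conventional value when a row or column is empty
    return (tp * tn - fp * fn) / denom


# Toy alignment scores and annotations (illustrative only).
scores = [0.9, 0.8, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0]
counts = confusion_at_cutoff(scores, labels, cutoff=0.5)
print(counts, round(matthews_corrcoef(*counts), 4))
```

Unlike plain accuracy, MCC stays informative when positives are rare, which is the situation here (285 miRNAs among 1,361 ncRNA profiles).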

The researchers further analyzed known miRNAs from human and mouse and found two distinct classes, containing two blocks or more than two blocks respectively, where the latter class holds profiles with a less well defined arrangement of reads. They also compared the read profiles specific to plants and animals in terms of both the length and the distribution of reads within the profiles, and observed that some read profiles were specific to one of the two kingdoms.

Availability: All data, as well as a server for searching miRBase profiles by uploading a BED file, are available at http://rth.dk/resources/dba/mirna.

  • Pundhir S, Gorodkin J. (2013) MicroRNA discovery by similarity search to a database of RNA-seq profiles. Frontiers in Bioinform & Comp Biol [Epub ahead of print]. [abstract]


The RegulatoryGenomics website posts and updates a comprehensive list of tools for RNA-Seq analysis.

This is their current version.

Spliced-mappers

Method       Reference               Web-site
TopHat       (Trapnell et al. 2009)  http://tophat.cbcb.umd.edu/
MapSplice    (Wang et al. 2010)      http://www.netlab.uky.edu/p/bioinfo/MapSplice
SpliceMap    (Au et al. 2010)        http://www.stanford.edu/group/wonglab/SpliceMap/
HMMSplicer   (Dimon et al. 2010)     http://derisilab.ucsf.edu/index.php?software=105
TrueSight    (Li et al. 2012b)       http://bioen-compbio.bioen.illinois.edu/TrueSight/
SOAPsplice   (Huang et al. 2011)     http://soap.genomics.org.cn/soapsplice.html
PASSion      (Zhang et al. 2012)     https://trac.nbic.nl/passion
PALMapper    (Jean et al. 2010)      http://galaxy.raetschlab.org/
SplitSeek    (Ameur et al. 2010)     http://solidsoftwaretools.com/gf/project/splitseek
Supersplat   (Bryant et al. 2010)    http://mocklerlab-tools.cgrb.oregonstate.edu/
SeqSaw       (Wang et al. 2011)      http://bioinfo.au.tsinghua.edu.cn/software/seqsaw
MapNext      (Bao et al. 2009)       http://evolution.sysu.edu.cn/english/software/mapnext.htm
STAR         (Dobin et al. 2012)     http://gingeraslab.cshl.edu/STAR/
GSNAP        (Wu et al. 2010)        http://research-pub.gene.com/gmap/
QPALMA       (De Bona et al. 2008)   http://www.raetschlab.org/suppl/qpalma
OSA          (Hu et al. 2012)        http://omicsoft.com/osa/


RIP-seq has recently been developed to discover, genome-wide, the RNA transcripts that interact with a protein or protein complex. RIP-seq is similar to both RNA-seq and ChIP-seq, but presents unique properties and challenges. Until now, no statistical tool has been dedicated to RIP-seq analysis. Researchers at the University of Toronto, Canada have developed RIPSeeker, a free open-source Bioconductor/R package for de novo RIP peak prediction based on a hidden Markov model (HMM).

To demonstrate the utility of the software package, they applied RIPSeeker and six other published programs to three independent RIP-seq datasets and two PAR-CLIP datasets corresponding to six distinct RNA-binding proteins. Based on receiver operating characteristic curves, RIPSeeker demonstrates superior sensitivity and specificity in discriminating high-confidence peaks that are consistently agreed on by a majority of the comparison methods, and it dominated 9 of the 12 evaluations, averaging 80% area under the curve. The peaks from RIPSeeker are further confirmed by their significant enrichment for biologically meaningful genomic elements and published sequence motifs, and by their association with canonical transcripts known to interact with the proteins examined. While RIPSeeker is specifically tailored for RIP-seq data analysis, it also provides a suite of bioinformatics tools integrated within a self-contained software package that addresses issues ranging from post-alignment processing to visualization and annotation.

Availability – RIPSeeker is freely available at http://www.bioconductor.org/packages/2.12/bioc/html/RIPSeeker.html

Li Y, Zhao DY, Greenblatt JF, Zhang Z. (2013) RIPSeeker: a statistical package for identifying protein-associated transcripts from RIP-seq experiments. Nucleic Acids Res [Epub ahead of print]. [article]


MicroRNAs (miRNAs) can group together along the human genome to form stable secondary structures made of several hairpins hosting miRNAs in their stems. The few known examples of such structures are all involved in cancer development. A large-scale computational analysis of human chromosomes, combining sequence analysis with deep sequencing data, revealed the presence of >400 structural clusters of miRNAs in the human genome. An a posteriori analysis validates the predictions as bona fide miRNAs. A functional analysis of the positions of the structural clusters along the chromosomes co-localizes them with genes involved in several key cellular processes, such as immune systems, sensory systems, signal transduction and development. Immune system diseases, infectious diseases and neurodegenerative diseases are characterized by genes that are especially well organized around structural clusters of miRNAs. Functional analysis of target genes strongly supports a regulatory role for most predicted miRNAs and, notably, a strong involvement of predicted miRNAs in the regulation of cancer pathways. This analysis provides new fundamental insights into the genomic organization of miRNAs in human chromosomes.

Availability: The program, called MIReStruC (standing for ‘miRNA Structural Cluster’), has been implemented in bash, C, Awk and Python. It is available at http://www.ihes.fr/~carbone/data9/.

  • Mathelier A, Carbone A. (2013) Large scale chromosomal mapping of human microRNA structural clusters. Nucleic Acids Res [Epub ahead of print]. [article]


Gene set analysis (GSA) is used to interpret genome-wide data, in particular transcriptome data. A multitude of methods have been proposed for this step of the analysis, and many of them have been compared and evaluated. Unfortunately, there is no consolidated opinion regarding which methods should be preferred, and the variety of available GSA software and implementations poses a difficulty for the end-user who wants to try out different methods.

To address this, researchers at Chalmers University of Technology, Sweden have developed the R package Piano, which collects a range of GSA methods into a single system for the benefit of the end-user. They further refine the GSA workflow by using modifications of the gene-level statistics. This enables them to divide the resulting gene set P-values into three classes, describing different aspects of gene expression directionality at the gene set level.

The researchers demonstrate their fully implemented workflow by investigating the impact of the individual components of GSA using microarray and RNA-seq data. The results show that the evaluated methods are globally similar and that the major separation correlates well with their defined directionality classes. As a consequence, they suggest using a consensus scoring approach based on multiple GSA runs. In combination with the directionality classes, this constitutes a more thorough basis for an enriched biological interpretation.
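A consensus over several GSA runs can be pictured as rank aggregation. The sketch below averages per-method ranks of gene set P-values; it is written in Python for illustration, is not Piano's implementation, and the function names, the mean-rank rule and the toy P-values are all assumptions.

```python
def rank_pvalues(pvalues):
    """Rank gene sets by ascending P-value; ties share their average rank."""
    ordered = sorted(pvalues, key=pvalues.get)
    ranks = {}
    i = 0
    while i < len(ordered):
        j = i
        # Extend j over the run of gene sets tied with ordered[i].
        while j + 1 < len(ordered) and pvalues[ordered[j + 1]] == pvalues[ordered[i]]:
            j += 1
        average = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[ordered[k]] = average
        i = j + 1
    return ranks


def consensus_ranks(runs):
    """Mean rank per gene set across GSA runs.

    runs: dict mapping method name -> {gene set name: P-value}.
    Only gene sets reported by every method are scored.
    """
    shared = set.intersection(*(set(result) for result in runs.values()))
    per_run = [rank_pvalues({s: result[s] for s in shared}) for result in runs.values()]
    return {s: sum(r[s] for r in per_run) / len(per_run) for s in shared}


# Toy P-values from two hypothetical GSA methods.
runs = {
    "method_1": {"A": 0.01, "B": 0.20, "C": 0.50},
    "method_2": {"A": 0.03, "B": 0.02, "C": 0.90},
}
print(sorted(consensus_ranks(runs).items()))
```

Gene sets that rank highly under most methods rise to the top, while a set favoured by a single method is pulled down by the others.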

Availability – Piano is available, together with a user manual, for download at www.sysbio.se/piano.

  • Väremo L, Nielsen J, Nookaew I. (2013) Enriching the gene set analysis of genome-wide data by incorporating directionality of gene expression and combining statistical hypotheses and methods. Nucleic Acids Res [Epub ahead of print]. [article]


The whole-genome sequences of many non-model organisms have recently been determined. Using these genome sequences, next-generation sequencing experiments such as RNA-Seq and ChIP-seq have been performed, and comparisons of these experiments between related species have provided new knowledge about evolution and biological processes. Although such comparisons require transforming the genome coordinates of reads from one species to another, current software tools are not suited to converting massive numbers of reads to the corresponding coordinates of another species’ genome.

Now, researchers at Ochanomizu University and the Tokyo Institute of Technology, Japan have developed a set of programs, called REad COordinate Transformer (RECOT), for comparing RNA-seq, ChIP-seq and CLIP-seq sequences between closely related species. RECOT transforms the coordinates of short reads obtained from the genome of the query species being studied to those of a comparison target species after aligning the query and target gene/genome sequences. RECOT generates output in SAM format that can be viewed using recent genome browsers capable of displaying next-generation sequencing data.

They demonstrate the usefulness of RECOT by comparing ChIP-seq results between two closely related fruit flies. The results indicate position changes of a transcription factor binding site caused by sequence polymorphisms at the binding site.
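The core of coordinate transformation through an alignment can be sketched in a few lines. This is not RECOT's code; it assumes a gapped pairwise alignment of the query and target sequences is already available as two equal-length strings with `-` for gaps, and the function names are hypothetical.

```python
def build_coordinate_map(query_aln, target_aln):
    """Map 0-based query positions to 0-based target positions.

    Only alignment columns where both sequences have a base are mapped;
    query bases aligned to a gap have no target coordinate.
    """
    if len(query_aln) != len(target_aln):
        raise ValueError("aligned sequences must have equal length")
    mapping = {}
    q = t = 0
    for q_char, t_char in zip(query_aln, target_aln):
        if q_char != "-" and t_char != "-":
            mapping[q] = t
        if q_char != "-":
            q += 1
        if t_char != "-":
            t += 1
    return mapping


def transform_read(start, length, mapping):
    """Return the half-open target interval covered by a query read,
    or None if no base of the read has a target coordinate."""
    hits = [mapping[p] for p in range(start, start + length) if p in mapping]
    if not hits:
        return None
    return min(hits), max(hits) + 1


# Toy alignment: one query-specific base and one target-specific base.
coordinate_map = build_coordinate_map("ACGT-ACGT", "AC-TTACGT")
print(transform_read(2, 3, coordinate_map))
```

A real tool also has to handle strands, multiple alignment blocks and output formatting such as SAM, none of which is attempted here.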

Availability – RECOT is available at: http://sesejun.github.com/recot/

Izawa A, Sese J. (2013) A tool for the coordinate transformation of next-generation sequencing reads for comparative genomics and transcriptomics. Source Code Biol Med 8(1), 6. [Epub ahead of print]. [abstract]


The study of long non-coding RNAs (lncRNAs) has been advanced by high-throughput RNA sequencing (RNA-Seq). However, it is still not trivial to identify lncRNAs from RNA-Seq data, and uncovering their functions remains a challenge.

Now, a team led by researchers at China University of Mining and Technology have developed a computational pipeline for detecting novel lncRNAs from RNA-Seq data. First, genome-guided transcriptome reconstruction is used to generate initially assembled transcripts. Possible partial transcripts and artefacts are then filtered out according to their quantified expression level. After that, novel lncRNAs are detected by further filtering out known transcripts and those with high protein-coding potential, using a newly developed program called lncRScan.

They applied their pipeline to a mouse Klf1 knockout dataset and discussed the plausible functions of the novel lncRNAs they detected by differential expression analysis. The team identified 308 novel lncRNA candidates, which have shorter transcripts, fewer exons and shorter putative open reading frames than known protein-coding transcripts. Of the lncRNAs, 52 large intergenic ncRNAs (lincRNAs) show lower expression levels than the protein-coding ones, and 13 lncRNAs show significant differential expression between the wild-type and Klf1 knockout conditions.
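The successive filters described above can be sketched as a small pipeline. This is not lncRScan: the `Transcript` fields, the FPKM and coding-potential thresholds and the example values are hypothetical, and a real pipeline would take them from an assembler, an annotation and a coding-potential predictor.

```python
from dataclasses import dataclass


@dataclass
class Transcript:
    name: str
    fpkm: float              # quantified expression level
    is_known: bool           # overlaps an annotated transcript
    coding_potential: float  # 0 (non-coding) to 1 (coding), assumed scale


def candidate_lncrnas(transcripts, min_fpkm=1.0, max_coding_potential=0.5):
    """Apply the three filters in order; thresholds are illustrative."""
    # 1. Drop weakly expressed models (likely partial transcripts or artefacts).
    expressed = [t for t in transcripts if t.fpkm >= min_fpkm]
    # 2. Drop transcripts that are already annotated.
    novel = [t for t in expressed if not t.is_known]
    # 3. Drop transcripts that look protein-coding.
    return [t for t in novel if t.coding_potential <= max_coding_potential]


assembled = [
    Transcript("t1", fpkm=0.2, is_known=False, coding_potential=0.1),
    Transcript("t2", fpkm=5.0, is_known=True, coding_potential=0.9),
    Transcript("t3", fpkm=3.0, is_known=False, coding_potential=0.8),
    Transcript("t4", fpkm=2.5, is_known=False, coding_potential=0.2),
]
print([t.name for t in candidate_lncrnas(assembled)])
```

Ordering the filters from cheapest to most expensive keeps the costly coding-potential step limited to transcripts that survive the earlier cuts.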


Their method can predict a set of novel lncRNAs from RNA-Seq data. Some of the lncRNAs are shown to be differentially expressed between the wild-type and Klf1 knockout strains, suggesting that those novel lncRNAs can be given high priority in further functional studies.

  • Sun L, Zhang Z, Bailey TL, Perkins AC, Tallack MR, Xu Z, Liu H. (2012) Prediction of novel long non-coding RNAs based on RNA-Seq data of mouse Klf1 knockout study. BMC Bioinformatics [Epub ahead of print]. [abstract]


