DNA sequencing technology is becoming more accessible to a variety of researchers as costs continue to decline. As researchers begin to sequence novel transcriptomes, most of these datasets lack a reference genome and will have to rely on de novo assemblers. Making comparisons across assemblies can be difficult: each program has its strengths and weaknesses and no tool exists to comparatively evaluate these datasets.

Now, a team led by researchers at the University of Rhode Island has developed software in R, called Sequence Comparative Analysis using Networks (SCAN), to perform statistical comparisons between distinct assemblies. SCAN uses a reference dataset to identify the most accurate de novo assembly and the ‘good’ transcripts in the user’s data. They tested SCAN on 3 publicly available transcriptomes, each assembled using 3 assembly programs. Moreover, they sequenced the transcriptome of the oomycete Achlya hypogyna and compared de novo assemblies from the Velvet, ABySS, and CLC Genomics Workbench assembly algorithms. Of the CLC transcripts, 1,128 were statistically similar to the reference, compared to 49 of the Velvet transcripts and 937 of the ABySS transcripts. SCAN’s strength is providing statistical support for transcript assemblies in a biological context. However, because SCAN is designed to compare distinct node sets in networks, it can also easily be extended to perform statistical comparisons on any network graph, regardless of what the nodes represent.

Availability – Two versions of SCAN were developed, “SCAN” and “SCAN stringent”; both can run on either single-processor or multiprocessor nodes and are available from http://evol-net.fr.

  • Misner I, Bicep C, Lopez P, Halary S, Bapteste E, Lane CE. (2013) Sequence Comparative Analysis using Networks (SCAN): software for evaluating de novo transcript assembly from next generation sequencing. Mol Biol Evol [Epub ahead of print]. [abstract]

Isoform reconstruction is a key step in RNA-Seq analysis. Tools such as CEM, iReckon, NSMAP, and MonteBello use maximum likelihood for isoform reconstruction. The maximum likelihood approach has been observed to be computationally expensive. Here, researchers from Tsinghua University, China, show that isoform reconstruction using short RNA-Seq reads by maximum likelihood is NP-hard.
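To make the maximum likelihood idea concrete, here is a minimal sketch of an EM estimate of isoform abundances for a fixed candidate set. It is not the method of any tool named above: the function name `em_isoform_abundance`, the toy reads, and the assumption that all isoforms have equal effective length are illustrative only. The hard part the paper addresses, choosing which isoforms exist in the first place, is not attempted here.

```python
from typing import Dict, List


def em_isoform_abundance(read_compat: List[List[str]],
                         isoforms: List[str],
                         iterations: int = 50) -> Dict[str, float]:
    """Estimate relative isoform abundances by EM.

    read_compat[i] lists the isoforms that read i is compatible with.
    Assumption (hypothetical simplification): equal effective lengths.
    """
    # Start from a uniform abundance estimate.
    theta = {iso: 1.0 / len(isoforms) for iso in isoforms}
    for _ in range(iterations):
        counts = {iso: 0.0 for iso in isoforms}
        # E-step: split each read among its compatible isoforms
        # in proportion to the current abundance estimates.
        for compatible in read_compat:
            total = sum(theta[iso] for iso in compatible)
            for iso in compatible:
                counts[iso] += theta[iso] / total
        # M-step: renormalise the expected counts into abundances.
        n = sum(counts.values())
        theta = {iso: c / n for iso, c in counts.items()}
    return theta


# Toy data: three reads fit only isoform A, one read fits both A and B.
reads = [["A"], ["A"], ["A"], ["A", "B"]]
theta = em_isoform_abundance(reads, ["A", "B"])
```

Because isoform B has no read of its own, the estimate shifts almost all abundance to A. With many exons the number of candidate isoforms grows very quickly, which is where the search becomes expensive.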

  • Li T, Jiang R, Zhang X. (2013) Isoform reconstruction using short RNA-Seq reads by maximum likelihood is NP-hard. arXiv:1305.0916 [q-bio.QM]. [article]

TopHat, a popular spliced aligner for RNA-seq experiments, has now been succeeded by TopHat2, which incorporates many significant enhancements. TopHat2 can align reads of various lengths produced by the latest sequencing technologies, while allowing for variable-length indels with respect to the reference genome. In addition to de novo spliced alignment, TopHat2 can align reads across fusion breaks, which occur after genomic translocations. TopHat2 combines the ability to discover novel splice sites with direct mapping to known transcripts, producing sensitive and accurate alignments, even for highly repetitive genomes or in the presence of pseudogenes.

Availability: TopHat2 is available at http://ccb.jhu.edu/software/tophat.

  • Kim D, Pertea G, Trapnell C, Pimentel H, Kelley R, Salzberg SL. (2013) TopHat2: accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions. Genome Biol 14(4), R36. [Epub ahead of print]. [abstract]

Sequencing of RNAs (RNA-Seq) has revolutionized the field of transcriptomics, but the reads obtained often contain errors. Read error correction can have a large impact on our ability to accurately assemble transcripts. This is especially true for de novo transcriptome analysis, where a reference genome is not available. Current read error correction methods, developed for DNA sequence data, cannot handle the overlapping effects of non-uniform abundance, polymorphisms and alternative splicing.

Now, researchers at Carnegie Mellon University have developed SEquencing Error CorrEction in Rna-seq data (SEECER), a hidden Markov model (HMM)-based method, which is the first to successfully address these problems. SEECER efficiently learns hundreds of thousands of HMMs and uses these to correct sequencing errors. Using human RNA-Seq data, they show that SEECER greatly improves on previous methods in terms of quality of read alignment to the genome and assembly accuracy. To illustrate the usefulness of SEECER for de novo transcriptome studies, they generated new RNA-Seq data to study the development of the sea cucumber Parastichopus parvimensis. Their corrected assembled transcripts shed new light on two important stages in sea cucumber development. Comparison of the assembled transcripts to known transcripts in other species has also revealed novel transcripts that are unique to sea cucumber, some of which they have experimentally validated.

Supporting website: http://sb.cs.cmu.edu/seecer/.

  • Le HS, Schulz MH, McCauley BM, Hinman VF, Bar-Joseph Z. (2013) Probabilistic error correction for RNA sequencing. Nucleic Acids Res, Apr 4 [Epub ahead of print]. [article]

Whole transcriptome RNA-Seq has emerged as a powerful tool in transcriptomics, enabling genome-wide quantitative analysis of gene expression and qualitative identification of novel coding or non-coding RNA species through transcriptome reassembly. Common protocols for preparation of RNA-Seq libraries include an RNA fragmentation step for which several RNA sizing techniques are commercially available. To date, there is no global information about their putative bias on transcriptome analysis.

Here researchers at the Université Pierre et Marie Curie compared the effects of RNase III- and zinc-mediated RNA fragmentation on transcript expression measurement and transcriptome reassembly in the budding yeast Saccharomyces cerevisiae. They observed that RNA cleavage by RNase III is heterogeneous along transcripts with a striking decrease of autocorrelation between adjacent nucleotides along the transcriptome. This had little impact on mRNA expression measurement, but specific classes of transcripts such as abundant non-coding RNAs were underrepresented in the libraries constructed using RNase III. Furthermore, zinc-mediated fragmentation allows proper reassembly of more transcripts, with more precise 5′ and 3′ ends. Together, these results show that transcriptome reassembly from RNA-Seq data is very sensitive to the RNA fragmentation technique, and that zinc-mediated fragmentation provides more robust and accurate transcript identification than cleavage by RNase III.
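The autocorrelation between adjacent nucleotides mentioned above can be illustrated with a lag-1 autocorrelation of per-nucleotide read coverage. This is a minimal sketch, not the authors' exact statistic: the function name `lag1_autocorrelation` and the two coverage profiles are made up for illustration.

```python
from typing import Sequence


def lag1_autocorrelation(coverage: Sequence[float]) -> float:
    """Lag-1 autocorrelation of a per-nucleotide coverage profile.

    Values near 1 mean neighbouring positions have similar coverage;
    low or negative values mean coverage jumps between neighbours.
    """
    n = len(coverage)
    if n < 2:
        raise ValueError("need at least two positions")
    mean = sum(coverage) / n
    # Sum of squared deviations (unnormalised variance).
    var = sum((x - mean) ** 2 for x in coverage)
    if var == 0:
        raise ValueError("coverage is constant; autocorrelation undefined")
    # Sum of products of deviations at adjacent positions.
    cov = sum((coverage[i] - mean) * (coverage[i + 1] - mean)
              for i in range(n - 1))
    return cov / var


# Hypothetical profiles: a smoothly changing one and a jagged one.
smooth = [1, 2, 3, 4, 5, 6, 7, 8]
jagged = [8, 1, 7, 2, 8, 1, 7, 2]
r_smooth = lag1_autocorrelation(smooth)
r_jagged = lag1_autocorrelation(jagged)
```

In this toy case the smooth profile scores clearly higher than the jagged one, which is the kind of contrast a heterogeneous cleavage pattern would be expected to produce.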

  • Wery M, Descrimes M, Thermes C, Gautheret D, Morillon A. (2013) Zinc-mediated RNA fragmentation allows robust transcript reassembly upon whole transcriptome RNA-Seq. Methods [Epub ahead of print]. [abstract]

De novo transcriptome assemblies of RNA-Seq data are important for genomics applications of unsequenced organisms. Due to the complexity and often incomplete representation of transcripts in sequencing libraries, the assembly of high-quality transcriptomes can be challenging. However, with the rapidly growing number of sequenced genomes it is now feasible to improve RNA-Seq assemblies by guiding them with genomic sequences.

This study introduces BRANCH, an algorithm designed for improving de novo transcriptome assemblies by utilizing genomic information that can be partial or complete genome sequences from the same or a related organism. Its input includes assembled RNA reads (transfrags), genomic sequences (e.g. contigs) and the RNA reads themselves. It uses a customized version of BLAT to align the transfrags and RNA reads to the genomic sequences. After identifying exons from the alignments, it defines a directed acyclic graph and maps the transfrags to paths on the graph. It then joins and extends the transfrags by applying an algorithm that solves a combinatorial optimization problem, called the Minimum weight Minimum Path Cover with given Paths (MMPCP). In performance tests on real data from C. elegans and S. cerevisiae, assisted by genomic contigs from the same species, BRANCH improved the sensitivity and precision of transfrags generated by Velvet/Oases or Trinity by 5.1-56.7% and 0.3-10.5%, respectively. These improvements added 3.8-74.1% complete transcripts and 8.3-33.8% proteins to the initial assembly. Similar improvements were achieved when guiding the BRANCH processing of a transcriptome assembly from a more complex organism (mouse) with genomic sequences from a related species (rat).
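The idea of mapping transfrags to paths on an exon graph and then joining them can be sketched in a few lines. This is a simplified illustration only: it does not implement MMPCP or BRANCH's actual joining rules, and the names `is_path`, `join_transfrags`, and the toy exon graph are hypothetical. Transfrags are assumed to be non-empty lists of exon identifiers.

```python
from typing import Dict, List, Optional

ExonGraph = Dict[str, List[str]]  # exon -> exons it can splice to


def is_path(dag: ExonGraph, transfrag: List[str]) -> bool:
    """True if consecutive exons of the transfrag are joined by edges."""
    return all(b in dag.get(a, []) for a, b in zip(transfrag, transfrag[1:]))


def join_transfrags(dag: ExonGraph,
                    left: List[str],
                    right: List[str]) -> Optional[List[str]]:
    """Join two transfrags if both are paths and an edge links them."""
    if not (is_path(dag, left) and is_path(dag, right)):
        return None
    # An edge from the last exon of `left` to the first exon of `right`
    # means the two fragments can be merged into one longer path.
    if right[0] in dag.get(left[-1], []):
        return left + right
    return None


# Toy exon graph (made up): ex2 can splice to ex3 or skip to ex4.
exon_dag = {"ex1": ["ex2"], "ex2": ["ex3", "ex4"], "ex3": ["ex4"], "ex4": []}
joined = join_transfrags(exon_dag, ["ex1", "ex2"], ["ex3", "ex4"])
```

A real path cover has to choose among many competing joins at once, which is why BRANCH treats it as an optimization problem rather than the pairwise check shown here.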

Availability: The BRANCH software can be downloaded for free from this site: http://manuals.bioinformatics.ucr.edu/home/branch.

Contact: thomas.girke@ucr.edu

  • Bao E, Jiang T, Girke T. (2013) BRANCH: boosting RNA-Seq assemblies with partial or related genomic sequences. Bioinformatics [Epub ahead of print]. [abstract]

The RegulatoryGenomics website posts and updates a comprehensive list of tools for RNA-Seq analysis.

This is their current version.

Spliced-mappers

Method      Reference               Web-site
TopHat      (Trapnell et al. 2009)  http://tophat.cbcb.umd.edu/
MapSplice   (Wang et al. 2010)      http://www.netlab.uky.edu/p/bioinfo/MapSplice
SpliceMap   (Au et al. 2010)        http://www.stanford.edu/group/wonglab/SpliceMap/
HMMSplicer  (Dimon et al. 2010)     http://derisilab.ucsf.edu/index.php?software=105
TrueSight   (Li et al. 2012b)       http://bioen-compbio.bioen.illinois.edu/TrueSight/
SOAPsplice  (Huang et al. 2011)     http://soap.genomics.org.cn/soapsplice.html
PASSion     (Zhang et al. 2012)     https://trac.nbic.nl/passion
PALMapper   (Jean et al. 2010)      http://galaxy.raetschlab.org/
SplitSeek   (Ameur et al. 2010)     http://solidsoftwaretools.com/gf/project/splitseek
Supersplat  (Bryant et al. 2010)    http://mocklerlab-tools.cgrb.oregonstate.edu/
SeqSaw      (Wang et al. 2011)      http://bioinfo.au.tsinghua.edu.cn/software/seqsaw
MapNext     (Bao et al. 2009)       http://evolution.sysu.edu.cn/english/software/mapnext.htm
STAR        (Dobin et al. 2012)     http://gingeraslab.cshl.edu/STAR/
GSNAP       (Wu et al. 2010)        http://research-pub.gene.com/gmap/
QPALMA      (De Bona et al. 2008)   http://www.raetschlab.org/suppl/qpalma
OSA         (Hu et al. 2012)        http://omicsoft.com/osa/

High-accuracy de novo assembly of the short sequencing reads from RNA-Seq technology is very challenging. A team led by researchers at Asia University, Taiwan, has developed a de novo assembly algorithm, EBARDenovo, which stands for Extension, Bridging And Repeat-sensing Denovo. This algorithm employs an efficient chimera-detection function to abrogate the effect of aberrant chimeric reads in RNA-Seq data.

EBARDenovo resolves the complications of RNA-Seq assembly arising from sequencing errors, repetitive sequences and aberrant chimeric amplicons. In a series of assembly experiments, this algorithm was found to be the most accurate among the examined programs including de Bruijn graph assemblers, Trinity and Oases.

AVAILABILITY: EBARDenovo is freely available at http://ebardenovo.sourceforge.net/

CONTACT: chu@live.asia.edu.tw; postergrey@gmail.com; cykao@csie.ntu.edu.tw

  • Chu HT, Hsiao WW, Chen JC, Yeh TJ, Tsai MH, Lin H, Liu YW, Lee SA, Chen CC, Tsao TT, Kao CY. (2013) EBARDenovo: Highly accurate de novo assembly of RNA-Seq with efficient chimera-detection. Bioinformatics [Epub ahead of print]. [abstract]

Transcriptome reconstruction is an important application of RNA-Seq, providing critical information for further analysis of the transcriptome. Although RNA-Seq offers the potential to capture the whole picture of the transcriptome, it still presents special challenges. To handle these difficulties and reconstruct the transcriptome as completely as possible, current computational approaches mainly employ two strategies: de novo assembly and genome-guided assembly.

Researchers at the Center for Bioinformatics and Computational Biology, East China Normal University, Shanghai, chose five representative assemblers from the two classes, then investigated and compared their algorithmic features in theory and their real performance in practice.

The researchers found that all the methods can be reduced to graph reduction problems, yet they differ in their conceptual and practical implementations. Each assembly method therefore has its specific advantages and disadvantages, outperforming the others in some aspects while performing worse in others. Finally, they merged the assemblies of the five assemblers and obtained a much better assembly. Additionally, they evaluated an assembler that uses a genome-guided de novo assembly approach, which achieved good performance. Based on these results, they suggest that to obtain a comprehensive set of recovered transcripts, it is better to use a combination of de novo assembly and genome-guided assembly.

  • Lu B, Zeng Z, Shi T. (2013) Comparative study of de novo assembly and genome-guided assembly strategies for transcriptome reconstruction based on RNA-Seq. Sci China Life Sci 56(2):143-55. [abstract]

Bioinformatics has published a Next-Gen Sequencing “Virtual Issue” covering all the sequencing tools that appeared in the journal. We have listed those described as applicable to RNA-Seq.

Statistical Inferences for Isoform Expression in RNA-Seq.
Hui Jiang and Wing Wong
Bioinformatics (2009) 25: 1026–1032 Full Text

A toolkit for analysing large-scale plant small RNA datasets
Simon Moxon et al.
Bioinformatics (2008) 24: 2252-2253 Full Text

TopHat: discovering splice junctions with RNA-Seq
Cole Trapnell et al.
Bioinformatics (2009) 25: 1105–1111 Full Text

RNA sequencing (RNA-Seq) has become a major tool for biomedical research. A key step in analyzing RNA-seq data is to infer the origin of short reads in the source genome, and for this purpose, many read alignment/mapping software programs have been developed. Usually, the majority of mappable reads can be mapped to one unambiguous genomic location, and these reads are called unique reads. However, a considerable proportion of mappable reads can be aligned to more than one genomic location with the same or similar fidelities, and they are called “multireads”. Allocating these multireads is challenging but critical for interpreting RNA-seq data.

In order to serve a greater biological community, researchers at the University of Texas MD Anderson Cancer Center have implemented a Bayesian stochastic model that allocates multireads in a stand-alone, efficient, and user-friendly software package, BM-Map. BM-Map takes SAM (Sequence Alignment/Map), the most popular read alignment format, as the standard input; then based on the Bayesian model, it calculates mapping probabilities of multireads for competing genomic loci; and BM-Map generates the output by adding mapping probabilities to the original SAM file so that users can easily perform downstream analyses.
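To show what allocating a multiread across competing loci looks like, here is a minimal sketch that weights each locus by its unique-read count. It is deliberately much simpler than BM-Map's Bayesian stochastic model and is not its algorithm: the function name `allocate_multireads`, the pseudocount, and the example counts are assumptions made for illustration.

```python
from typing import Dict, List


def allocate_multireads(unique_counts: Dict[str, int],
                        multiread_loci: Dict[str, List[str]],
                        pseudocount: float = 1.0) -> Dict[str, Dict[str, float]]:
    """Assign each multiread a mapping probability per candidate locus.

    Assumption (hypothetical heuristic): a locus with more uniquely
    mapped reads is proportionally more likely to be the true origin.
    """
    probs: Dict[str, Dict[str, float]] = {}
    for read_id, loci in multiread_loci.items():
        # The pseudocount keeps loci without unique reads from getting zero.
        weights = {locus: unique_counts.get(locus, 0) + pseudocount
                   for locus in loci}
        total = sum(weights.values())
        probs[read_id] = {locus: w / total for locus, w in weights.items()}
    return probs


# Made-up example: read1 aligns equally well to geneA and geneB.
unique_counts = {"geneA": 9, "geneB": 0}
multiread_loci = {"read1": ["geneA", "geneB"]}
mapping_probs = allocate_multireads(unique_counts, multiread_loci)
```

Here `read1` is assigned mostly to `geneA`, because that locus has all the unique-read support. Probabilities like these are what a downstream step would attach to each alignment record.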

The program is available for three common operating systems (Linux, Mac and PC) at http://bioinformatics.mdanderson.org/main/BM-Map, which includes free downloads, detailed tutorials and illustration examples.

  • Yuan Y, Norris C, Xu Y, Tsui KW, Ji Y, Liang H. (2012) BM-Map: an efficient software package for accurately allocating multireads of RNA-sequencing data. BMC Genomics 13 Suppl 8:S9. [article]

A team led by researchers at Georgia State University now proposes a novel statistical genome-guided method called “Transcriptome Reconstruction using Integer Programming” (TRIP) that incorporates fragment length distribution into novel transcript reconstruction from paired-end RNA-Seq reads. To reconstruct novel transcripts, they create a splice graph based on inferred exon boundaries and RNA-Seq reads. A splice graph is a directed acyclic graph (DAG), whose vertices represent exons and edges represent splicing events. They enumerate all maximal paths in the splice graph using a depth-first-search (DFS) algorithm. These paths correspond to putative transcripts and are the input for the TRIP algorithm.
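The path enumeration step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `maximal_paths` and the four-exon splice graph are made up, and "maximal path" is taken to mean a path from a vertex with no incoming edge to a vertex with no outgoing edge.

```python
from typing import Dict, List


def maximal_paths(graph: Dict[str, List[str]]) -> List[List[str]]:
    """Enumerate all source-to-sink paths of a DAG by depth-first search."""
    nodes = set(graph) | {v for targets in graph.values() for v in targets}
    has_incoming = {v for targets in graph.values() for v in targets}
    # Sources are exons that no splicing event leads into.
    sources = sorted(nodes - has_incoming)
    paths: List[List[str]] = []

    def dfs(node: str, path: List[str]) -> None:
        successors = graph.get(node, [])
        if not successors:
            # Sink reached: the path cannot be extended any further.
            paths.append(path)
            return
        for nxt in successors:
            dfs(nxt, path + [nxt])

    for source in sources:
        dfs(source, [source])
    return paths


# Toy splice graph: E2 and E3 are alternative exons between E1 and E4.
splice_graph = {
    "E1": ["E2", "E3"],
    "E2": ["E4"],
    "E3": ["E4"],
    "E4": [],
}
putative_transcripts = maximal_paths(splice_graph)
```

For this toy graph the result is the two candidate transcripts E1-E2-E4 and E1-E3-E4. The number of such paths can grow exponentially with the number of alternative exons, so a selection step is needed afterwards.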

To solve the transcriptome reconstruction problem, one must select a set of putative transcripts with the highest support from the RNA-Seq reads. They formulate this problem as an integer program. The objective is to select the smallest set of putative transcripts that yields a good statistical fit between the fragment length distribution empirically determined during library preparation and the fragment lengths implied by mapping read pairs to the selected transcripts.

Preliminary experimental results on synthetic datasets generated with various sequencing parameters and distribution assumptions show that TRIP has increased transcriptome reconstruction accuracy compared to previous methods that ignore fragment length distribution information.

  • Mangul S, Caciula A, Brinza D, Mandoiu II, Zelikovsky A. (2012) TRIP: a method for novel transcript reconstruction from paired-end RNA-seq reads. BMC Bioinformatics – part of the supplement: Highlights from the Eighth International Society for Computational Biology (ISCB) Student Council Symposium 2012. [abstract]

This method addresses the problem of how to use RNA-Seq data for transcriptome reconstruction and quantification, as well as novel transcript discovery in partially annotated genomes. Researchers at Georgia State University have developed a novel annotation-guided general framework for transcriptome discovery, reconstruction and quantification in partially annotated genomes and compared it with existing annotation-guided and genome-guided transcriptome assembly methods. Their method, referred to as Discovery and Reconstruction of Unannotated Transcripts (DRUT), can be used to enhance existing transcriptome assemblers, such as Cufflinks, as well as to accurately estimate transcript frequencies. Empirical analysis on synthetic datasets confirms that Cufflinks enhanced by DRUT has superior quality of reconstruction and frequency estimation of transcripts.

The software is written in C++ and is available at: http://www.cs.gsu.edu/~serghei/?q=drut

  • Mangul S, Caciula A, Glebova O, Mandoiu I, Zelikovsky A. (2012) Improved transcriptome quantification and reconstruction from RNA-Seq reads using partial annotations. In Silico Biol 11(5):251-61. [abstract]
