De novo transcriptome assemblies of RNA-Seq data are important for genomics applications of unsequenced organisms. Due to the complexity and often incomplete representation of transcripts in sequencing libraries, the assembly of high-quality transcriptomes can be challenging. However, with the rapidly growing number of sequenced genomes it is now feasible to improve RNA-Seq assemblies by guiding them with genomic sequences.

This study introduces BRANCH, an algorithm designed for improving de novo transcriptome assemblies by utilizing genomic information that can be partial or complete genome sequences from the same or a related organism. Its input includes assembled RNA reads (transfrags), genomic sequences (e.g. contigs) and the RNA reads themselves. It uses a customized version of BLAT to align the transfrags and RNA reads to the genomic sequences. After identifying exons from the alignments, it defines a directed acyclic graph and maps the transfrags to paths on the graph. It then joins and extends the transfrags by applying an algorithm that solves a combinatorial optimization problem, called the Minimum weight Minimum Path Cover with given Paths (MMPCP). In performance tests on real data from C. elegans and S. cerevisiae, assisted by genomic contigs from the same species, BRANCH improved the sensitivity and precision of transfrags generated by Velvet/Oases or Trinity by 5.1-56.7% and 0.3-10.5%, respectively. These improvements added 3.8-74.1% complete transcripts and 8.3-33.8% proteins to the initial assembly. Similar improvements were achieved when guiding the BRANCH processing of a transcriptome assembly from a more complex organism (mouse) with genomic sequences from a related species (rat).
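To make the path-cover idea more concrete, the sketch below computes a plain minimum path cover of a small exon graph. It is not BRANCH's MMPCP algorithm: it ignores weights and pre-given transfrag paths, and it assumes vertex-disjoint paths, whereas real isoforms may share exons. The function name `min_path_cover` and the toy graph are illustrative assumptions. It uses the classical reduction to bipartite matching, in which each matched edge joins two nodes into the same path.

```python
from typing import List, Tuple


def min_path_cover(n: int, edges: List[Tuple[int, int]]) -> List[List[int]]:
    """Minimum vertex-disjoint path cover of a DAG with nodes 0..n-1.

    Simplified stand-in for a path-cover step: no weights, no given paths.
    """
    succ = {u: [] for u in range(n)}
    for u, v in edges:
        succ[u].append(v)

    match_left = [-1] * n   # match_left[u] = v: edge u -> v is used in a path
    match_right = [-1] * n  # match_right[v] = u: v is entered from u

    def try_augment(u: int, seen: set) -> bool:
        # Kuhn's augmenting-path search for bipartite matching.
        for v in succ[u]:
            if v in seen:
                continue
            seen.add(v)
            if match_right[v] == -1 or try_augment(match_right[v], seen):
                match_right[v] = u
                match_left[u] = v
                return True
        return False

    for u in range(n):
        try_augment(u, set())

    # Nodes without a matched predecessor start a path; follow matched edges.
    paths = []
    for start in range(n):
        if match_right[start] != -1:
            continue
        path = [start]
        while match_left[path[-1]] != -1:
            path.append(match_left[path[-1]])
        paths.append(path)
    return paths


# Toy exon graph: exon 1 is followed by either exon 2 or exon 3.
exon_edges = [(0, 1), (1, 2), (1, 3), (2, 4), (3, 4)]
print(min_path_cover(5, exon_edges))
```

On this toy graph the cover has two paths, the smallest number possible when paths may not share exons.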

Availability: The BRANCH software can be downloaded for free from this site: http://manuals.bioinformatics.ucr.edu/home/branch.

Contact: thomas.girke@ucr.edu

  • Bao E, Jiang T, Girke T. (2013) BRANCH: boosting RNA-Seq assemblies with partial or related genomic sequences. Bioinformatics [Epub ahead of print]. [abstract]

Transcriptome reconstruction is an important application of RNA-Seq, providing critical information for further analysis of the transcriptome. Although RNA-Seq can in principle capture the whole transcriptome, it still presents special challenges. To handle these difficulties and reconstruct the transcriptome as completely as possible, current computational approaches mainly employ two strategies: de novo assembly and genome-guided assembly.

Researchers at the Center for Bioinformatics and Computational Biology, East China Normal University, Shanghai chose five representative assemblers spanning the two classes, then compared their algorithmic features in theory and their performance in practice.

The researchers found that all of the methods can be reduced to graph reduction problems, yet they differ in their conceptual and practical implementations. Each assembly method therefore has specific advantages and disadvantages, underperforming the others in some respects while outperforming them in others. Finally, they merged the assemblies of the five assemblers and obtained a much better assembly. They also evaluated an assembler that uses a genome-guided de novo assembly approach, which achieved good performance. Based on these results, they suggest that a combination of de novo assembly and genome-guided assembly is the better way to obtain a comprehensive set of recovered transcripts.

  • Lu B, Zeng Z, Shi T. (2013) Comparative study of de novo assembly and genome-guided assembly strategies for transcriptome reconstruction based on RNA-Seq. Sci China Life Sci 56(2):143-55. [abstract]

Transcriptomic sequence resources represent invaluable assets for research, in particular for non-model species without a sequenced genome. To date, the Next Generation Sequencing technologies 454/Roche and Illumina have been used to generate transcriptome sequence databases by RNA-Seq for more than fifty different plant species. While some of the databases were successfully used for downstream applications, such as proteomics, the assembly parameters indicate that the assemblies do not yet accurately reflect the actual plant transcriptomes. Two different assembly strategies have been used: overlap consensus based assemblers for long reads and Eulerian path/de Bruijn graph assemblers for short reads.
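As a rough illustration of the de Bruijn graph strategy for short reads, the sketch below builds a graph whose nodes are (k-1)-mers and walks it while the path is unambiguous. This is a minimal teaching example under simplifying assumptions (error-free reads, a single strand, no coverage information), not the method of any particular assembler; the function names and toy reads are hypothetical.

```python
from collections import defaultdict


def de_bruijn_graph(reads, k):
    """Nodes are (k-1)-mers; every k-mer in a read adds an edge prefix -> suffix."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def walk_unambiguous(graph, start):
    """Extend a contig from `start` while each node has exactly one successor."""
    contig = start
    node = start
    seen = {start}
    while len(graph.get(node, ())) == 1:
        (nxt,) = graph[node]
        if nxt in seen:  # stop on cycles instead of looping forever
            break
        contig += nxt[-1]
        seen.add(nxt)
        node = nxt
    return contig


# Three overlapping toy reads drawn from the sequence ATGGCGTTAC.
reads = ["ATGGCG", "GGCGTT", "CGTTAC"]
graph = de_bruijn_graph(reads, k=4)
print(walk_unambiguous(graph, "ATG"))  # ATGGCGTTAC
```

Branch points in such a graph (nodes with more than one successor) are where sequencing errors, repeats and alternative isoforms have to be resolved, which is the hard part that this sketch deliberately leaves out.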

In this review, researchers from the Heinrich Heine University, Germany discuss the challenges and solutions to the transcriptome assembly problem. A list of quality control parameters and the necessary scripts to produce them are provided.
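The summary does not list the quality control parameters themselves. As an assumed example of one widely used assembly statistic, the sketch below computes N50: the contig length at which the contigs of that length or longer contain at least half of all assembled bases. The function name `n50` is illustrative and is not taken from the review's scripts.

```python
def n50(lengths):
    """Return the N50 of a list of contig lengths (0 for an empty list)."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # First contig at which the running sum reaches half of all bases.
        if running * 2 >= total:
            return length
    return 0


print(n50([100, 200, 300, 400]))  # 300
```

For transcriptomes, N50 should be read with care, because real transcripts vary widely in length and a higher value does not necessarily mean a more accurate assembly.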

  • Schliesky S, Gowik U, Weber AP, Bräutigam A. (2012) RNA-Seq Assembly – Are We There Yet? Front Plant Sci [Epub ahead of print]. [article]

Dissect (DIScovery of Structural Alteration Event Containing Transcripts), a novel transcriptome-to-genome alignment tool, can identify and characterize transcriptomic events such as duplications, inversions, rearrangements and fusions. Dissect is suitable for whole transcriptome structural variation discovery problems involving sufficiently long reads or accurately assembled contigs.

Dissect was tested on simulated transcripts altered via structural events, as well as assembled RNA-Seq contigs from human prostate cancer cell line C4-2.

AVAILABILITY: Dissect is available for public use at: http://dissect-trans.sourceforge.net

  • Yorukoglu D, Hach F, Swanson L, Collins CC, Birol I, Sahinalp SC. (2012) Dissect: detection and characterization of novel structural alterations in transcribed sequences. Bioinformatics 28(12):i179-i187. [article]

With a Genome

FANSe [1] is a new, fast and accurate algorithm for nucleic acid sequence analysis, with adjustable mismatch allowance settings and the ability to handle indels, that accurately and quantitatively maps millions of reads to small or large reference genomes. It is a seed-based algorithm that uses the whole read information for mapping; high sensitivity and low ambiguity are achieved by using short and non-overlapping reads. Furthermore, FANSe uses a hotspot score to prioritize the processing of the most likely matches and implements a modified Smith-Waterman refinement with a reduced scoring matrix to accelerate the calculation without compromising its sensitivity. The FANSe algorithm stably processes datasets from various sequencing platforms, masked or unmasked, and small or large genomes. It shows remarkable coverage of low-abundance mRNAs, which is important for quantitative processing of RNA-Seq datasets.

AVAILABILITY: The FANSe algorithm is accessible at http://bioinformatics.jnu.edu.cn/software/fanse/. The web site contains a detailed tutorial and the source code for download.
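The general seed-and-verify idea behind seed-based mappers can be sketched as follows. This is a generic illustration, not FANSe's implementation: it assumes substitutions only (no indels), uses a plain dictionary index instead of a hotspot score, and replaces the Smith-Waterman refinement with a simple mismatch count. All function names are hypothetical. Cutting the read into non-overlapping seeds means that a read with fewer mismatches than seeds must contain at least one seed that matches exactly.

```python
from collections import defaultdict


def build_index(reference, seed_len):
    """Map every seed_len-mer of the reference to its start positions."""
    index = defaultdict(list)
    for pos in range(len(reference) - seed_len + 1):
        index[reference[pos:pos + seed_len]].append(pos)
    return index


def map_read(read, reference, index, seed_len, max_mismatches):
    """Return sorted (start, mismatches) hits with at most max_mismatches substitutions."""
    candidates = set()
    # Non-overlapping seeds: offsets 0, seed_len, 2 * seed_len, ...
    for offset in range(0, len(read) - seed_len + 1, seed_len):
        seed = read[offset:offset + seed_len]
        for pos in index.get(seed, ()):
            start = pos - offset
            if 0 <= start <= len(reference) - len(read):
                candidates.add(start)
    hits = []
    for start in sorted(candidates):
        # Verify each candidate against the whole read, not just the seed.
        window = reference[start:start + len(read)]
        mismatches = sum(a != b for a, b in zip(read, window))
        if mismatches <= max_mismatches:
            hits.append((start, mismatches))
    return hits


reference = "ACGTTGCAAGCTTAGGCTAACGGATCC"
index = build_index(reference, seed_len=4)
# A 12-base read copied from position 8 with one substituted base.
read = reference[8:13] + "C" + reference[14:20]
print(map_read(read, reference, index, seed_len=4, max_mismatches=1))
```

Raising `max_mismatches` above the number of seeds minus one would break the exact-seed guarantee, so the seed length and the mismatch allowance have to be chosen together.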

Without a Genome

Oases [2] is a software package designed to heuristically assemble RNA-seq reads in the absence of a reference genome, across a broad spectrum of expression values and in the presence of alternative isoforms. It achieves this by using an array of hash lengths, a dynamic filtering of noise, a robust resolution of alternative splicing events, and the efficient merging of multiple assemblies. It was tested on human and mouse RNA-seq data and is shown to improve significantly on the transABySS and Trinity de novo transcriptome assemblers.

AVAILABILITY: Oases is freely available under the GPL license at www.ebi.ac.uk/~zerbino/oases/

  1. Zhang G, Fedyunin I, Kirchner S, Xiao C, Valleriani A, Ignatova Z. (2012) FANSe: an accurate algorithm for quantitative mapping of large scale sequencing reads. Nucleic Acids Res [Epub ahead of print]. [article]
  2. Schulz MH, Zerbino DR, Vingron M, Birney E. (2012) Oases: Robust de novo RNA-seq assembly across the dynamic range of expression levels. Bioinformatics [Epub ahead of print]. [article]

Transcriptome assembly based on RNA-Seq data aims at reconstructing all full-length mRNA transcripts simultaneously from millions of short reads.

IsoLasso is a new RNA-Seq based transcriptome assembly tool. It builds on the well-known LASSO algorithm, a multivariate regression method designed to balance prediction accuracy against model simplicity, so that the fitted model stays easy to interpret. By including some additional constraints in the quadratic program involved in LASSO, IsoLasso is able to make the set of assembled transcripts as complete as possible. Experiments on simulated and real RNA-Seq datasets show that IsoLasso achieves, simultaneously, higher sensitivity and precision than state-of-the-art transcript assembly tools.
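For readers unfamiliar with LASSO, the sketch below solves the plain LASSO problem, minimizing ½‖y − Xβ‖² + λ‖β‖₁, by coordinate descent. It is not IsoLasso's quadratic program and omits its additional completeness constraints. The toy setup (rows as exon segments, columns as candidate isoforms, y as observed coverage per segment) and all names are assumptions made for illustration.

```python
def soft_threshold(value, lam):
    """Shrink value toward zero by lam; values within [-lam, lam] become 0."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimize 0.5 * ||y - X b||^2 + lam * ||b||_1 by cyclic coordinate descent."""
    n_rows, n_cols = len(X), len(X[0])
    beta = [0.0] * n_cols
    for _ in range(n_iter):
        for j in range(n_cols):
            rho = 0.0  # correlation of column j with the residual excluding j
            z = 0.0    # squared norm of column j
            for i in range(n_rows):
                pred_without_j = sum(
                    X[i][k] * beta[k] for k in range(n_cols) if k != j
                )
                rho += X[i][j] * (y[i] - pred_without_j)
                z += X[i][j] ** 2
            beta[j] = soft_threshold(rho, lam) / z if z > 0 else 0.0
    return beta


# Rows: three exon segments. Columns: three hypothetical candidate isoforms
# (segments 1+2+3, segments 1+3, segment 2 only). 1 means "isoform covers segment".
X = [[1, 1, 0],
     [1, 0, 1],
     [1, 1, 0]]
y = [15.0, 10.0, 15.0]  # toy per-segment coverage
print(lasso_coordinate_descent(X, y, lam=1.0))
```

The penalty λ pushes the abundances of poorly supported candidates to exactly zero, which is why LASSO-style methods tend to report a small set of isoforms rather than every candidate.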

Transcriptomics studies often rely on partial reference transcriptomes that fail to capture the full catalogue of transcripts and their variations. Recent advances in sequencing technologies and assembly algorithms have facilitated the reconstruction of the entire transcriptome by deep RNA sequencing (RNA-seq), even without a reference genome. However, transcriptome assembly from billions of RNA-seq reads, which are often very short, poses a significant informatics challenge. This Review summarizes the recent developments in transcriptome assembly approaches (reference-based, de novo and combined strategies), along with some perspectives on transcriptome assembly in the near future.

  • Martin JA, Wang Z. (2011) Next-generation transcriptome assembly. Nat Rev Genet [Epub ahead of print]. [abstract]

RNA-Seq has emerged as a powerful tool for studying transcriptomes. It aims to provide a comprehensive list of all transcripts and their expression levels from a given cell or cell population under a particular condition. RNA-Seq data analysis typically involves aligning the short read sequences to a reference genome to reveal reads from exons, splicing junctions, or polyA ends. This information is used to derive novel gene models or refine existing gene models, including exon structure and untranslated regions (UTRs), and to determine gene expression levels from read count statistics.

Next Generation Sequencing Transcriptome Assembly Workshop

01 November 2010, 09:00 AM – 02 November 2010, 01:00 PM

e-Science Institute, 15 South College Street, Edinburgh

Organiser: Mark Blaxter

A major output from the current generation of high throughput sequencers is transcriptomes. The transcriptome (all the RNAs produced by a cell or organism) represents the biological distillation of the genome, and is often a very useful way in which to investigate biology. Correct assembly of a transcriptome from the short sequence reads is an essential first step in further analysis, but there is still no consensus on best practice or optimality goals.

The Next Generation Transcriptome Assembly Workshop is aimed at practising bioinformaticians and biologists who want to explore the options for assembling de novo the transcriptomes of new species using only next generation, short and long read data. We will examine the properties of the sequence data being produced, and review the available algorithms and packages that can be used for assembly. We will aim to produce a series of best-practice recommendations for the future application of these technologies to gene discovery and assembly.

Workshop webpage – http://www.nesc.ac.uk/esi/events/1104/

e-Science Institute homepage – http://www.esi.ac.uk/

For species that lack reference genome sequences, or whose genomes are poorly annotated, de novo short-read transcriptome assembly may be a practical alternative to conventional expressed sequence tag–based approaches and to methods that depend on short-read alignments.
