Bioinformatics challenges in de novo transcriptome assembly using short read sequences in the absence of a reference genome sequence

Plant natural product research can be facilitated through genome and transcriptome sequencing approaches that generate informative sequence and expression datasets that enable characterization of biochemical pathways of interest. As the overwhelming majority of plant-derived natural products are derived from species with little, if any, sequence and/or genomic resources, the ability to perform whole genome shotgun sequencing and assembly has been and will continue to be transformative as access to a genome sequence provides molecular resources and a context for discovery and characterization of biosynthetic pathways. Due to the reduced size and complexity of the transcriptome relative to the genome, transcriptome sequencing provides a rapid, inexpensive approach to access gene sequences, gene expression abundances, and gene expression patterns in any species, including those that lack a reference genome sequence. To date, successful applications of RNA sequencing in conjunction with de novo transcriptome assembly has enabled identification of new genes in an array of biochemical pathways in plants. While sequencing technologies are well developed, challenges remain in the handling and analysis of transcriptome sequences. Here, the authors provide an overview of the bioinformatics challenges associated with transcriptome analyses using short read sequences and how to address these issues in plant species that lack a reference genome.

  • Góngora-Castillo E, Buell CR. (2013) Bioinformatics challenges in de novo transcriptome assembly using short read sequences in the absence of a reference genome sequence. Nat Prod Rep [Epub ahead of print]. [abstract]