Combining Long and Short Read Sequencing Platforms Yields Unprecedented Detail of Planarian Transcriptome

Planarians (flatworms) are widely used as a model system, and their genome is fairly well defined. Planarian researchers currently have access to a whole-genome shotgun assembly for Schmidtea mediterranea (43,294 contigs with no chromosomal structure) and 74,388 ESTs (see SmedGD). However, massively parallel sequencing technologies now make it possible to define genetic content, and transcriptomes in particular, in unprecedented detail.

Researchers at the University of Nottingham, UK, have used a dual-platform approach to transcript discovery in the planarian Schmidtea mediterranea, with the aim of establishing RNAseq for stem cell and regeneration biology.

First, they used long-read 454 transcriptome sequencing for gene discovery. Sequencing this library generated 743,464 reads with a mean length of 278 bp and an overall length distribution characteristic of 454 transcriptome sequencing with Titanium chemistry. Assembly of these reads detected 16,967 putative “isogroups” (genes) and 21,030 putative “isotigs” (isoforms). The authors then added the 74,388 publicly available Schmidtea mediterranea ESTs; the combined assembly used 581,365 of the 454 reads and yielded 17,628 putative genes and 22,698 isoforms.
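The 454 figures above can be put in perspective with a little arithmetic. The sketch below uses only the numbers quoted in this post; the variable and function names are hypothetical, and "bases sequenced" is approximated as read count times the (rounded) mean read length, so it is an estimate rather than the authors' reported yield.

```python
# Summary arithmetic for the 454 Titanium run, using only figures quoted above.
READS_TOTAL = 743_464        # 454 reads sequenced
MEAN_READ_LEN = 278          # mean read length (bp), as reported
READS_IN_COMBINED = 581_365  # 454 reads used in the combined 454 + EST assembly

# Assembly outcomes as (putative genes, putative isoforms).
ASSEMBLIES = {
    "454 only": (16_967, 21_030),
    "454 + ESTs": (17_628, 22_698),
}


def approx_bases(n_reads: int, mean_len: int) -> int:
    """Approximate sequenced bases as read count times mean read length."""
    return n_reads * mean_len


def fraction_used(used: int, total: int) -> float:
    """Fraction of reads incorporated into an assembly."""
    return used / total


def isoforms_per_gene(genes: int, isoforms: int) -> float:
    """Average number of putative isoforms per putative gene."""
    return isoforms / genes


if __name__ == "__main__":
    print(f"~{approx_bases(READS_TOTAL, MEAN_READ_LEN) / 1e6:.1f} Mb sequenced")
    print(f"{fraction_used(READS_IN_COMBINED, READS_TOTAL):.1%} of 454 reads used")
    for name, (genes, isoforms) in ASSEMBLIES.items():
        print(f"{name}: {isoforms_per_gene(genes, isoforms):.2f} isoforms per gene")
```

Run as a script, this reports roughly 206.7 Mb of sequence, about 78.2% of the 454 reads retained in the combined assembly, and an average of 1.24 isoforms per gene for the 454-only assembly versus 1.29 once the ESTs are added.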

Next, the authors performed short-read sequencing on the SOLiD platform, iteratively mapping the reads to better define splice junctions between exons and to identify alternative transcript sequences. Two flow cells on the SOLiD 3+ platform generated 903,642,430 50 bp reads. Of these, 507,719,814 high-quality reads were mapped, representing 1,060-fold coverage of the planarian transcriptome. Cufflinks was used to assemble the high-quality mapped reads into transcripts and produce a new annotated GTF (Gene Transfer Format) file. This initial transcriptome GTF file comprised 19,429 putative multi-exon transcripts, defined entirely by previously annotated genes, and over 153,038 putative single-exon transcripts.
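The "1,060-fold coverage" figure follows from a standard relation: mean depth equals total mapped bases divided by the length of the target sequence. This post does not state the transcriptome length the authors used, so the sketch below assumes that definition, counts every mapped read at its full 50 bp, and back-calculates the implied target length. That length is a derived, hypothetical value, not one reported in the paper.

```python
# Back-of-envelope check of the SOLiD figures quoted above.
# Assumption: fold coverage = (mapped reads x read length) / transcriptome length,
# with every mapped read counted at its full 50 bp. The reference length is not
# given in the text, so it is back-calculated here (hypothetical).

READS_GENERATED = 903_642_430
READS_MAPPED = 507_719_814
READ_LEN_BP = 50
REPORTED_FOLD = 1_060


def fold_coverage(mapped_reads: int, read_len: int, target_len_bp: float) -> float:
    """Mean depth: total mapped bases divided by target length."""
    return mapped_reads * read_len / target_len_bp


def implied_target_length(mapped_reads: int, read_len: int, fold: float) -> float:
    """Target length that would give the stated fold coverage."""
    return mapped_reads * read_len / fold


if __name__ == "__main__":
    mapped_bases = READS_MAPPED * READ_LEN_BP
    target = implied_target_length(READS_MAPPED, READ_LEN_BP, REPORTED_FOLD)
    print(f"mapping rate: {READS_MAPPED / READS_GENERATED:.1%}")
    print(f"mapped bases: {mapped_bases / 1e9:.2f} Gb")
    print(f"implied transcriptome length: {target / 1e6:.1f} Mb")
    print(f"check: {fold_coverage(READS_MAPPED, READ_LEN_BP, target):.0f}x")
```

Under these assumptions, about 56.2% of the generated reads were mapped, giving roughly 25.39 Gb of aligned sequence, and a 1,060-fold coverage would correspond to a target of about 23.9 Mb.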

This work has defined an extensive planarian transcriptome, and the authors anticipate that other ‘omics approaches will build on this comprehensive data set, including RNAseq across many planarian regenerative stages, scenarios and tissues, and across phenotypes generated by RNAi.

Blythe MJ, Kao D, Malla S, Rowsell J, Wilson R, Evans D, Jowett J, Hall A, Lemay V, Lam S, Aboobaker A. (2010) A Dual Platform Approach to Transcript Discovery for the Planarian Schmidtea Mediterranea to Establish RNAseq for Stem Cell and Regeneration Biology. PLoS ONE 5(12), e15617.