RNA-Seq’s high resolution and accuracy in transcript abundance estimation have been thoroughly demonstrated, so much so that it is being heralded as a possible replacement for microarray-based gene expression technology. There is another important application for RNA-Seq, however: improving existing genome annotations, and possibly even complete de novo genome annotation.
Improving current genome annotations is a topic that has been discussed before on the RNA-Seq Blog, in a post from earlier this year.
Now, researchers at UC Berkeley and the Broad Institute have developed a novel approach termed “reference annotation based transcript (RABT) assembly”, which assembles novel transcripts in the context of an existing annotation. It builds on the Cufflinks assembler, which the authors describe as a “pure” assembler: it does not utilize information about the structure and content of coding genes, or other external input (e.g. ESTs), during the assembly.
However, a problem exists with using RNA-Seq for annotation. Genes that are expressed at a low level will be represented by few reads and may be only partially covered. This means that naive assembly methods will fail to reconstruct the majority of full-length transcripts.
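To get a feel for why low expression leads to partial coverage, here is a rough back-of-the-envelope sketch. It is not taken from the paper: it assumes reads start uniformly at random along a transcript and uses the classic Lander-Waterman style approximation for the fraction of bases hit by at least one read, ignoring edge effects. The function name, read length, and transcript length are illustrative choices only.

```python
import math


def expected_covered_fraction(num_reads, read_len, transcript_len):
    """Approximate fraction of transcript bases covered by at least one read.

    Assumes read start positions are uniform along the transcript and
    ignores edge effects (an illustrative simplification, not the paper's model).
    """
    if transcript_len <= 0:
        raise ValueError("transcript_len must be positive")
    # Average depth: total sequenced bases divided by transcript length.
    depth = num_reads * read_len / transcript_len
    # Under a Poisson model, a base is missed with probability exp(-depth).
    return 1.0 - math.exp(-depth)


if __name__ == "__main__":
    # Hypothetical 2,000 bp transcript sequenced with 75 bp reads.
    for n in (5, 20, 100, 500):
        frac = expected_covered_fraction(n, read_len=75, transcript_len=2000)
        print(f"{n:>4} reads: ~{frac:.0%} of bases covered")
```

Under these assumptions, a transcript supported by only a handful of reads is left with large uncovered stretches, so an assembler that relies on reads alone has nothing to join its fragments with.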
Availability: The methods described in this paper are implemented in the Cufflinks suite of software for RNA-Seq, freely available from http://bio.math.berkeley.edu/cufflinks.
- Roberts A, Pimentel H, Trapnell C, Pachter L. (2011) Identification of novel transcripts in annotated genomes using RNA-Seq. Bioinformatics [Epub ahead of print].