GRIT – Genome-guided transcript assembly by integrative analysis of RNA sequence data

The identification of full-length transcripts entirely from short-read RNA sequencing data (RNA-seq) remains a challenge in the annotation of genomes. Here, researchers from the University of California at Berkeley describe an automated pipeline for genome annotation that integrates RNA-seq and gene-boundary data sets, which they call the Generalized RNA Integration Tool, or GRIT.

Applying GRIT to Drosophila melanogaster short-read RNA-seq, cap analysis of gene expression (CAGE) and poly(A)-site-seq data collected for the modENCODE project, they recovered the vast majority of previously annotated transcripts and doubled the total number of transcripts cataloged. They found that 20% of protein-coding genes encode multiple protein-localization signals and that, in the heads of 20-day-old adult flies, genes with multiple polyadenylation sites are more common than genes with alternative splicing or alternative promoters. GRIT demonstrates 30% higher precision and recall than the most widely used transcript assembly tools. GRIT will facilitate the automated generation of high-quality genome annotations without the need for extensive manual annotation.
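For readers unfamiliar with how precision and recall apply to transcript assembly, the following is a minimal sketch, not the evaluation procedure used in the paper. It assumes that a predicted transcript counts as correct only when its chromosome, strand and intron chain exactly match a reference transcript. The function name `precision_recall` and the toy coordinates are hypothetical and chosen only for illustration.

```python
def precision_recall(predicted, reference):
    """Return (precision, recall) for two collections of transcripts.

    Each transcript is a hashable key: (chromosome, strand, intron_chain),
    where intron_chain is a tuple of (start, end) pairs. Exact-match
    scoring is an assumption made for this sketch.
    """
    predicted = set(predicted)
    reference = set(reference)
    true_pos = len(predicted & reference)  # predictions found in the reference
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(reference) if reference else 0.0
    return precision, recall


# Toy, made-up data: four reference transcripts, three predictions.
reference = {
    ("chr2L", "+", ((100, 200), (300, 400))),
    ("chr2L", "+", ((100, 200), (350, 400))),
    ("chr3R", "-", ((500, 600),)),
    ("chrX", "+", ((50, 80),)),
}
predicted = {
    ("chr2L", "+", ((100, 200), (300, 400))),  # matches the reference
    ("chr3R", "-", ((500, 600),)),             # matches the reference
    ("chr3R", "-", ((500, 650),)),             # no match: different intron end
}

precision, recall = precision_recall(predicted, reference)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy case two of the three predictions are correct and two of the four reference transcripts are recovered. Published comparisons may use looser matching rules, such as tolerance at transcript ends, so the numbers a given rule produces are not directly comparable across studies.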


Availability – All software associated with this project and the pipelines run to generate these annotations are available for download at

Boley N, Stoiber MH, Booth BW, Wan KH, Hoskins RA, Bickel PJ, Celniker SE, Brown JB. (2014) Genome-guided transcript assembly by integrative analysis of RNA sequence data. Nat Biotechnol [Epub ahead of print].