BRAKER1 – Unsupervised RNA-Seq-Based Genome Annotation with GeneMark-ET and AUGUSTUS

Gene finding in eukaryotic genomes is notoriously difficult to automate. The task is to design a workflow with a minimal set of tools that reaches state-of-the-art performance across a wide range of species. GeneMark-ET is a gene prediction tool that incorporates RNA-Seq data into unsupervised training and subsequently generates ab initio gene predictions. AUGUSTUS is a gene finder that usually requires supervised training and uses information from RNA-Seq reads in the prediction step. The complementary strengths of GeneMark-ET and AUGUSTUS motivated the design of a new combined tool for automatic gene prediction.

Researchers at Universität Greifswald have developed BRAKER1, a pipeline for unsupervised RNA-Seq-based genome annotation that combines the advantages of GeneMark-ET and AUGUSTUS. As input, BRAKER1 requires a genome assembly file and a BAM file with spliced alignments of RNA-Seq reads to the genome. First, GeneMark-ET performs iterative training and generates initial gene structures. Second, AUGUSTUS uses the predicted genes for training and then integrates RNA-Seq read information into the final gene predictions. In their experiments, the researchers observed that BRAKER1 was more accurate than MAKER2 when RNA-Seq was the sole source of evidence for training and prediction. BRAKER1 does not require pre-trained parameters or a separate expert-prepared training step.
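
As a rough illustration of the two inputs described above, the sketch below assembles a command line for a BRAKER1 run without executing it. It assumes the pipeline is started through a script named `braker.pl` that accepts `--genome` and `--bam` options; these names are not given in this post and should be checked against the help text of the installed version. The helper name `build_braker_command` and the file names are hypothetical.

```python
from typing import List


def build_braker_command(genome_fasta: str, rnaseq_bam: str,
                         executable: str = "braker.pl") -> List[str]:
    """Assemble an argument list for a BRAKER1 run (nothing is executed)."""
    # The post names exactly two inputs: a genome assembly file and a BAM
    # file with spliced alignments of RNA-Seq reads to that genome.
    if not rnaseq_bam.lower().endswith(".bam"):
        raise ValueError(f"expected a BAM file, got: {rnaseq_bam}")
    # Assumed option names; confirm against the installed version's help text.
    return [executable, f"--genome={genome_fasta}", f"--bam={rnaseq_bam}"]


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    print(" ".join(build_braker_command("genome.fa", "rnaseq.bam")))
```

Returning a list rather than a single string lets the result be passed directly to `subprocess.run` without shell quoting issues, should one choose to run it.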

Gene prediction accuracy of BRAKER1 and MAKER2 (both pipelines used repeat masking), as assessed by comparison with the annotation of the genomes of four model organisms. In all cases, RNA-Seq was the only source of extrinsic evidence. For the fungus S. pombe, we also assessed the accuracy of gene predictions made by CodingQuarry.
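
The post does not say which accuracy measures were used in the comparison. A common convention in gene-finder evaluation is to report sensitivity (the fraction of annotated features that were predicted exactly) and specificity (the fraction of predicted features that match the annotation exactly). The sketch below computes both at the exon level under that assumption; the function name and the coordinates are made up for illustration and are not results from the paper.

```python
from typing import Iterable, Tuple

# An exon is identified here by (sequence name, start, end, strand).
Exon = Tuple[str, int, int, str]


def exon_accuracy(predicted: Iterable[Exon],
                  annotated: Iterable[Exon]) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for exact exon matches."""
    predicted_set = set(predicted)
    annotated_set = set(annotated)
    # A true positive is a predicted exon whose coordinates and strand
    # match an annotated exon exactly.
    true_positives = len(predicted_set & annotated_set)
    sensitivity = true_positives / len(annotated_set) if annotated_set else 0.0
    specificity = true_positives / len(predicted_set) if predicted_set else 0.0
    return sensitivity, specificity


if __name__ == "__main__":
    # Illustrative toy data only.
    predicted = [("chr1", 100, 200, "+"), ("chr1", 300, 400, "+"),
                 ("chr1", 500, 600, "+")]
    annotated = [("chr1", 100, 200, "+"), ("chr1", 300, 400, "+"),
                 ("chr1", 700, 800, "-"), ("chr1", 900, 1000, "-")]
    sens, spec = exon_accuracy(predicted, annotated)
    print(f"sensitivity={sens:.2f} specificity={spec:.2f}")
```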

Availability: BRAKER1 is available for download at http://bioinf.uni-greifswald.de/bioinf/downloads/ and http://exon.gatech.edu/

Contact: katharina.hoff@uni-greifswald.de & borodovsky@gatech.edu

Hoff KJ, Lange S, Lomsadze A, Borodovsky M, Stanke M. (2015) BRAKER1: Unsupervised RNA-Seq-Based Genome Annotation with GeneMark-ET and AUGUSTUS. Bioinformatics [Epub ahead of print].
