Researchers at the University of Würzburg have constructed a powerful and modular pipeline called ANNOgesic that provides the required analyses and simplifies RNA-Seq-based bacterial and archaeal genome annotation.
It is a modular, command-line tool that can integrate different types of RNA-Seq data, based on dRNA-Seq (differential RNA-Seq) or on RNA-Seq protocols that include transcript fragmentation, to generate high-quality genome annotations. It can detect genes, CDSs/tRNAs/rRNAs, transcription start sites (TSSs) and processing sites, transcripts, terminators, and untranslated regions (UTRs), as well as small RNAs (sRNAs), small open reading frames (sORFs), circular RNAs, CRISPR-related RNAs, riboswitches and RNA thermometers. It can also predict RNA-RNA and protein-protein interactions. Furthermore, it groups genes into operons and sub-operons and reveals promoter motifs. It can also assign GO terms and subcellular localizations to genes. Several of ANNOgesic's features are new implementations, while others build on well-known third-party tools for which it offers adaptive parameter optimization. Additionally, numerous visualizations and statistics help the user to quickly evaluate the feature predictions resulting from an ANNOgesic analysis. The tool was extensively tested with several RNA-Seq data sets from bacterial as well as archaeal samples.
The genetic algorithm that ANNOgesic uses for optimizing the parameters of TSSpredator
The algorithm starts from the default parameters. Each parameter set then goes through three steps: a global change (every parameter is changed randomly), a large change (two of the parameters are changed randomly), and a small change (a small fraction is added to or subtracted from one of the parameters). After each step, the best parameter set is selected for reproduction. Usually, ANNOgesic can reach the optimized parameters within 4000 runs.
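The three mutation steps and the keep-the-best selection can be illustrated with a minimal Python sketch. This is not ANNOgesic's implementation: the parameter names, their ranges, the 5% step size, the one-candidate-per-run scheme and the toy scoring function are all assumptions made for illustration. In a real optimization, the score would have to come from evaluating the TSS predictions produced with each parameter set.

```python
import random

# Hypothetical parameters and ranges; placeholders, not TSSpredator's real options.
BOUNDS = {"height": (0.0, 5.0), "factor": (0.0, 10.0), "enrichment": (0.0, 10.0)}
START = {"height": 2.5, "factor": 5.0, "enrichment": 5.0}  # stands in for the defaults


def global_change(params, rng):
    # Global change: redraw every parameter at random within its range.
    return {name: rng.uniform(*BOUNDS[name]) for name in params}


def large_change(params, rng):
    # Large change: redraw two randomly chosen parameters.
    child = dict(params)
    for name in rng.sample(sorted(child), 2):
        child[name] = rng.uniform(*BOUNDS[name])
    return child


def small_change(params, rng, fraction=0.05):
    # Small change: add or subtract a small fraction of the range to one parameter.
    child = dict(params)
    name = rng.choice(sorted(child))
    low, high = BOUNDS[name]
    delta = fraction * (high - low) * rng.choice((-1, 1))
    child[name] = max(low, min(high, child[name] + delta))
    return child


def optimize(score, start, runs=4000, seed=0):
    # Cycle through the three steps; after each one, keep whichever set scores best.
    rng = random.Random(seed)
    best, best_score = dict(start), score(start)
    steps = (global_change, large_change, small_change)
    for run in range(runs):
        candidate = steps[run % 3](best, rng)
        candidate_score = score(candidate)
        if candidate_score > best_score:
            best, best_score = candidate, candidate_score
    return best, best_score


def toy_score(params):
    # Toy stand-in for prediction quality: closeness to an arbitrary target set.
    target = {"height": 0.4, "factor": 2.0, "enrichment": 3.0}
    return -sum((params[name] - target[name]) ** 2 for name in target)


best, best_score = optimize(toy_score, START)
print(best, best_score)
```

Because a candidate replaces the current best only when it scores higher, the result is never worse than the starting parameters. The global and large changes help the search escape poor regions, while the small changes refine a set that is already good.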
Availability – The software is available under an open source license (ISCL) at: https://pythonhosted.org/ANNOgesic/