RNA-sequencing of plant material allows for hypothesis-free detection of multiple viruses simultaneously. This methodology relies on bioinformatics workflows for virus identification. Most workflows are designed for human clinical data, and few go beyond sequence mapping for virus identification. A team led by researchers at the University of St Andrews has developed a new workflow (Kodoja) for the detection of plant virus sequences in RNA-seq data. Kodoja uses k-mer profiling at the nucleotide level and sequence mapping at the protein level by integrating two existing tools, Kraken and Kaiju. Kodoja was tested on three existing RNA-seq datasets from grapevine and two new RNA-seq datasets from raspberry. For grapevine, Kodoja was shown to be more sensitive than a method based on contig building and BLAST alignments (27 viruses detected compared to 19). The application of Kodoja to raspberry showed that field-grown raspberries were infected by multiple viruses, and that RNA-seq can identify lower amounts of virus material than reverse transcriptase PCR. This work enabled the design of new PCR primers for detection of Raspberry yellow net virus and Beet ringspot virus. Kodoja is a sensitive method for plant virus discovery in field samples and enables the design of more accurate primers for detection. Kodoja is available to install through Bioconda and as a tool within Galaxy.
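To give a feel for what k-mer profiling at the nucleotide level means, here is a minimal Python sketch. It is a toy illustration only, not Kodoja's or Kraken's actual algorithm (Kraken maps k-mers onto a taxonomy and resolves them with lowest common ancestors, which this sketch does not attempt). The function names, the reference sequences, the reads, and the choice of k = 8 are all made up for the example.

```python
from collections import Counter


def kmers(seq, k):
    """Yield every overlapping substring of length k in seq."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_index(references, k):
    """Map each k-mer to the set of reference labels that contain it."""
    index = {}
    for label, seq in references.items():
        for kmer in kmers(seq, k):
            index.setdefault(kmer, set()).add(label)
    return index


def classify(read, index, k):
    """Return the label with the most reference-specific k-mer hits, or None."""
    votes = Counter()
    for kmer in kmers(read, k):
        labels = index.get(kmer, ())
        if len(labels) == 1:  # count only k-mers found in a single reference
            votes[next(iter(labels))] += 1
    if not votes:
        return None  # no informative k-mer matched any reference
    ranked = votes.most_common(2)
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return None  # tie between the two best references: leave unclassified
    return ranked[0][0]


# Hypothetical toy "genomes"; real viral references are thousands of bases long.
references = {
    "virus_A": "ATGGCGTACGTTAGCCTAGGATCCGATTACA",
    "virus_B": "ATGCCCTTTGGGAAACCCTTTGGGAAATTTC",
}
index = build_index(references, k=8)

read_from_a = "GTACGTTAGCCTAGG"   # a fragment copied from virus_A
read_unknown = "TTTTTTTTTTTTTTT"  # shares no 8-mer with either reference

result_a = classify(read_from_a, index, k=8)
result_unknown = classify(read_unknown, index, k=8)
```

In this sketch a read is assigned to whichever reference shares the most k-mers that are unique to it, and reads with no informative k-mers stay unclassified. A real workflow would additionally compare such nucleotide-level calls with protein-level matches, as Kodoja does by combining Kraken with Kaiju.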
Flow diagram summarizing the three modules of the Kodoja workflow: kodoja_build, kodoja_search and kodoja_retrieve
Availability – Kodoja is available for direct installation and use at the command line in Linux through Bioconda (https://anaconda.org/bioconda/kodoja). Alternatively, the code can be downloaded from GitHub (https://github.com/abaizan/kodoja). Kodoja is also provided as a package in Galaxy, an open-source, web-based environment for data analysis. The Galaxy package is available on GitHub (https://github.com/abaizan/kodoja_galaxy) and the Galaxy Tool Shed (https://toolshed.g2.bx.psu.edu/view/abaizan/kodoja).