The Eukaryotic Pathogen, Vector and Host Informatics Resource (VEuPathDB) represents the 2019 merger of VectorBase with the EuPathDB projects. As a Bioinformatics Resource Center funded by the National Institutes of Health, with additional support from the Wellcome Trust, VEuPathDB supports >500 organisms comprising invertebrate vectors, eukaryotic pathogens (protists and fungi) and relevant free-living or non-pathogenic species or hosts. Designed to empower researchers with access to Omics data and bioinformatic analyses, VEuPathDB projects integrate >1700 pre-analysed datasets (and associated metadata) with advanced search capabilities, visualizations and analysis tools in a graphical interface. Diverse data types are analysed with standardized workflows, including an in-house OrthoMCL algorithm for predicting orthology. Comparisons are easily made across datasets, data types and organisms in this unique data mining platform. A new site-wide search facilitates access for both experienced and novice users. Upgraded infrastructure and workflows support numerous updates to the web interface, tools, searches and strategies, and to the Galaxy workspace, where users can privately analyse their own data. Forthcoming upgrades include cloud-ready application architecture, expanded support for the Galaxy workspace, tools for interrogating host-pathogen interactions, improved interactions with affiliated databases (ClinEpiDB, MicrobiomeDB) and other scientific resources, and increased interoperability with the Bacterial & Viral BRC.
VEuPathDB data production workflow and architecture. The complete pathway from data acquisition to web presentation and use by researchers is detailed. Production activities and systems are represented in the bottom purple box, and the services and presentation layers are represented in the pink and grey boxes. Data enter the system in the data staging box, where they are identified and prioritized by Outreach and entered into the Redmine issue tracking system. Once data are cleaned and structured, datasets become available to the processing and integration workflows. Genome sequence, annotation, RNA-Seq and DNA sequencing reads are processed at the EBI (bottom box) and passed back to Penn (top box) for data integration and subsequent processing, including integration of functional data and ortholog assignment. These workflows prepare the data for presentation in the form of relational databases and indexed flat files. Web clients give users access through a set of services that communicate with the back-end data stores. The system also includes a user data analysis system (right side) that enables users to analyse their own data and, for some data types, import their results into VEuPathDB for analysis and integration with publicly available data.
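As a reading aid, the stages named in this caption can be sketched as a small state model. This is an illustrative sketch only and not VEuPathDB code: the names `Stage`, `Dataset`, `advance` and `processing_site` are hypothetical, and the strictly linear ordering of stages is an assumption made to keep the example short.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Stage(Enum):
    """Hypothetical stage labels, paraphrased from the figure caption."""
    STAGED = 1      # identified, prioritized and logged in the issue tracker
    CLEANED = 2     # cleaned and structured
    PROCESSED = 3   # primary processing (at the EBI for the types listed below)
    INTEGRATED = 4  # integration, functional data, ortholog assignment (Penn)
    PUBLISHED = 5   # served from relational databases and indexed flat files


# Data types that the caption states are processed at the EBI.
EBI_PROCESSED_TYPES = {
    "genome sequence",
    "annotation",
    "RNA-Seq",
    "DNA sequencing reads",
}


@dataclass
class Dataset:
    name: str
    data_type: str
    stage: Stage = Stage.STAGED
    history: List[Stage] = field(default_factory=list)


def advance(dataset: Dataset) -> Dataset:
    """Move a dataset to the next stage, recording the stage it leaves."""
    order = list(Stage)  # Enum members iterate in definition order
    position = order.index(dataset.stage)
    if position == len(order) - 1:
        raise ValueError(f"{dataset.name} is already published")
    dataset.history.append(dataset.stage)
    dataset.stage = order[position + 1]
    return dataset


def processing_site(dataset: Dataset) -> str:
    """Return the processing site only where the caption states one."""
    if dataset.data_type in EBI_PROCESSED_TYPES:
        return "EBI"
    return "not stated in the caption"
```

Calling `advance` four times on a new `Dataset` takes it from `STAGED` to `PUBLISHED`; a fifth call raises `ValueError`. The real workflows are more involved than a linear sequence, so the model is useful only for keeping the caption's stages straight.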
Availability – VEuPathDB (https://veupathdb.org). VEuPathDB also makes most source code publicly available in a GitHub repository (https://github.com/VEuPathDB).