A workflow for managing large-scale experimental data

The development of high-throughput experimental technologies, such as next-generation sequencing, has led to new challenges for handling, analyzing and integrating the resulting large and diverse datasets. Bioinformatical analysis of these data commonly requires a number of mutually dependent steps applied to numerous samples for multiple conditions and replicates. To support these analyses, a number of workflow management systems (WMSs) have been developed to allow automated execution of the corresponding analysis workflows. Major advantages of WMSs are the easy reproducibility of results and the reusability of workflows or their components.

Researchers from Ludwig-Maximilians-Universität München present Watchdog, a WMS for the automated analysis of large-scale experimental data. Main features include straightforward processing of replicate data, support for distributed computer systems, customizable error detection and manual intervention in workflow execution. Watchdog is implemented in Java, which makes it platform-independent, and it allows easy sharing of workflows and the corresponding program modules. It provides a graphical user interface (GUI) for workflow construction using pre-defined modules as well as a helper script for creating new module definitions. Workflows can be executed using either the GUI or a command-line interface, and a web interface is provided for monitoring the execution status and intervening in case of errors. To illustrate its potential on a real-life example, a comprehensive workflow and modules for the analysis of RNA-seq experiments were implemented and are provided with the software in addition to simple test examples.

Overview of Watchdog

(a) Modules are defined in an XSD format that describes the command to be executed and valid parameters. All modules together represent the software library that can be used in workflows and can be extended by defining new modules. (b) A workflow is defined in an XML format and consists of tasks that depend on each other. Among others, the XML format allows setting environment variables, defining different executors in the settings part of the workflow and processing replicate data in a straightforward way. (c) Watchdog parses the workflow, creates the corresponding tasks, executes them and verifies whether execution of each task terminated successfully or not. (d) Email notification (optional) and log files combined with either the GUI or a simple web-interface allow monitoring the execution of the workflow and intervening if necessary, e.g. by restarting tasks with modified parameters.
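
The general idea of executing mutually dependent tasks and checking whether each one terminated successfully can be illustrated with a short Python sketch. This is not Watchdog's implementation or its XML format: the function `run_workflow`, the task names (`align`, `count`, `report`) and the three status strings are hypothetical, and the check used here (process exit code) is only one simple form of error detection.

```python
import subprocess
import sys
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def run_workflow(tasks, dependencies):
    """Run tasks in dependency order and record the outcome of each one.

    tasks:        dict mapping a task name to its command (list of strings)
    dependencies: dict mapping a task name to the set of task names that
                  must succeed before it may run
    Returns a dict mapping each task name to "ok", "failed" or "skipped".
    (Hypothetical helper for illustration; not part of Watchdog.)
    """
    graph = {name: set(dependencies.get(name, ())) for name in tasks}
    unknown = {dep for deps in graph.values() for dep in deps} - set(tasks)
    if unknown:
        raise ValueError(f"dependencies on undefined tasks: {sorted(unknown)}")

    status = {}
    # static_order() yields every task after all of its prerequisites.
    for name in TopologicalSorter(graph).static_order():
        if any(status[dep] != "ok" for dep in graph[name]):
            # A prerequisite failed or was skipped, so do not run this task.
            status[name] = "skipped"
            continue
        result = subprocess.run(tasks[name], capture_output=True, text=True)
        # Simplest possible error detection: a non-zero exit code is a failure.
        status[name] = "ok" if result.returncode == 0 else "failed"
    return status


# Toy workflow: "count" fails on purpose, so "report" is never started.
tasks = {
    "align": [sys.executable, "-c", "print('aligning reads')"],
    "count": [sys.executable, "-c", "import sys; sys.exit(1)"],
    "report": [sys.executable, "-c", "print('writing report')"],
}
dependencies = {"count": {"align"}, "report": {"count"}}

status = run_workflow(tasks, dependencies)
print(status)
```

In this toy run, the failure of `count` prevents `report` from starting, which is the point at which a user of a real WMS would typically intervene, for example by restarting the failed task with modified parameters.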

Watchdog is a powerful and flexible WMS for the analysis of large-scale high-throughput experiments. The developers believe it will greatly benefit both users with and without programming skills who want to develop and apply bioinformatical workflows with reasonable overhead.

Availability – The software, example workflows and comprehensive documentation are freely available at www.bio.ifi.lmu.de/watchdog.

Kluge M, Friedel CC. (2018) Watchdog – a workflow management system for the distributed analysis of large-scale experimental data. BMC Bioinformatics 19(1):97. [article]
