Reproducible Bioconductor workflows using browser-based interactive notebooks and containers

Bioinformatics publications typically include complex software workflows that are difficult to describe in a manuscript. University of Washington, Tacoma researchers describe and demonstrate the use of interactive software notebooks to document and distribute bioinformatics research. They provide a user-friendly tool, BiocImageBuilder, that allows users to easily distribute their bioinformatics protocols through interactive notebooks uploaded to either a GitHub repository or a private server.

Overview of the approach

The author of the Bioconductor workflow uses BiocImageBuilder to generate a Dockerfile that describes the Bioconductor and CRAN packages to be installed. The Dockerfile and the notebook files are uploaded to a server or GitHub repository. A custom container is then built from the default Linux base image for Bioconductor, the dependencies for Jupyter, JupyterHub, and/or Binder, and the Bioconductor packages. For GitHub installations, the Binder server builds the container and provides a link to run the container on its public cluster. JupyterHub provides the same functionality locally or on a private server. Using the container, the end user is able to view the notebook and execute, modify, and save the code on his or her local machine regardless of whether it uses Linux, macOS, or Windows. When the container is run remotely, the end user does not need to install any additional software.
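To make the Dockerfile step concrete, here is a minimal sketch of how a list of packages could be turned into Dockerfile text. It is not BiocImageBuilder's code or its actual output. The function name render_dockerfile and the default base image tag are assumptions chosen for illustration, and the sketch omits the Jupyter, JupyterHub, and Binder dependencies mentioned above.

```python
from typing import Sequence


def render_dockerfile(
    bioc_packages: Sequence[str],
    cran_packages: Sequence[str],
    # Assumed base image tag; the real tool may use a different one.
    base_image: str = "bioconductor/bioconductor_docker:latest",
) -> str:
    """Return Dockerfile text that installs the given R packages.

    Illustrative only: a hypothetical helper, not BiocImageBuilder itself.
    """

    def r_vector(names: Sequence[str]) -> str:
        # Build an R character vector literal such as c('limma', 'DESeq2').
        return "c(" + ", ".join("'{}'".format(n) for n in names) + ")"

    lines = ["FROM {}".format(base_image)]

    if cran_packages:
        # CRAN packages are installed with base R's install.packages().
        cran_call = "install.packages({}, repos='https://cloud.r-project.org')".format(
            r_vector(cran_packages)
        )
        lines.append('RUN Rscript -e "{}"'.format(cran_call))

    if bioc_packages:
        # Bioconductor packages are installed with BiocManager::install().
        bioc_call = "BiocManager::install({}, ask=FALSE, update=FALSE)".format(
            r_vector(bioc_packages)
        )
        lines.append('RUN Rscript -e "{}"'.format(bioc_call))

    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_dockerfile(["DESeq2", "limma"], ["ggplot2"]))
```

Keeping the package list in one generated file means that the same container can be rebuilt later from the repository alone, which is the property the workflow relies on for reproducibility.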

The researchers present four interactive Jupyter notebooks that use R and Bioconductor workflows to infer differential gene expression, analyze cross-platform datasets, and process RNA-seq data and KinomeScan data. These interactive notebooks are available on GitHub. The analytical results can be viewed in a browser. Most importantly, the software contents can be executed and modified. This is accomplished using Binder, which runs the notebook inside software containers, so that end users do not need to install any software and the original environment is reproduced. All the notebooks were produced using custom files generated by BiocImageBuilder.
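As an illustration of how a notebook repository on GitHub maps to a Binder launch link, the sketch below builds a URL in the form used by the public mybinder.org service. The helper name binder_launch_url and the example owner and repository names are hypothetical, and the published notebooks may use different links.

```python
from urllib.parse import quote


def binder_launch_url(
    owner: str,
    repo: str,
    ref: str = "HEAD",
    host: str = "https://mybinder.org",
) -> str:
    """Build a Binder launch URL for a GitHub repository.

    Hypothetical helper: follows the /v2/gh/<owner>/<repo>/<ref> pattern
    used by the public Binder service.
    """
    # Percent-encode each path segment so unusual characters stay valid.
    parts = [quote(part, safe="") for part in (owner, repo, ref)]
    return "{}/v2/gh/{}".format(host.rstrip("/"), "/".join(parts))


if __name__ == "__main__":
    # Placeholder names, not a real repository from the paper.
    print(binder_launch_url("example-user", "example-notebooks"))
```

Opening such a link asks Binder to build the container described in the repository and then serve the notebook in the browser.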

BiocImageBuilder facilitates the publication of workflows with a point-and-click user interface. The researchers demonstrate that interactive notebooks can be used to disseminate a wide range of bioinformatics analyses. The use of software containers to mirror the original software environment ensures reproducibility of results. Parameters and code can be dynamically modified, allowing for robust verification of published results and encouraging rapid adoption of new methods.

Given the increasing complexity of bioinformatics workflows, the developers anticipate that these interactive software notebooks will become as necessary for documenting software methods as traditional laboratory notebooks have been for documenting bench protocols, and as ubiquitous.

Availability – Note that BiocImageBuilder is designed for those who wish to author an interactive Bioconductor notebook – it is not required for end users who wish to interact with a published notebook. The source code of BiocImageBuilder is publicly available at https://github.com/Bioconductor-notebooks/BiocImageBuilder and its Docker image is publicly available at https://hub.docker.com/r/biodepot/bioc-builder/.

Almugbel R, Hung LH, Hu J, Almutairy A, Ortogero N, Tamta Y, Yeung KY. (2017) Reproducible Bioconductor workflows using browser-based interactive notebooks and containers. J Am Med Inform Assoc [Epub ahead of print].
