Reproducible research is a key component of the scientific method: it is the ability to repeat an experiment in any place and by any person.
A study is truly reproducible only when it satisfies at least the following three criteria:
– All methods are fully reported.
– All data and files used for the analysis are (publicly) available.
– The process of analyzing raw data is well reported and preserved.
(Extracted from https://www.r-bloggers.com/what-is-reproducible-research/)
The above points should also apply to Bioinformatics. Today, however, having access to the raw data and to the process used for data analysis does not guarantee that a bioinformatics analysis can be reproduced. Results may fail to reproduce because the analytical process is explained unclearly, or because of differences in the system libraries, which can cause subtle reproducibility issues.
To address these points we have set up the Reproducible Bioinformatics Project, a non-profit, open-source project that aims to provide reproducible results in Bioinformatics using Docker images.
Specifically, the project is based on the creation of easy-to-use Bioinformatics workflows that fulfill the following rules (Sandve et al. PLoS Comp Biol. 2013):
- For Every Result, Keep Track of How It Was Produced
- Avoid Manual Data Manipulation Steps
- Archive the Exact Versions of All External Programs Used
- Version Control All Custom Scripts
- Record All Intermediate Results, When Possible in Standardized Formats
- For Analyses That Include Randomness, Note Underlying Random Seeds
- Always Store Raw Data behind Plots
- Generate Hierarchical Analysis Output, Allowing Layers of Increasing Detail to Be Inspected
- Connect Textual Statements to Underlying Results
- Provide Public Access to Scripts, Runs, and Results
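Several of the points listed above (tracking how a result was produced, archiving software versions, noting random seeds) can be illustrated with a small provenance record. The following Python sketch is only an illustration and is not part of the project's workflows: the function names, the record fields and the toy analysis are all hypothetical.

```python
import hashlib
import json
import platform
import random


def sha256_of_bytes(data: bytes) -> str:
    """Return the SHA-256 hex digest of the raw input, so the exact data can be identified later."""
    return hashlib.sha256(data).hexdigest()


def run_analysis(raw_values, seed):
    """Toy analysis (hypothetical): draw a random subsample and report its mean.

    A dedicated random.Random instance is seeded explicitly, so the same
    seed always yields the same subsample.
    """
    rng = random.Random(seed)
    sample = rng.sample(raw_values, k=3)
    return {"sample": sample, "mean": sum(sample) / len(sample)}


def build_record(raw_values, seed):
    """Bundle the result with the information needed to reproduce it."""
    raw_bytes = json.dumps(raw_values).encode("utf-8")
    return {
        "seed": seed,                                   # random seed used
        "python_version": platform.python_version(),    # interpreter version
        "raw_data_sha256": sha256_of_bytes(raw_bytes),  # fingerprint of the input
        "result": run_analysis(raw_values, seed),       # the result itself
    }


if __name__ == "__main__":
    record = build_record([4, 8, 15, 16, 23, 42], seed=2013)
    # Writing the record as sorted JSON gives a standardized, diffable format.
    print(json.dumps(record, indent=2, sort_keys=True))
```

Running the script twice with the same seed and input produces the same record, which is the property the rules aim for. A real workflow would also record the versions of every external program, which a Docker image pins by construction.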
Today three workflows are available:
- RNAseq workflow
- miRNAseq workflow
- ChIPseq workflow
Three more are under development:
- PDX workflow: variant calling in patient-derived xenografts (PDX) from RNAseq and EXOMEseq data
- Single cell analysis workflow
- Metagenomics workflow
We are looking for Bioinformaticians interested in joining the Reproducible Bioinformatics Community. Those who wish to embed specific applications in the available workflows, or to develop a new workflow, are asked to embed the application(s) in a Docker image, save the image in a public repository, and configure one or more R functions that can be used to interact with it.
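The project asks for R functions as the interface to each image. The Python sketch below is not that interface; it only illustrates, under stated assumptions, what such a wrapper typically does: assemble a `docker run` command that mounts a host data folder into the container. The image name, the tool name and its arguments are hypothetical placeholders. Only the standard `docker run` options `--rm` and `-v` are used.

```python
import shlex
from pathlib import PurePosixPath


def build_docker_command(image, host_data_dir, tool_args, container_dir="/data"):
    """Build (but do not execute) a `docker run` command line as a list of arguments.

    image         -- image reference with an explicit tag, e.g. "user/tool:1.0.0"
    host_data_dir -- absolute path on the host, mounted into the container
    tool_args     -- command and arguments to run inside the container
    """
    # Require an explicit tag so that the exact image version is recorded.
    if ":" not in image.rsplit("/", 1)[-1]:
        raise ValueError("image must carry an explicit tag, e.g. name:1.0.0")
    host = PurePosixPath(host_data_dir)
    if not host.is_absolute():
        raise ValueError("host_data_dir must be an absolute path")
    return ["docker", "run", "--rm", "-v", f"{host}:{container_dir}", image, *tool_args]


if __name__ == "__main__":
    # All names below are hypothetical placeholders.
    cmd = build_docker_command(
        image="example/hypothetical-tool:1.0.0",
        host_data_dir="/home/user/project/fastq",
        tool_args=["hypothetical-tool", "--input", "/data/sample.fastq"],
    )
    # Print the command so it can be logged alongside the results.
    print(shlex.join(cmd))
```

Rejecting untagged images is a design choice made for this sketch: it keeps the exact image version in every logged command, in line with archiving the versions of all external programs.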