Increased emphasis on the reproducibility of published research in the last few years has led to large-scale archiving of sequencing data. While this data can, in theory, be used to reproduce results in papers, it is typically not easy to use in practice. Researchers at UC Berkeley introduce a series of tools for processing and analyzing RNA-Seq data in the Short Read Archive that together have allowed them to build an easily extendable resource for analysis of the data underlying published papers. The system makes exploration of this data accessible to users without technical expertise.
Workflow of The Lair system for distributing analysis of Short Read Archive data
The inputs to the system are sets of two files per experiment: a config.json file that specifies the parameters to be used during processing, and a design matrix that specifies the experiment's structure. A master Snakemake workflow organizes a series of computations, starting with downloading data from the Short Read Archive and ending with deployment of sleuth analyses to a Shiny server. Finally, a website generated from information in the config.json files links to objects on the Shiny server, thus providing access to the processed experiments.
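To make the two-file input concrete, here is a minimal Python sketch of how a per-experiment config and design matrix might be read, checked against each other, and turned into an ordered list of steps. It is not The Lair's actual code: the config keys (`experiment_id`, `runs`), the design matrix columns (`run`, `condition`), the run identifiers, and the step names are all hypothetical, chosen only to mirror the download, sleuth, and Shiny stages described above.

```python
import csv
import io
import json

# Hypothetical config.json content; the keys are illustrative assumptions,
# not the schema used by The Lair.
CONFIG_TEXT = """
{
  "experiment_id": "example_experiment",
  "runs": ["RUN_A1", "RUN_A2", "RUN_B1", "RUN_B2"]
}
"""

# Hypothetical tab-separated design matrix: one row per sequencing run.
DESIGN_TEXT = "\n".join([
    "run\tcondition",
    "RUN_A1\tcontrol",
    "RUN_A2\tcontrol",
    "RUN_B1\ttreated",
    "RUN_B2\ttreated",
]) + "\n"


def load_experiment(config_text, design_text):
    """Parse both inputs and check that they describe the same runs."""
    config = json.loads(config_text)
    reader = csv.DictReader(io.StringIO(design_text), delimiter="\t")
    design = {row["run"]: row["condition"] for row in reader}
    if set(design) != set(config["runs"]):
        raise ValueError("config.json and design matrix list different runs")
    return config, design


def plan_steps(config):
    """List the stages in order: per-run downloads, then analysis, then deployment."""
    steps = [f"download:{run}" for run in config["runs"]]
    steps.append(f"sleuth:{config['experiment_id']}")
    steps.append(f"deploy_shiny:{config['experiment_id']}")
    return steps


if __name__ == "__main__":
    config, design = load_experiment(CONFIG_TEXT, DESIGN_TEXT)
    for step in plan_steps(config):
        print(step)
```

In the real system this ordering is handled by the master Snakemake workflow rather than a hand-written list; the sketch only illustrates why each experiment needs both a parameter file and a description of its structure.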
Availability – The database and associated tools can be accessed at The Lair: http://pachterlab.github.io/lair