With the cost of DNA sequencing decreasing, increasing amounts of RNA-Seq data are being generated giving novel insight into gene expression and regulation. Prior to analysis of gene expression, the RNA-Seq data has to be processed through a number of steps resulting in a quantification of expression of each gene / transcript in each of the analyzed samples. A number of workflows are available to help researchers perform these steps on their own data, or on public data to take advantage of novel software or reference data in data re-analysis. However, many of the existing workflows are limited to specific types of studies. Researchers at the University of Bergen aimed to develop a maximally general workflow, applicable to a wide range of data and analysis approaches and at the same time support research on both model and non-model organisms. Furthermore, the researchers aimed to make the workflow usable also for users with limited programming skills.
Utilizing the workflow management system Snakemake and the package management system Conda, they have developed a modular, flexible and user-friendly RNA-Seq analysis pipeline: RNA-Seq Analysis Snakemake Workflow (RASflow). Utilizing Snakemake and Conda alleviates challenges with library dependencies and version conflicts and also supports reproducibility. To be applicable for a wide variety of applications, RASflow supports mapping of reads to both genomic and transcriptomic assemblies. RASflow has a broad range of potential users: it can be applied by researchers interested in any organism and since it requires no programming skills, it can be used by researchers with different backgrounds.
Overview of the steps performed by RNA-Seq Analysis Snakemake Workflow(RASflow)
Availability – RASflow is an open source tool and source code as well as documentation, tutorials and example data sets can be found on GitHub: https://github.com/zhxiaokang/RASflow