Large-scale sequencing experiments are complex and require a wide spectrum of computational tools to extract and interpret relevant biological information. This is especially true in projects where individual processing and integrated analysis of both small RNA and complementary RNA data are needed. Such studies would benefit from a computational workflow that is easy to implement and that standardizes the processing and analysis of both sequenced data types.
Researchers from the University of Helsinki have developed SePIA (Sequence Processing, Integration, and Analysis), a comprehensive small RNA and RNA workflow. It offers ready execution of over 20 commonly known RNA-seq tools on top of an established workflow engine, together with a dynamic pipeline architecture to manage, individually analyze, and integrate both small RNA and RNA data. Implementation with Docker makes SePIA portable and easy to run. The researchers demonstrate the workflow's utility with two case studies involving three breast cancer datasets. SePIA is straightforward to configure and organizes results into a perusable HTML report. Furthermore, the underlying pipeline engine supports computational resource management for optimal performance.
SePIA workflow summarized in five generalized modules
Each module contains a brief description of the major steps performed in each pipeline. For example, the 'double-pass' alignment means reads are mapped first to the whole genome and then to a reference transcriptome. Colors used represent common processes (black), processes specific to small RNA (purple) and RNA (green) data, and the main outputs of the modules (grey). Incorporation of a miRNA-target mRNA database into the workflow is represented in blue. Interesting molecules of the analysis module are defined as differentially expressed, predicted, or mutated.
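To make the integration idea concrete, here is a minimal Python sketch of how a miRNA-target mRNA database could be combined with lists of interesting molecules. It is not SePIA's implementation: the function name `interesting_pairs`, the data layout, and the placeholder identifiers (`miR-A`, `GENE1`, and so on) are all assumptions made for illustration.

```python
# Illustrative sketch only; not taken from SePIA.
# Assumption: the target database is a list of (miRNA, target mRNA) pairs,
# and "interesting" molecules are given as plain collections of identifiers.

def interesting_pairs(target_db, interesting_mirnas, interesting_mrnas):
    """Keep database pairs whose miRNA and target mRNA are both interesting."""
    mirnas = set(interesting_mirnas)
    mrnas = set(interesting_mrnas)
    return sorted(
        (mirna, mrna)
        for mirna, mrna in target_db
        if mirna in mirnas and mrna in mrnas
    )


# Hypothetical placeholder identifiers, not real annotations.
target_db = [("miR-A", "GENE1"), ("miR-A", "GENE2"), ("miR-B", "GENE3")]
pairs = interesting_pairs(target_db, {"miR-A"}, {"GENE2", "GENE3"})
print(pairs)  # [('miR-A', 'GENE2')]
```

A real analysis would likely add further criteria, such as the direction of expression change, which this sketch deliberately leaves out.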
Availability – SePIA is an open-source workflow available at http://anduril.org/sepia.