Genome-wide protein-DNA binding is commonly assessed by antibody-specific pulldown in Chromatin Immunoprecipitation Sequencing (ChIP-Seq) or Cleavage Under Targets and Release Using Nuclease (CUT&RUN) sequencing experiments. These technologies generate high-throughput sequencing data whose analysis requires multiple sophisticated, computationally intensive genomic tools, but these tools often have a high barrier to use because of computational resource constraints.
Researchers at St. Jude Children’s Research Hospital have developed a comprehensive, infrastructure-independent computational pipeline called SEAseq, which leverages field-standard, open-source tools for processing and analyzing ChIP-Seq/CUT&RUN data. Starting from the raw output of the experiment, SEAseq performs extensive analyses, including alignment, peak calling, motif analysis, promoter and metagene coverage profiling, peak annotation distribution, identification of clustered/stitched peaks (e.g., super-enhancers), and multiple relevant quality assessment metrics, and it interfaces automatically with data in GEO/SRA. SEAseq provides a rapid and cost-effective resource for analyzing both new and publicly available datasets, as demonstrated in our comparative case studies.
High-level SEAseq schematic
The flowchart shows the input files (left), a top-down overview of the analysis steps executed by SEAseq (center), and the outputs (right). The input consists of the files needed to run SEAseq: FASTQ files (in compressed gzip [.gz] format) and/or SRA identifiers (SRR), Genome FASTA [.fa], Gene Annotation [.gtf], and optionally UHS/DER/DAC blacklist regions [.bed] and one or more of the MEME Suite position weight matrix databases. The output consists of the analysis result files generated by SEAseq: mapping files [.bam], peaks [.bed] and peak coverage files [.wig; .tdf; .bw], per-promoter and metagene average coverage and heatmap plots, peak annotation distribution tables, motif discovery and enrichment results, and quality metrics results [.html].
The easy-to-use and versatile design of SEAseq makes it a reliable and efficient resource for ensuring high-quality analysis. Its cloud implementation enables a broad suite of analyses in environments with constrained computational resources. SEAseq is platform-independent and is designed to be usable by anyone, with or without programming skills.
Availability – SEAseq is available on the cloud at https://platform.stjude.cloud/workflows/seaseq and can be locally installed from the repository at https://github.com/stjude/seaseq.