SeqPlots – Interactive software for exploratory data analyses, pattern discovery and visualization in genomics

Experiments involving high-throughput sequencing are widely used for analyses of chromatin function and gene expression. Common examples are the use of chromatin immunoprecipitation for the analysis of chromatin modifications or factor binding, enzymatic digestions for chromatin structure assays, and RNA sequencing to assess gene expression changes after biological perturbations. To investigate the pattern and abundance of coverage signals across regions of interest, data are often visualized as profile plots of average signal or as heatmaps made of stacked rows of signal. University of Cambridge researchers found that available plotting software was either slow and laborious or difficult for investigators with little computational training to use, which inhibited wide data exploration. To address this need, they developed SeqPlots, a user-friendly exploratory data analysis (EDA) and visualization software for genomics. After choosing groups of signal and feature files and defining plotting parameters, users can generate profile plots of average signal, or heatmaps clustered using different algorithms, in a matter of seconds through the graphical user interface (GUI) controls. SeqPlots accepts all major genomic file formats as input and can also generate and plot user-defined motif densities. Profile plots and heatmaps are highly configurable, and batch operations can be used to generate a large number of plots at once. The analysis features and ease of use of SeqPlots encourage wide data exploration, which should aid the discovery of novel genomic associations.
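
To make the idea of a "profile plot of average signal" concrete, here is a minimal Python sketch. It is not SeqPlots code: the function names, the per-base list used as the coverage track, and the toy numbers are all assumptions chosen for illustration. It extracts a fixed window around each anchor position (flipping minus-strand features so that upstream is always on the left) and averages the windows column by column.

```python
def extract_window(coverage, anchor, upstream, downstream, strand="+"):
    """Return the signal from `upstream` bases before the anchor to
    `downstream` bases from it (anchor included), oriented 5' to 3'.
    Returns None when the window runs off the track.
    Coordinates are 0-based (an assumption of this sketch)."""
    if strand == "+":
        start, end = anchor - upstream, anchor + downstream
    else:
        # On the minus strand, upstream lies at higher coordinates.
        start, end = anchor - downstream + 1, anchor + upstream + 1
    if start < 0 or end > len(coverage):
        return None
    window = coverage[start:end]
    return window if strand == "+" else window[::-1]


def average_profile(coverage, features, upstream, downstream):
    """Stack one window per feature and average each column.
    Returns (rows, profile): rows feed a heatmap, profile a line plot."""
    rows = []
    for anchor, strand in features:
        window = extract_window(coverage, anchor, upstream, downstream, strand)
        if window is not None:
            rows.append(window)
    if not rows:
        raise ValueError("no feature window fits inside the coverage track")
    profile = [sum(column) / len(column) for column in zip(*rows)]
    return rows, profile


# Toy per-base coverage track and two hypothetical anchors (position, strand).
coverage = [0, 1, 4, 2, 0, 0, 3, 5, 1, 0]
features = [(2, "+"), (7, "-")]

rows, profile = average_profile(coverage, features, upstream=1, downstream=2)
print(rows)
print(profile)
```

The stacked `rows` are what a heatmap displays, and `profile` is the curve of a profile plot. Real tracks would normally be read from genomic file formats and summarized in bins rather than single bases.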

An example SeqPlots workflow analyzing H2A.Z, H3K36me3, H3K4me3 and CpG density across C. elegans protein-coding TSSs separated by expression quintiles

(a,b) Top, GUI showing the clickable grid of signal/feature combinations. Bottom, plots resulting from the clicked selections. (c) Plots of individual signals across genes in the top expression quintile anchored at TSSs, plotting 1 kb upstream and 1.5 kb downstream of TSSs. (d) Heatmaps generated using k-means clustering (3 clusters) of TSSs in the top expression quintile, using H3K36me3 signal for clustering. (e) Average signal profiles and (f) heatmaps generated from cluster 2 (C2) in (d), made by downloading the full cluster data and uploading a file with the cluster 2 regions. Heatmaps were clustered using H3K4me3, H2A.Z and CpG signals.
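
The clustering step in panel (d) can be illustrated with a small, self-contained Python sketch. It is not the SeqPlots implementation: the function names and toy signal rows are hypothetical, and the deterministic farthest-point initialization is a simplification assumed here so that the example is reproducible. Rows of one signal matrix are grouped by k-means, and the resulting row order is then reused for a second track, mirroring how one signal drives the clustering of the displayed heatmaps.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_point_init(rows, k):
    """Deterministic start: take the first row, then repeatedly add the row
    farthest from all centroids chosen so far (an assumption of this sketch)."""
    centroids = [list(rows[0])]
    while len(centroids) < k:
        farthest = max(
            rows, key=lambda r: min(squared_distance(r, c) for c in centroids)
        )
        centroids.append(list(farthest))
    return centroids


def nearest(row, centroids):
    """Index of the centroid closest to `row`."""
    return min(range(len(centroids)),
               key=lambda c: squared_distance(row, centroids[c]))


def kmeans_rows(rows, k, max_iter=100):
    """Lloyd's algorithm on the rows of a signal matrix; returns one label per row."""
    centroids = farthest_point_init(rows, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [nearest(row, centroids) for row in rows]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for c in range(k):
            members = [row for row, lab in zip(rows, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Toy rows: one per TSS, one column per position bin (hypothetical values).
signal_rows = [
    [0, 0, 1, 0], [5, 5, 5, 5], [0, 9, 9, 0],
    [0, 1, 1, 0], [5, 6, 5, 5], [0, 9, 8, 0],
]
labels = kmeans_rows(signal_rows, k=3)

# Sort rows by cluster label; the same order can be applied to other tracks
# so that every heatmap shows the same TSS on the same row.
order = sorted(range(len(signal_rows)), key=lambda i: labels[i])
other_track = ["tss_a", "tss_b", "tss_c", "tss_d", "tss_e", "tss_f"]
reordered_other = [other_track[i] for i in order]
print(labels, order, reordered_other)
```

Selecting the rows that carry a single label corresponds to the sub-setting described for panels (e) and (f), where one cluster is extracted and plotted on its own.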

Availability – SeqPlots is available as a GUI application for Mac, Windows and Linux, or as an R/Bioconductor package. It can also be deployed on a server for remote and collaborative usage. http://przemol.github.io/seqplots/

Stempor P, Ahringer J. (2016) SeqPlots – Interactive software for exploratory data analyses, pattern discovery and visualization in genomics. Wellcome Open Res 1:14.
