Data exploration is critical to the comprehension of large biological datasets obtained by high-throughput assays such as sequencing. Interactive exploration fosters the generation of novel data-driven hypotheses prior to rigorous statistical analysis, enables diagnosis of potential problems during quality control and facilitates interpretation of the results in the context of a specific scientific question. To this end, visualization of the data in an intuitive and interactive interface is crucial. However, most existing tools for interactive visualization are limited to specific assays or analyses and lack support for reproducible analysis.
A team of researchers from the Universities of Oxford, Mainz, Zurich and Cambridge has built a general-purpose tool for interactive visualization of high-throughput biological data. As its name (interactive SummarizedExperiment Explorer) suggests, iSEE is designed for interactive visualization of any experimental data and/or associated metadata stored in a SummarizedExperiment container. It is implemented using the Shiny web application framework for R and provides:
- Broad applicability to any object based on the SummarizedExperiment class, which is central to the Bioconductor infrastructure. This means that iSEE is directly compatible with a wide variety of datasets and data types, including large-scale bulk RNA-seq, single-cell RNA-seq, and mass cytometry data.
- Customizability of the interface including the organization of plot and table panels, the type of data shown in each panel and the aesthetics of each plot.
- Ability for users to programmatically define additional custom panels based on any number of user-defined functions that perform on-the-fly calculations.
- Dynamic linking between plot and table panels, allowing users to transmit information across multiple panels via point selection.
- Bespoke interactive guided tours, which allow researchers to communicate their results through a step-by-step description of the salient features of the data.
- Automatic tracking, storage, and rendering of the exact R code required to generate all visible plots in a given application instance.
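How iSEE tracks code internally is not described here. As a rough illustration of the design idea behind the last feature, the Python sketch below stores each panel's settings as plain state and derives a reproducible R/ggplot2 snippet from that state alone. The class `PanelState`, the functions `render_r_code` and `export_session`, the R variable `plot.data` and the column names are all hypothetical placeholders, not iSEE's API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PanelState:
    """Hypothetical record of the settings of one visible panel."""
    name: str
    x: str
    y: str
    colour_by: Optional[str] = None


def render_r_code(panel: PanelState) -> str:
    """Return an R/ggplot2 snippet rebuilt purely from the panel's stored state."""
    mapping = [f"x = {panel.x}", f"y = {panel.y}"]
    if panel.colour_by is not None:
        mapping.append(f"colour = {panel.colour_by}")
    # `plot.data` is a placeholder name for the per-panel data frame
    return (
        f"## {panel.name}\n"
        f"ggplot(plot.data, aes({', '.join(mapping)})) + geom_point()"
    )


def export_session(panels: List[PanelState]) -> str:
    """Concatenate the code for every visible panel, in display order."""
    return "\n\n".join(render_r_code(p) for p in panels)


if __name__ == "__main__":
    panels = [
        PanelState("ReducedDimPlot1", "TSNE1", "TSNE2", colour_by="GeneX"),
        PanelState("ColDataPlot1", "batch", "total_counts"),
    ]
    print(export_session(panels))
```

Because the exported code is a pure function of the stored panel state, exporting it at any moment reflects exactly what is on screen; this is one plausible way to obtain the reproducibility the feature list describes.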
Example applications demonstrating the capabilities of iSEE are available online, showcasing the interactive exploration of:
- TCGA RNA sequencing data (https://marionilab.cruk.cam.ac.uk/iSEE_tcga)
- Single-cell RNA sequencing data (https://marionilab.cruk.cam.ac.uk/iSEE_allen, https://marionilab.cruk.cam.ac.uk/iSEE_pbmc4k)
- Mass cytometry data (https://marionilab.cruk.cam.ac.uk/iSEE_cytof)
iSEE uses a customisable multi-panel layout (A) that simultaneously displays one or more panels of various types, where each panel type visualises a different aspect of the data. New panels of any type can be added (i), and all panels can be removed, reordered or resized (ii). Panel types are available to visualise sample-based reduced dimensionality embeddings (iii), sample-level metadata (iv), and experimental observations across samples for each feature (v). Other panel types include row statistics tables (vi), to facilitate searching across features and their metadata; heatmaps (vii), to visualise experimental observations for multiple features; and feature-level metadata plots. Panels of each type are colour-coded for ease of interpretation. (B) Information can be transmitted between panels according to a user-specified scheme. Here, the selection of feature X in the row statistics table determines the y-axis of the feature assay plot, and colours the samples in the reduced dimension plot by the expression of X. Selection of points in the reduced dimension plot (dotted blue line) also determines the samples that are shown in the column data (i.e., sample metadata) plot; further selection of points in the column data plot determines the samples that are shown in the heatmap.
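The sample-selection chain in panel (B) can be pictured as a directed graph in which each link passes a point selection from one panel to the next. The Python sketch below illustrates that idea under simplifying assumptions: the class `LinkedPanels` and the panel names are hypothetical (not iSEE's API), only sample selections are modelled (not the feature-based transmission from the row statistics table), and each user action propagates a single hop.

```python
from typing import Dict, List, Set


class LinkedPanels:
    """Hypothetical sketch of selection transmission between panels."""

    def __init__(self) -> None:
        # receivers[src] lists panels that show only the samples selected in src
        self.receivers: Dict[str, List[str]] = {}
        # samples currently shown in each panel
        self.visible: Dict[str, Set[str]] = {}

    def add_panel(self, name: str, samples: Set[str]) -> None:
        self.receivers.setdefault(name, [])
        self.visible[name] = set(samples)

    def link(self, source: str, target: str) -> None:
        """Make `target` restrict its samples to the selection made in `source`."""
        self.receivers[source].append(target)

    def select(self, source: str, chosen: Set[str]) -> None:
        """Select samples in `source` and transmit them to its linked panels."""
        # Only samples actually shown in the source panel can be selected.
        selection = chosen & self.visible[source]
        for target in self.receivers[source]:
            self.visible[target] = set(selection)


if __name__ == "__main__":
    samples = {"s1", "s2", "s3", "s4"}
    app = LinkedPanels()
    for name in ("ReducedDim", "ColData", "Heatmap"):
        app.add_panel(name, samples)
    app.link("ReducedDim", "ColData")  # selection in the embedding filters the metadata plot
    app.link("ColData", "Heatmap")     # further selection filters the heatmap
    app.select("ReducedDim", {"s1", "s2", "s3"})
    app.select("ColData", {"s2", "s3", "s4"})
    print(sorted(app.visible["Heatmap"]))  # ['s2', 's3']
```

A fuller implementation would also re-propagate downstream whenever an upstream selection changes; the sketch omits this for brevity.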
Availability – iSEE is publicly available as an R package from the open-source Bioconductor project and is actively developed at https://github.com/csoneson/iSEE. Interested users should consult the vignettes available at https://bioconductor.org/packages/iSEE for more details.