RNA sequencing has become the standard experimental approach for accurately profiling the transcriptome in a number of systems. Principal component analysis (PCA) is frequently used in genomics applications for quality assessment and exploratory analysis of high-dimensional data. Yet despite the availability of many software packages developed for this purpose, an interactive and comprehensive interface for performing these operations is lacking.
Researchers at the University Medical Center in Mainz, Germany, have developed pcaExplorer, a software package that enhances commonly performed analysis steps with an interactive and user-friendly application.
Overview of the pcaExplorer workflow
A typical analysis with pcaExplorer starts by providing the matrix of raw counts for the sequenced samples, together with the corresponding experimental design information. Alternatively, a DESeqDataSet object and a DESeqTransform object can be given together as input. Specifying a gene annotation allows alternative IDs to be displayed, mapped to the row names of the main expression matrix. Documentation is provided at multiple levels (tooltips and instructions in the app, in addition to the package vignette). After the app is launched, the interactive session allows detailed exploration, and the output (images, tables) can be exported, also in the form of an R Markdown/HTML report, which can be stored or shared. (Icons in this figure are from the collections released by Font Awesome under the CC BY 4.0 license.)
Notably, pcaExplorer supports reproducible research by providing state saving as well as the automated creation of R Markdown reports. pcaExplorer is implemented in R using the Shiny framework, leveraging efficient data structures from the open-source Bioconductor project. Users can easily generate a wide variety of publication-ready graphs while assessing the expression data in the different modules available, including a general overview, dimension reduction on samples and genes, as well as functional interpretation of the principal components.
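To make the idea of dimension reduction on samples concrete, here is a minimal sketch of PCA on a small count matrix. It is written in plain Python for illustration and is not pcaExplorer's code or API. Several things are assumptions: the function names (`log2_centered`, `sample_gram`, `top_eigenpair`, `pc1_scores`) are hypothetical, the counts are made-up toy values, and a simple log2(count + 1) transform stands in for the transformed data that a DESeqTransform object would carry. Only the first principal component is computed, using power iteration.

```python
import math


def log2_centered(counts):
    """Apply log2(count + 1), then subtract each gene's mean across samples.

    `counts` is a list of rows (genes), each a list of per-sample raw counts.
    Assumption: this simple transform replaces a proper variance-stabilizing one.
    """
    centered = []
    for row in counts:
        logged = [math.log2(c + 1) for c in row]
        mean = sum(logged) / len(logged)
        centered.append([v - mean for v in logged])
    return centered


def sample_gram(matrix):
    """Return the samples-by-samples matrix of inner products over genes."""
    n = len(matrix[0])
    return [[sum(row[i] * row[j] for row in matrix) for j in range(n)]
            for i in range(n)]


def matvec(sym, vec):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vec)) for row in sym]


def top_eigenpair(sym, iterations=200):
    """Power iteration: leading eigenvalue and unit eigenvector of a
    symmetric positive semi-definite matrix."""
    n = len(sym)
    vec = [1.0 + 0.1 * i for i in range(n)]  # arbitrary deterministic start
    norm = math.sqrt(sum(x * x for x in vec))
    vec = [x / norm for x in vec]
    for _ in range(iterations):
        nxt = matvec(sym, vec)
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0.0:  # the matrix maps the vector to zero; nothing to extract
            return 0.0, vec
        vec = [x / norm for x in nxt]
    eigval = sum(v * w for v, w in zip(vec, matvec(sym, vec)))
    return eigval, vec


def pc1_scores(counts):
    """Return (per-sample PC1 scores, fraction of variance explained by PC1)."""
    gram = sample_gram(log2_centered(counts))
    total = sum(gram[i][i] for i in range(len(gram)))
    eigval, vec = top_eigenpair(gram)
    # With X = genes x samples, an eigenvector of X^T X scaled by the
    # singular value gives the sample coordinates along PC1.
    scores = [math.sqrt(max(eigval, 0.0)) * v for v in vec]
    explained = eigval / total if total > 0 else 0.0
    return scores, explained


if __name__ == "__main__":
    # Toy data (hypothetical): 4 genes x 4 samples, two conditions.
    samples = ["ctrl_1", "ctrl_2", "treated_1", "treated_2"]
    counts = [
        [100, 110, 400, 420],  # higher in treated samples
        [500, 480, 120, 130],  # lower in treated samples
        [50, 55, 52, 49],      # roughly unchanged
        [10, 12, 80, 75],      # higher in treated samples
    ]
    scores, explained = pc1_scores(counts)
    for name, score in zip(samples, scores):
        print(f"{name}: PC1 = {score:.2f}")
    print(f"PC1 explains {explained:.1%} of the variance")
```

In this toy example the control and treated samples fall on opposite sides of PC1, which is the kind of grouping a sample-level PCA plot is meant to reveal during quality assessment. The sketch works on the small samples-by-samples matrix rather than the genes-by-genes covariance, a common choice when genes far outnumber samples.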
Availability – pcaExplorer is distributed as an R package in the Bioconductor project (http://bioconductor.org/packages/pcaExplorer/), and is designed to assist a broad range of researchers in the critical step of interactive data exploration.