RNA-seq, in which the RNA transcripts expressed in a sample are sequenced and quantified, has become a widely used technique for studying disease and development. With RNA-seq, transcript abundance can be measured, differentially expressed genes between groups can be identified, and the functional enrichment of those genes can be computed. However, the biological insight gained from RNA-seq is often limited by the demands of computational analysis and by the enormous volume of resulting data, which hinder straightforward, meaningful review and interpretation of gene expression profiles. In particular, when the samples under study exhibit uncontrolled variation, a deeper analysis of functional enrichment is needed to visualize each sample's gene expression activity within each biological function.
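To make "functional enrichment" concrete, the sketch below scores over-representation of a single GO term among differentially expressed genes with a plain one-sided hypergeometric test. This is a simplification chosen for illustration: GOSeq, which rgsepd wraps, additionally corrects for gene-length bias, so its p-values will differ. The function name and its arguments are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(total_genes: int, go_genes: int,
                           de_genes: int, overlap: int) -> float:
    """One-sided (upper-tail) hypergeometric p-value, P(X >= overlap).

    total_genes: genes tested in the experiment (population size N)
    go_genes:    tested genes annotated to the GO term (K)
    de_genes:    differentially expressed genes (number of draws n)
    overlap:     DE genes annotated to the GO term (observed k)
    """
    if not (0 <= go_genes <= total_genes and 0 <= de_genes <= total_genes):
        raise ValueError("gene counts must satisfy 0 <= K, n <= N")
    if overlap < 0:
        raise ValueError("overlap must be non-negative")
    # The overlap can never exceed the term size or the DE list size.
    upper = min(go_genes, de_genes)
    # Sum the probability mass of every overlap at least as large as observed.
    tail = sum(
        comb(go_genes, i) * comb(total_genes - go_genes, de_genes - i)
        for i in range(overlap, upper + 1)
    )
    return tail / comb(total_genes, de_genes)


if __name__ == "__main__":
    # 20,000 tested genes, 200 in the term, 500 DE, 20 DE genes in the term.
    print(hypergeom_enrichment_p(20000, 200, 500, 20))
```

In the usage example, only about 5 annotated DE genes are expected by chance (500 × 200 / 20,000), so observing 20 yields a very small p-value.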
Researchers at Marquette University developed rgsepd, a Bioconductor package that streamlines RNA-seq data analysis by wrapping the commonly used tools DESeq2 and GOSeq in a user-friendly interface, and that performs a gene-subset linear projection to cluster heterogeneous samples by Gene Ontology (GO) terms. For each experimental condition, rgsepd computes significantly enriched GO terms and generates multidimensional projection plots showing how the multidimensional expression of each predefined gene set may separate the samples.
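The gene-subset linear projection can be pictured with a minimal sketch, assuming each sample is represented by its normalized, log-scale expression over the genes of one GO term and is scored by its scalar projection onto the axis joining the two condition centroids (0 at condition A, 1 at condition B). The function names and input layout are hypothetical and are not the rgsepd API; the package's exact gene selection and scoring may differ.

```python
from statistics import mean


def centroid(vectors):
    """Per-gene mean of a list of equal-length expression vectors."""
    return [mean(col) for col in zip(*vectors)]


def project_onto_axis(sample, centroid_a, centroid_b):
    """Scalar position of `sample` on the axis from centroid_a (0.0) to centroid_b (1.0)."""
    axis = [b - a for a, b in zip(centroid_a, centroid_b)]
    offset = [s - a for s, a in zip(sample, centroid_a)]
    axis_sq = sum(x * x for x in axis)
    if axis_sq == 0:
        raise ValueError("condition centroids coincide; projection is undefined")
    return sum(o * x for o, x in zip(offset, axis)) / axis_sq


def gene_subset_scores(expr, gene_set, labels, cond_a="A", cond_b="B"):
    """Project every sample onto the cond_a -> cond_b axis within one gene subset.

    expr:     dict gene -> list of normalized log-scale values, one per sample
    gene_set: genes annotated to one GO term (genes absent from expr are ignored)
    labels:   condition label per sample; samples with other labels are
              projected but do not contribute to either centroid
    """
    genes = sorted(g for g in gene_set if g in expr)
    if not genes:
        raise ValueError("no genes of this set are present in the expression data")
    vectors = [[expr[g][j] for g in genes] for j in range(len(labels))]
    group_a = [v for v, lab in zip(vectors, labels) if lab == cond_a]
    group_b = [v for v, lab in zip(vectors, labels) if lab == cond_b]
    if not group_a or not group_b:
        raise ValueError("each condition needs at least one sample")
    ca, cb = centroid(group_a), centroid(group_b)
    return [project_onto_axis(v, ca, cb) for v in vectors]


if __name__ == "__main__":
    expr = {"g1": [1.0, 1.0, 3.0, 3.0, 2.0], "g2": [0.0, 0.0, 2.0, 2.0, 1.0]}
    print(gene_subset_scores(expr, {"g1", "g2"}, ["A", "A", "B", "B", "C"]))
    # prints [0.0, 0.0, 1.0, 1.0, 0.5]
```

The sample labeled "C" belongs to neither reference condition, yet it is still placed on the A-to-B axis (here halfway), which is the kind of per-gene-set view that can reveal where a heterogeneous sample sits relative to the two groups.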
rgsepd thus automates differential expression, functional annotation, and exploratory data analyses, highlighting subtle expression differences among samples within each significant biological function.
System architecture diagram of the components of the GSEPD system, with major sections outlined in red.
Blue items indicate automated systems. An experiment starts at the upper left with the Sequencing Facility, where tissue samples are converted to gene expression quantifications through sequencing and processing external to GSEPD. The user then creates a table of count data and defines the sample metadata and the conditions to be compared (lower left; green items indicate user inputs). Across the top are External Resources, where functional annotation databases are curated by third parties and plug into the rgsepd software package. The R code wraps subprocesses for differential expression, set enrichment, and set-based projection scoring. The orange cylinder of sample data represents the normalized expression measurements produced by DESeq2. Within the Projection Engine box are small diagrams of the integral vector projections and clustering analyses.
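The normalization step can be illustrated with DESeq2's median-of-ratios size factors, sketched here in plain Python under simplifying assumptions: raw counts as input, genes with a zero in any sample excluded, and none of DESeq2's further options. It only shows the arithmetic; rgsepd itself delegates normalization to DESeq2, and the function names below are hypothetical.

```python
from math import exp, log
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factor for each sample.

    counts: list of genes, each a list of raw counts with one entry per sample.
    """
    n_samples = len(counts[0])
    # A gene with a zero in any sample has a geometric mean of 0; skip it.
    usable = [row for row in counts if all(c > 0 for c in row)]
    if not usable:
        raise ValueError("no gene has a nonzero count in every sample")
    # Log of each gene's geometric mean across samples (the pseudo-reference).
    log_ref = [sum(log(c) for c in row) / n_samples for row in usable]
    factors = []
    for j in range(n_samples):
        # Median over genes of log(count / reference), taken on the log scale.
        log_ratios = [log(row[j]) - ref for row, ref in zip(usable, log_ref)]
        factors.append(exp(median(log_ratios)))
    return factors


def normalize(counts, factors):
    """Divide each sample's counts by that sample's size factor."""
    return [[c / s for c, s in zip(row, factors)] for row in counts]


if __name__ == "__main__":
    # Sample 2 was sequenced twice as deeply; the last gene is skipped (zero count).
    counts = [[10, 20], [100, 200], [5, 10], [0, 7]]
    sf = size_factors(counts)
    print(sf)  # roughly [0.707, 1.414]
    print(normalize(counts, sf))
```

Because each size factor is a median over many genes, a handful of very highly expressed or strongly differentially expressed genes does not dominate it, which is the main advantage of this scheme over simply dividing by total counts.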
Availability – Instructions, manuals, and sample data are available in the online help files and on the project website at https://bioconductor.org/packages/release/bioc/html/rgsepd.html