GOexpress – an R/Bioconductor package for the identification and visualisation of robust gene ontology signatures through supervised learning of gene expression data

Identification of gene expression profiles that differentiate experimental groups is critical for discovery and analysis of key molecular pathways and also for selection of robust diagnostic or prognostic biomarkers. While integration of differential expression statistics has been used to refine gene set enrichment analyses, such approaches are typically limited to single gene lists resulting from simple two-group comparisons or time-series analyses. In contrast, functional class scoring and machine learning approaches provide powerful alternative methods to leverage molecular measurements for pathway analyses, and to compare continuous and multi-level categorical factors.

Researchers from University College Dublin introduce GOexpress, a software package for scoring and summarising the capacity of gene ontology features to simultaneously classify samples from multiple experimental groups. GOexpress integrates normalised gene expression data (e.g., from microarray and RNA-seq experiments) and phenotypic information of individual samples with gene ontology annotations to derive a ranking of genes and gene ontology terms using a supervised learning approach. The default random forest algorithm allows interactions between all experimental factors, and competitive scoring of expressed genes to evaluate their relative importance in classifying predefined groups of samples.

Overview of the GOexpress workflow

rna-seq

A typical GOexpress analysis takes as input: an ExpressionSet of the Biobase Bioconductor package containing either microarray or RNA-seq normalised expression data; the name of an experimental factor present in the phenoData slot of the ExpressionSet; and annotations for the features and GO terms (or other functional classes) considered. The GO_analyse function calculates scores and ranks for the individual genes and GO terms. Optionally, the pValue_GO function randomly permutes the gene features to estimate the probability of each GO term to rank (or score) higher by chance. Finally, various functions allow visualisation of gene expression profiles by gene and gene ontology, and export of the calculated statistics in text files

Availability – Project home page: http://​bioconductor.​org/​packages/​release/​bioc/​html/​GOexpress.​html

Rue-Albrecht K, McGettigan PA, Hernández B, Nalpas NC, Magee DA, Parnell AC, Gordon SV, MacHugh DE. (2016) GOexpress: an R/Bioconductor package for the identification and visualisation of robust gene ontology signatures through supervised learning of gene expression data. BMC Bioinformatics 17(1):126. [article]

Leave a Reply

Your email address will not be published. Required fields are marked *

*

Time limit is exhausted. Please reload CAPTCHA.