GOexpress – visualize RNA-Seq and microarray data using gene ontology annotations

GOexpress accepts gene expression datasets from both microarray and RNA-seq platforms, formatted in the recommended Bioconductor “ExpressionSet” container, and evaluates the power of each feature expressed in the dataset to cluster biological samples according to known experimental factors. In a second step, genes associated with a common gene ontology term (annotations default to Ensembl BioMart) are summarized to identify the GO terms that best cluster the same biological samples.
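The second step can be pictured with a minimal Python sketch. It is not GOexpress code (the package is written in R), and it assumes the summary is a plain mean of per-gene scores, which the text above does not specify. The function name, the `min_genes` parameter, the gene names other than FOS, and the GO identifiers are all made up for illustration.

```python
from collections import defaultdict


def summarise_go_terms(gene_scores, gene_to_go, min_genes=1):
    """Average per-gene scores over each GO term (illustrative assumption)."""
    members = defaultdict(list)
    for gene, terms in gene_to_go.items():
        if gene in gene_scores:  # ignore annotated genes absent from the dataset
            for term in terms:
                members[term].append(gene_scores[gene])
    summary = {
        term: sum(scores) / len(scores)
        for term, scores in members.items()
        if len(scores) >= min_genes  # drop terms with too few scored genes
    }
    # Highest average score first
    return sorted(summary.items(), key=lambda item: item[1], reverse=True)


# Hypothetical scores and annotations; "GO:A" and "GO:B" are placeholders
gene_scores = {"FOS": 0.9, "GENE_B": 0.7, "GENE_C": 0.1, "GENE_D": 0.2}
gene_to_go = {
    "FOS": ["GO:A"],
    "GENE_B": ["GO:A", "GO:B"],
    "GENE_C": ["GO:B"],
    "GENE_D": ["GO:B"],
}
ranked_terms = summarise_go_terms(gene_scores, gene_to_go, min_genes=2)
```

In this toy input the placeholder term "GO:A" ranks first because both of its genes score highly, while "GO:B" is pulled down by two weakly scoring genes.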

“The integration of expression values with gene ontology analysis makes GOexpress stand apart from most Gene Ontology analysis tools seeking enrichment of GO terms within lists of gene names”, says Kevin Rue-Albrecht, one of the tool’s developers.

GOexpress enables the analysis of both continuous (e.g. time-series, drug concentration) and categorical (e.g. treatment, condition) experimental factors, to identify the gene expression profiles, and GO terms, that most consistently cluster samples across all data points. Presently, GOexpress does not provide inferential statistics such as p-values. Instead, the randomForest algorithm inherently lets the expressed gene features in the dataset compete against each other, so that they can be ranked in decreasing order of clustering power, which prioritizes the visualization of top-ranked genes and GO terms. A one-way ANOVA is available as an alternative statistical framework, and other statistical tests may be added upon suggestion.
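As a rough illustration of ranking genes by how well they separate sample groups, the sketch below computes the textbook one-way ANOVA F-statistic for each gene in plain Python and sorts genes by it. This is not the GOexpress implementation and it does not reproduce the random forest scoring. The function name, gene names, group labels and expression values are hypothetical.

```python
def anova_f(values, groups):
    """One-way ANOVA F-statistic for one gene across labelled samples."""
    by_group = {}
    for value, group in zip(values, groups):
        by_group.setdefault(group, []).append(value)
    n, k = len(values), len(by_group)
    if k < 2 or n <= k:
        raise ValueError("need at least two groups and more samples than groups")
    grand_mean = sum(values) / n
    means = {g: sum(vs) / len(vs) for g, vs in by_group.items()}
    # Variation of group means around the grand mean
    ss_between = sum(len(vs) * (means[g] - grand_mean) ** 2
                     for g, vs in by_group.items())
    # Variation of samples around their own group mean
    ss_within = sum((v - means[g]) ** 2
                    for g, vs in by_group.items() for v in vs)
    if ss_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical log2 expression values: three control and three stimulated samples
groups = ["ctrl", "ctrl", "ctrl", "stim", "stim", "stim"]
expression = {
    "GENE_A": [1.0, 2.0, 3.0, 7.0, 8.0, 9.0],  # clearly separates the groups
    "GENE_B": [4.0, 5.0, 6.0, 4.0, 5.0, 6.0],  # identical in both groups
}
ranking = sorted(expression,
                 key=lambda gene: anova_f(expression[gene], groups),
                 reverse=True)
```

A gene whose expression differs strongly between groups relative to its within-group spread gets a large F and rises to the top of the ranking, which mirrors the idea of prioritizing the genes that best discriminate the samples.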

Heatmap of expression level for genes associated with the Gene Ontology “positive regulation of osteoclast differentiation”, top-ranked Biological Process returned by GOexpress for its ability to discriminate unstimulated bovine blood from PPDb-stimulated blood. This heatmap was obtained after filtering for genes expressed over 1 count per million (cpm) in at least 4 biological samples, and after filtering for GO terms associated with at least 15 genes in the Ensembl BioMart release 75 for Bos taurus. Expression values are normalised log2(cpm) obtained using edgeR.
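The expression filter described in this caption (over 1 cpm in at least 4 samples) can be sketched in plain Python as follows. This is an illustration only: the figure itself was produced with edgeR, whose normalisation and log transformation differ in detail, and the simple pseudocount used here is an assumption, as are all function names and the toy counts.

```python
import math


def counts_per_million(counts):
    """Convert raw counts (gene -> list of per-sample counts) to cpm.

    Assumes every sample has a non-zero total count.
    """
    n_samples = len(next(iter(counts.values())))
    lib_sizes = [sum(row[j] for row in counts.values()) for j in range(n_samples)]
    return {gene: [row[j] / lib_sizes[j] * 1e6 for j in range(n_samples)]
            for gene, row in counts.items()}


def filter_expressed(cpm, min_cpm=1.0, min_samples=4):
    """Keep genes expressed above min_cpm in at least min_samples samples."""
    return {gene: row for gene, row in cpm.items()
            if sum(value > min_cpm for value in row) >= min_samples}


def log2_cpm(cpm, pseudocount=0.5):
    """log2-transform cpm; the pseudocount (an assumption) avoids log2(0)."""
    return {gene: [math.log2(value + pseudocount) for value in row]
            for gene, row in cpm.items()}


# Hypothetical raw counts for three genes in four samples
counts = {
    "GENE_HIGH": [500000, 500000, 500000, 500000],
    "GENE_MID": [499999, 499999, 499999, 500000],
    "GENE_LOW": [1, 1, 1, 0],
}
kept = filter_expressed(counts_per_million(counts), min_cpm=1.0, min_samples=4)
log_expression = log2_cpm(kept)
```

Here the hypothetical "GENE_LOW" never exceeds 1 cpm in four samples, so it is dropped before the log transformation.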

Finally, GOexpress offers various data-driven plotting functions that integrate easily with widely used bioinformatics tools such as edgeR and DESeq to visualize the expression profiles of differentially expressed genes.

The expression profile of the FOS gene, the top-ranked gene among those associated with the GO term “positive regulation of osteoclast differentiation” for its ability to discriminate unstimulated bovine blood from PPDb-stimulated blood. The expression values are summarised for each of the four experimental conditions, each including four biological replicates. Expression values are normalised log2(cpm) obtained using edgeR.

Availability – GOexpress is available as a Bioconductor package. The currently recommended version of GOexpress is the development version 1.1.5, including new stable features and a clearer manual than the release version. http://www.bioconductor.org/packages/devel/bioc/html/GOexpress.html

Rue-Albrecht K, McGettigan PA, Hernandez B, Magee DA, Nalpas NC, Parnell A, Gordon SV, MacHugh DE. (2014) GOexpress: Visualise microarray and RNAseq data using gene ontology annotations. Forthcoming manuscript. The user guide accompanying the package provides a detailed description and tutorial of the features.
