Dynamic expression data, nowadays obtained using high-throughput RNA sequencing, are essential for monitoring transient gene expression changes and for studying the dynamics of transcriptional activity in the cell or in response to stimuli. Several methods for data selection, clustering and functional analysis are available; however, these steps are usually performed independently, without exploiting and integrating the information derived from each step of the analysis.
Here, researchers from the University of Padova present FunPat, an R package for time series RNA sequencing data that integrates gene selection, clustering and functional annotation into a single framework. FunPat exploits functional annotations by performing, for each functional term (e.g. a Gene Ontology term), an integrated selection-clustering analysis that selects differentially expressed genes sharing, besides annotation, a common dynamic expression profile.
FunPat's performance was assessed on both simulated and real data. Compared with a stand-alone selection step, integrating the clustering step improves recall without altering the false discovery rate. FunPat also shows high precision and recall in detecting the correct temporal expression patterns; in particular, its recall is significantly higher than that of hierarchical clustering, k-means and a model-based clustering approach specifically designed for RNA sequencing data. Moreover, when biological replicates are missing, FunPat is able to provide reproducible lists of significant genes. The application to real time series expression data shows that FunPat selects differentially expressed genes with high reproducibility, indirectly confirming high precision and recall in gene selection. In addition, the expression patterns obtained as output allow an easy interpretation of the results.
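For readers less familiar with the metrics quoted above, the following is a minimal Python sketch of how precision, recall and false discovery rate can be computed for a gene selection when the truly differentially expressed genes are known, as in a simulation. The function name `selection_metrics` and the gene identifiers are hypothetical and are not part of FunPat.

```python
from typing import Dict, Iterable


def selection_metrics(selected: Iterable[str], truly_de: Iterable[str]) -> Dict[str, float]:
    """Precision, recall and FDR of a gene selection against a known truth set."""
    selected_set, truth_set = set(selected), set(truly_de)
    tp = len(selected_set & truth_set)   # selected and truly differentially expressed
    fp = len(selected_set - truth_set)   # selected but not differentially expressed
    fn = len(truth_set - selected_set)   # differentially expressed but missed
    n_selected = tp + fp
    precision = tp / n_selected if n_selected else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    fdr = fp / n_selected if n_selected else 0.0  # equals 1 - precision when anything is selected
    return {"precision": precision, "recall": recall, "fdr": fdr}


# Toy, made-up example: four selected genes, three truly differentially expressed genes.
metrics = selection_metrics(["g1", "g2", "g3", "g4"], ["g1", "g2", "g5"])
print(metrics)
```

In these terms, recovering an additional true positive (here the missed gene `g5`) raises recall, and the false discovery rate does not increase as long as no further false positives are selected along with it.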
Description of the FunPat workflow. Starting from time series expression data monitored in two different conditions (A), the Bounded-Area method provides a ranking of genes according to a statistic built from the experimental replicates (B). Both seeds and candidates are mapped to structured prior knowledge organized into Gene Sets (GS), and a model-based clustering is applied to each Gene Set, following the procedure described in C. The pipeline identifies both Gene Set-specific patterns, characterizing clusters of genes (e.g. Genes 1 and 4 for GS 2), and Main Patterns, characterizing clusters of Gene Sets (e.g. the red Main Pattern, associated with GS 2 and 6).
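To give a feel for the ranking step in panel B, here is a minimal Python sketch that scores each gene by the area enclosed between its time series profiles in the two conditions and sorts genes by that score. This is an illustration only, not FunPat's implementation: the function names, the trapezoidal approximation and the toy data are assumptions, and the sketch does not model how the actual statistic is built from experimental replicates.

```python
from typing import Dict, List, Sequence, Tuple


def area_between_profiles(times: Sequence[float],
                          cond_a: Sequence[float],
                          cond_b: Sequence[float]) -> float:
    """Trapezoidal area under |cond_a(t) - cond_b(t)| over the sampled time points.

    Assumption: profiles are sampled at the same time points in both conditions.
    If two profiles cross between samples, this only approximates the enclosed area.
    """
    diffs = [abs(a - b) for a, b in zip(cond_a, cond_b)]
    area = 0.0
    for i in range(1, len(times)):
        area += 0.5 * (diffs[i - 1] + diffs[i]) * (times[i] - times[i - 1])
    return area


def rank_genes(times: Sequence[float],
               profiles: Dict[str, Tuple[Sequence[float], Sequence[float]]]
               ) -> List[Tuple[str, float]]:
    """Return (gene, area) pairs sorted from largest to smallest area."""
    scores = {gene: area_between_profiles(times, a, b)
              for gene, (a, b) in profiles.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy, made-up data: (condition A profile, condition B profile) per gene.
times = [0.0, 1.0, 2.0, 4.0]
profiles = {
    "gene1": ([1.0, 2.0, 3.0, 4.0], [1.0, 1.0, 1.0, 1.0]),
    "gene2": ([2.0, 2.0, 2.0, 2.0], [2.0, 2.0, 2.0, 2.0]),
    "gene3": ([1.0, 1.0, 1.0, 1.0], [2.0, 2.0, 2.0, 2.0]),
}
ranking = rank_genes(times, profiles)
print(ranking)
```

Genes whose profiles diverge most over time appear at the top of the ranking, while a gene with identical profiles in both conditions scores zero.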
Availability – The FunPat package is provided in R/Bioconductor at: http://sysbiobig.dei.unipd.it/?q=node/79