Discovering meaningful gene interactions is crucial for identifying novel regulatory processes in cells. Accurately building the corresponding graphs remains challenging because the available data admit a large number of possible solutions. Nonetheless, enforcing a priori constraints on the graph structure, such as modularity, may reduce network indeterminacy. Researchers at IFP Energies nouvelles have developed BRANE Clust (Biologically-Related A priori Network Enhancement with Clustering), which refines gene regulatory network (GRN) inference using cluster information. It works as a post-processing tool for inference methods (e.g. CLR, GENIE3). In BRANE Clust, the clustering is based on the inversion of a system of linear equations involving a graph-Laplacian matrix that promotes a modular structure. The approach is validated on the DREAM4 and DREAM5 datasets with objective measures, showing significant improvements over the compared methods. The researchers provide additional insights into the discovery of novel regulatory or co-expressed links in the inferred Escherichia coli network, evaluated using the STRING database. The comparative pertinence of the clustering is discussed computationally (SIMoNe, WGCNA, X-means) and biologically (RegulonDB).
Graph-based clustering using a decoupling strategy for hard-clustering
The principle is similar for soft-clustering. Gray nodes represent TF nodes. The T-label problem is decomposed into T binary sub-problems by setting component t of the marker labels s(t), t ∈ T, to one and the others to zero. Each sub-problem t yields a probability for each node. The final cluster of a node is the label whose probability across the T sub-problems is maximal.
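The decoupling strategy described in this caption can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the BRANE Clust implementation: it assumes a small symmetric weight matrix `W`, a hypothetical dictionary `markers` mapping a few pre-labeled nodes to their labels, and a plain combinatorial graph Laplacian solved as a Dirichlet problem (in the spirit of random-walker labeling). The function name `decoupled_label_probabilities` and the toy graph are introduced here for illustration only.

```python
import numpy as np

def decoupled_label_probabilities(W, markers):
    """Solve one binary Laplacian sub-problem per label.

    W       : (n, n) symmetric, non-negative edge weights (assumed input).
    markers : dict {node index: label} for pre-labeled nodes (assumed input).
    Returns (probs, labels), where probs[i, k] is the probability that
    node i belongs to labels[k].
    """
    n = W.shape[0]
    labels = sorted(set(markers.values()))
    seeded = sorted(markers)                       # marker (pre-labeled) nodes
    free = [i for i in range(n) if i not in markers]

    L = np.diag(W.sum(axis=1)) - W                 # combinatorial Laplacian
    L_ff = L[np.ix_(free, free)]                   # free-free block
    L_fs = L[np.ix_(free, seeded)]                 # free-seeded block

    probs = np.zeros((n, len(labels)))
    for col, t in enumerate(labels):
        # Binary sub-problem t: markers of label t are set to one, others to zero.
        s_t = np.array([1.0 if markers[i] == t else 0.0 for i in seeded])
        probs[seeded, col] = s_t
        # Probabilities on unlabeled nodes solve L_ff x = -L_fs s_t.
        probs[free, col] = np.linalg.solve(L_ff, -L_fs @ s_t)
    return probs, labels

# Toy graph: two tight triangles (0,1,2) and (3,4,5) joined by a weak edge 2-3.
W = np.zeros((6, 6))
for i, j, w in [(0, 1, 1.0), (0, 2, 1.0), (1, 2, 1.0),
                (3, 4, 1.0), (3, 5, 1.0), (4, 5, 1.0),
                (2, 3, 0.1)]:
    W[i, j] = W[j, i] = w

markers = {0: 0, 5: 1}                             # hypothetical marker nodes
probs, labels = decoupled_label_probabilities(W, markers)

# Hard clustering: keep, for each node, the label with maximal probability.
clusters = np.array(labels)[probs.argmax(axis=1)]
print(clusters)
```

For soft-clustering, the rows of `probs` could be kept as they are instead of taking the argmax. The sketch assumes every unlabeled node is connected to at least one marker, otherwise the linear system is singular.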
Availability – BRANE Clust software is available at: http://www-syscom.univ-mlv.fr/~pirayre/Codes-GRN-BRANE-clust.html