NetGen – a novel network-based probabilistic generative model for gene set functional enrichment analysis

High-throughput experimental techniques have improved dramatically and been widely applied over the past decades. However, biological interpretation of high-throughput experimental results, such as differentially expressed gene sets derived from microarray or RNA-seq experiments, is still a challenging task. Gene Ontology (GO) is commonly used in functional enrichment studies. The GO terms identified by current functional enrichment analysis tools often include direct parent or descendant terms in the GO hierarchy. These highly redundant terms make it difficult for users to analyze the underlying biological processes.

Researchers from the Chinese Academy of Sciences propose a novel network-based probabilistic generative model, NetGen, to perform functional enrichment analysis. An additional protein-protein interaction (PPI) network is explicitly used to assist the identification of significantly enriched GO terms. NetGen outperformed the existing methods in simulation studies. The effectiveness of NetGen was explored further on four real datasets. Notably, for each disease, NetGen identified several GO terms that were not directly linked with the active gene list. According to the curated literature, these terms are closely related to the corresponding diseases.

The workflow of NetGen. The active gene list G is the model input.


We want to identify the most enriched GO term set C, one that provides a reasonable biological explanation for G, as the final output. A greedy heuristic algorithm is used to maximize the log-likelihood function.
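To make the idea of greedy term selection concrete, here is a minimal Python sketch. It is not the NetGen or CopTea implementation: the likelihood is a simplified stand-in, and the function names, the activation probabilities p1, p2 and q, the per-term penalty alpha, and the toy data are all assumptions made for illustration. In this sketch, genes annotated by a selected term are treated as likely active, their PPI neighbors as somewhat likely active, and all other genes as active only by noise.

```python
import math


def log_likelihood(terms, annotations, ppi, active, all_genes,
                   p1=0.8, p2=0.3, q=0.05, alpha=3.0):
    """Toy log-likelihood of the active gene list given a set of GO terms.

    p1, p2, q and alpha are illustrative assumptions, not values from the paper.
    """
    # Genes annotated by any selected term.
    core = set().union(*(annotations[t] for t in terms))
    # PPI neighbors of those genes that are not themselves annotated.
    neighbors = set()
    for gene in core:
        neighbors |= ppi.get(gene, set())
    neighbors -= core

    score = -alpha * len(terms)  # penalty that discourages redundant terms
    for gene in all_genes:
        p = p1 if gene in core else p2 if gene in neighbors else q
        score += math.log(p) if gene in active else math.log(1.0 - p)
    return score


def greedy_select(annotations, ppi, active, all_genes):
    """Add the term that most increases the score until no term helps."""
    selected = set()
    best = log_likelihood(selected, annotations, ppi, active, all_genes)
    while True:
        candidates = [
            (log_likelihood(selected | {t}, annotations, ppi, active, all_genes), t)
            for t in sorted(annotations) if t not in selected
        ]
        if not candidates:
            break
        score, term = max(candidates)
        if score <= best:
            break
        selected.add(term)
        best = score
    return selected, best


# Hypothetical toy data: T2 is a redundant child of T1, T3 is unrelated.
all_genes = {"g1", "g2", "g3", "g4", "g5", "g6", "g7", "g8"}
annotations = {"T1": {"g1", "g2", "g3"}, "T2": {"g1", "g2"}, "T3": {"g6", "g7"}}
ppi = {"g3": {"g4"}, "g4": {"g3"}}  # a single undirected interaction
active = {"g1", "g2", "g3", "g4"}

selected, score = greedy_select(annotations, ppi, active, all_genes)
print(selected, round(score, 3))
```

On this toy input the search keeps only T1. The child term T2 explains no additional active genes, so its penalty outweighs any gain, and the active gene g4, which has no annotation in the selected set, is accounted for through its interaction with g3.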

This procedure yields more reasonable and interpretable functional enrichment results. As a novel term combination-based functional enrichment analysis method, NetGen is complementary to current individual term-based methods and can help to explore the underlying pathogenesis of complex diseases.

Availability – The proposed method has been implemented in the R package CopTea, which is publicly available on GitHub.

Sun D, Liu Y, Zhang XS, Wu LY. (2017) NetGen: a novel network-based probabilistic generative model for gene set functional enrichment analysis. BMC Syst Biol 11(Suppl 4):75. [article]
