Researchers from the University of Toulouse propose a method to increase the reliability of network inference when RNA-seq expression data have been measured together with an auxiliary dataset that provides external information on gene expression similarity between samples. Their statistical approach, hd-MI, imputes samples that have no RNA-seq data (and are therefore treated as missing) but are observed in the auxiliary dataset. hd-MI can improve the reliability of the inference for missing rates of up to 30% and yields more stable networks with fewer false positive edges. From a biological point of view, hd-MI also proved relevant for inferring networks from RNA-seq data acquired in adipose tissue during a nutritional intervention in obese individuals. These networks highlighted novel links between genes and were more comparable between the two steps of the nutritional intervention.

**Overview of hd-MI**

*The original dataset (X̃, left) is duplicated M times (second column). For every duplicate, each missing row is imputed by hot-deck (third column, X^{∗,m}). A network is inferred from each imputed dataset (fourth column) with LLGM (StARS is used to choose the regularization parameter, λ, in the method). Finally, the networks are combined into a single network using a threshold r_{0} for edge frequency among the M networks (fifth column).*
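The steps in this overview can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the RNAseqNet implementation: the function names, the donor rule (a random draw among the `k` observed samples closest in the auxiliary data), the default values of `M`, `r0` and `k`, and the `>=` comparison against `r0` are all hypothetical. The inference step (LLGM with StARS in the paper) is left as a user-supplied `infer_network` callable.

```python
import random
from collections import Counter


def hot_deck_impute(expr, aux, k=2, rng=random):
    """Fill each missing row of `expr` (given as None) with a donor's row.

    Assumption: the donor is drawn at random among the k samples with
    observed expression that are closest in the auxiliary data `aux`
    (squared Euclidean distance).
    """
    observed = [i for i, row in enumerate(expr) if row is not None]
    imputed = []
    for i, row in enumerate(expr):
        if row is not None:
            imputed.append(list(row))
            continue

        def dist(j, i=i):
            return sum((a - b) ** 2 for a, b in zip(aux[i], aux[j]))

        pool = sorted(observed, key=dist)[:k]
        imputed.append(list(expr[rng.choice(pool)]))
    return imputed


def combine_networks(networks, r0):
    """Keep the edges whose frequency among the networks is at least r0."""
    counts = Counter(edge for net in networks for edge in set(net))
    return {edge for edge, c in counts.items() if c / len(networks) >= r0}


def hd_mi(expr, aux, infer_network, M=10, r0=0.9, k=2, seed=0):
    """Impute M times, infer one network per imputed dataset, then combine.

    `infer_network` is a placeholder for the inference method: it takes an
    imputed dataset and returns a set of edges.
    """
    rng = random.Random(seed)
    networks = [infer_network(hot_deck_impute(expr, aux, k, rng))
                for _ in range(M)]
    return combine_networks(networks, r0)
```

Each network is represented as a set of edges (for example, tuples of gene indices), so the combination step reduces to counting how often each edge appears across the `M` imputed datasets.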

**Availability** – Software and sample data are available as an R package, RNAseqNet, that can be downloaded from the Comprehensive R Archive Network (CRAN): https://cran.r-project.org/web/packages/RNAseqNet/index.html

Imbert A, Valsesia A, Le Gall C, Armenise C, Lefebvre G, Gourraud PA, Viguerie N, Villa-Vialaneix N. (2017) **Multiple hot-deck imputation for network inference from RNA sequencing data.** *Bioinformatics* [Epub ahead of print]. [abstract]

**The p53-MDM2 Boolean gene regulatory network**

*The state of the system at time k is represented by a vector (ATM_{k}, p53_{k}, WIP1_{k}, MDM2_{k}), where the subscript k indicates the expression state at time k. The Boolean input u_{k} = dna_dsb_{k} at time k indicates the presence of DNA double strand breaks. Counter-clockwise from the top right: the activation/inhibition pathway diagram, transition diagrams corresponding to the constant inputs dna_dsb_{k} ≡ 0 (no stress) and dna_dsb_{k} ≡ 1 (DNA damage), and Boolean equations that describe the state transitions.*
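A Boolean network of this kind can be simulated with a one-step update function. The sketch below is in Python and does not use the BoolFilter R package. The caption does not reproduce the Boolean equations from the figure, so the update rules here are an assumption: they are written from a form of the p53-MDM2 model that is commonly used in the literature and should be checked against the article before being relied on. The function names are hypothetical.

```python
def step(state, dna_dsb):
    """One synchronous update of (ATM, p53, WIP1, MDM2) given the input dna_dsb.

    The rules below are assumed for illustration; they are not copied
    from the figure described in the caption.
    """
    atm, p53, wip1, mdm2 = state
    return (
        int((not wip1) and (atm or dna_dsb)),  # ATM
        int((not mdm2) and (atm or wip1)),     # p53
        int(p53),                              # WIP1
        int((not atm) and (p53 or wip1)),      # MDM2
    )


def trajectory(state, dna_dsb, n_steps):
    """Iterate `step` under a constant input and return the visited states."""
    states = [state]
    for _ in range(n_steps):
        state = step(state, dna_dsb)
        states.append(state)
    return states


# With the assumed rules, the all-zero state is a fixed point without stress,
# while constant DNA damage drives the system into an oscillation.
no_stress = trajectory((0, 0, 0, 0), dna_dsb=0, n_steps=3)
damage = trajectory((0, 0, 0, 0), dna_dsb=1, n_steps=8)
```

Under these assumed rules, the constant-damage trajectory returns to the state (1, 0, 0, 0) after eight steps, which corresponds to a cyclic transition diagram for dna_dsb ≡ 1.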

McClenny LD, Imani M, Braga-Neto UM. (2017) **BoolFilter: an R package for estimation and identification of partially-observed Boolean dynamical systems**. *BMC Bioinformatics* 18:519. [article]

**Graph-based clustering using a decoupling strategy for hard-clustering**

*The principle is similar for soft-clustering. Gray nodes represent TF nodes. The T-label problem is decomposed into T binary sub-problems by setting the component t of marker labels s(t), t ∈ T, to one and the others to zero. Each sub-problem t leads to a probability for each node. The final node clustering corresponds to the label whose probability among the T sub-problems is maximal.*
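The decoupling idea in this caption can be illustrated with a toy example in Python. This is a sketch under assumptions, not the BRANE Clust code: the binary sub-problem solver used here (repeatedly setting each unlabeled node to the weighted mean of its neighbors) is a simple stand-in chosen for brevity, and all function names are hypothetical.

```python
def solve_binary(adj, seeds, n_iter=200):
    """Toy binary sub-problem: probability that each node carries the label.

    `adj` is a symmetric dict of dicts of edge weights. `seeds` maps marker
    nodes to 1.0 (the current label) or 0.0 (any other label). Unseeded
    nodes are iteratively replaced by the weighted mean of their neighbors.
    """
    prob = {v: seeds.get(v, 0.5) for v in adj}
    for _ in range(n_iter):
        for v in adj:
            if v in seeds:
                continue
            total = sum(adj[v].values())
            if total > 0:
                prob[v] = sum(w * prob[u] for u, w in adj[v].items()) / total
    return prob


def decoupled_clustering(adj, markers):
    """Solve one binary sub-problem per label, then take the most probable label.

    `markers` maps marker nodes (for example TFs) to their label.
    """
    labels = sorted(set(markers.values()))
    probs = {}
    for t in labels:
        # Component t of the marker labels is set to one, the others to zero.
        seeds = {v: 1.0 if lab == t else 0.0 for v, lab in markers.items()}
        probs[t] = solve_binary(adj, seeds)
    return {v: max(labels, key=lambda t: probs[t][v]) for v in adj}


# Path graph a - b - c - d with a weak link between b and c.
adj = {
    "a": {"b": 1.0},
    "b": {"a": 1.0, "c": 0.1},
    "c": {"b": 0.1, "d": 1.0},
    "d": {"c": 1.0},
}
clusters = decoupled_clustering(adj, {"a": "TF1", "d": "TF2"})
```

In this toy graph, node b is strongly tied to marker a and node c to marker d, so each ends up in the cluster of its nearest marker.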

**Availability** – BRANE Clust software is available at: http://www-syscom.univ-mlv.fr/~pirayre/Codes-GRN-BRANE-clust.html

Pirayre A, Couprie C, Duval L, Pesquet JC. (2017) **BRANE Clust: Cluster-Assisted Gene Regulatory Network Inference Refinement.** *IEEE/ACM Trans Comput Biol Bioinform* [Epub ahead of print]. [abstract]

As the underlying structure of many networks is not (completely) known, one focus of systems biology is uncovering the complex and dynamic interactions between genes. The research area called ‘network inference (NI)’ aims to deduce network structures from high-throughput data with the help of reverse engineering techniques. In most cases, transcriptome data are used. NI consists of three parts:

- the identification of potential regulators,
- the prediction of target genes and
- the inference of the mode of interaction (e.g. activation or repression).

The advance of Next-Generation Sequencing of cDNAs derived from RNA samples (RNA-Seq) makes it possible to study transcriptomes with a previously unreachable depth and quality. On the other hand, data pre-processing poses new challenges. Here, the authors describe a workflow combining RNA-Seq data analysis with NI. In particular, RNA-Seq allows researchers to perform transcriptome studies of interacting (micro-)organisms using the same technology without having to separate RNA samples (‘dual RNA-Seq’). This makes it possible to predict GRNs of organisms that interact with each other.

**Workflow of GRN inference**

*Systems Biology Cycle of wet lab (experiment) and dry lab work: Experiments lead to RNA-Seq data, which need to be preprocessed and from which features have to be selected (more detailed steps are shown in grey boxes). A GRN is inferred for the selected features. Predicted interactions are validated, leading to more knowledge and new hypotheses. Both the analysis of experimental data (data preprocessing and feature selection) and modeling (network inference) are supported by prior knowledge.*

Linde J, Schulze S, Henkel SG, Guthke R. (2016) **Data- and knowledge-based modeling of gene regulatory networks: an update**. *EXCLI J* 14:346-78. [article]

Weighting all possible pairwise gene relationships by a probability of edge presence, researchers from IFP Energies Nouvelles, France, formulate regulatory network inference as a discrete variational problem on graphs. They enforce biologically plausible coupling between groups and types of genes by minimizing an edge labeling functional that encodes a priori structures. The optimization is carried out with graph cuts, an approach popular in image processing and computer vision. The researchers compare the inferred regulatory networks to results achieved by the mutual-information-based Context Likelihood of Relatedness (CLR) method and by the state-of-the-art GENIE3, winner of the DREAM4 multifactorial challenge.

*Schematic view of the proposed BRANE Cut method. The initial graph (a) is transformed into an intermediate graph (b) in which a max-flow computation is performed to return an optimal edge labeling x^{∗}, leading to the inferred graph (c). We choose to present the method in its full generality with unscaled weights (i.e. w_{i,j} ∈ [0,+∞[, and λ parameters also belong to [0,+∞[). Nodes v_{2} and v_{3} are TFs, λ_{\overline{TF}}=1 and λ_{TF}=3. Taking γ=4 implies that v_{1}, v_{2}, and v_{3} satisfy the regulator coupling property. Vertices v_{1} and v_{4} are thus affected, leading to the presence of additional edges weighted by ρ_{1,2,3}=0 and ρ_{4,2,3}=3, when μ is set to 3. Computing a max-flow in the graph (b) leads to some edge saturation, represented in dashed lines. The values from the source (value 1) and the sink (value 0) are propagated through non-saturated paths, thus leading to x_{2,4}=x_{3,4}=0.*

The BRANE Cut approach infers the five DREAM4 in silico networks more accurately (with improvements from 6 % to 11 %). On a real Escherichia coli compendium, an improvement of 11.8 % compared to CLR and 3 % compared to GENIE3 is obtained in terms of Area Under Precision-Recall curve. Up to 48 additional verified interactions are obtained over GENIE3 for a given precision. On this dataset involving 4345 genes, the method achieves a performance similar to that of GENIE3, while being more than seven times faster.
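For readers unfamiliar with the metric, the following Python sketch shows one common way to summarize a precision-recall curve for a ranked list of predicted edges. It computes average precision, which is one estimator of the area under the precision-recall curve; whether the authors used this estimator or another (for example trapezoidal integration) is not stated here, so treat the choice as an assumption. The function name and the toy data are hypothetical.

```python
def average_precision(scored_edges, true_edges):
    """Average precision of predicted edges ranked by decreasing score.

    `scored_edges` is a list of (edge, score) pairs and `true_edges` is the
    set of verified interactions. Precision is averaged over the ranks at
    which a true edge is recovered.
    """
    if not true_edges:
        return 0.0
    ranked = sorted(scored_edges, key=lambda pair: pair[1], reverse=True)
    hits = 0
    total = 0.0
    for rank, (edge, _score) in enumerate(ranked, start=1):
        if edge in true_edges:
            hits += 1
            total += hits / rank  # precision at this recall level
    return total / len(true_edges)


# Toy example: two verified interactions among three scored edges.
scored = [(("a", "b"), 0.9), (("a", "c"), 0.8), (("b", "c"), 0.1)]
verified = {("a", "b"), ("b", "c")}
ap = average_precision(scored, verified)
```

In the toy example, the true edges are found at ranks 1 and 3, so the result is (1/1 + 2/3) / 2 = 5/6.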

BRANE Cut is a weighted graph thresholding method. Using biologically sound penalties and data-driven parameters, it improves three state-of-the-art GRN inference methods. Because of its computational efficiency, it is applicable as a generic post-processing step for network inference.

**Availability** – The BRANE Cut code is available at: http://www-syscom.univ-mlv.fr/~pirayre/Codes-GRN-BRANE-cut.html

Pirayre A, Couprie C, Bidard F, Duval L, Pesquet JC. (2015) **BRANE Cut: biologically-related a priori network enhancement with graph cuts for gene regulatory network inference**. *BMC Bioinformatics* 16:369. [article]

- First, besides an environmental change, the battle between pathogen and host leads to a constantly changing environment and thus complex gene expression patterns.
- Second, there might be a delay until one of the organisms reacts.
- Third, toward later time points only one organism may survive, leading to missing gene expression data for the other organism.

Here, researchers at the Hans-Knoell-Institute account for these pathogen-host interaction (PHI) characteristics by extending NetGenerator, a network inference tool that predicts gene regulatory networks from gene expression time series data. They tested multiple modeling scenarios regarding the stimuli functions of the interaction network based on a benchmark example. They show that modeling perturbation of a PHI network by multiple stimuli better represents the underlying biological phenomena. Furthermore, the researchers used the benchmark example to test the influence of missing data points on the inference performance. Their results suggest that PHI network inference with missing data is possible, but they recommend providing complete time series data. Finally, they extended the NetGenerator tool to incorporate gene- and time-point-specific variances, because complex PHIs may lead to high variance in expression data. Sample variances are considered directly in the objective function of NetGenerator and indirectly by testing the robustness of interactions based on variance-dependent disturbance of gene expression values. The researchers evaluated the method of variance incorporation on dual RNA sequencing (RNA-Seq) data of Mus musculus dendritic cells incubated with Candida albicans and validated their method by predicting previously verified PHIs as robust interactions.
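The two ways of using sample variances described above can be sketched in Python. This is an illustration under assumptions and not the NetGenerator code: the exact objective function and disturbance scheme of the tool are not given here, so the variance-weighted squared error, the Gaussian disturbance, and every function name below are hypothetical. The inference step is left as a user-supplied `infer` callable that returns a set of edges.

```python
import random


def weighted_sse(observed, simulated, variances, eps=1e-8):
    """Squared error in which high-variance measurements count less.

    Assumption: each residual is divided by its sample variance; `eps`
    guards against division by zero.
    """
    return sum((o - s) ** 2 / (v + eps)
               for o, s, v in zip(observed, simulated, variances))


def perturb(values, variances, rng):
    """Disturb each expression value with Gaussian noise scaled by its variance."""
    return [rng.gauss(x, v ** 0.5) for x, v in zip(values, variances)]


def edge_robustness(values, variances, infer, n_rep=100, seed=0):
    """Fraction of disturbed datasets in which each edge is inferred again."""
    rng = random.Random(seed)
    counts = {}
    for _ in range(n_rep):
        for edge in infer(perturb(values, variances, rng)):
            counts[edge] = counts.get(edge, 0) + 1
    return {edge: c / n_rep for edge, c in counts.items()}
```

An interaction could then be called robust when its frequency exceeds a chosen cutoff; the cutoff itself is a modeling decision that this sketch leaves open.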

**Availability** – The extended NetGenerator 2.3-0 tool is available at http://www.biocontrol-jena.com/NetGenerator/NetGenerator_2.3-0.tar.gz.

Schulze S, Henkel SG, Driesch D, Guthke R, Linde J. (2015) **Computational prediction of molecular pathogen-host interactions based on dual transcriptome data**. *Front Microbiol* 6:65. [article]