Network inference provides a global view of the relationships between the expression levels of genes in a given transcriptomic experiment (often only for a restricted list of chosen genes). However, it remains a challenging problem: even though the cost of sequencing has decreased in recent years, the number of samples in a given experiment is still (very) small compared to the number of genes.
Researchers from the University of Toulouse propose a method to increase the reliability of the inference when RNA-seq expression data have been measured together with an auxiliary dataset that can provide external information on gene expression similarity between samples. Their statistical approach, hd-MI, treats samples that lack RNA-seq data but are observed in the auxiliary dataset as missing data and imputes them. hd-MI can improve the reliability of the inference for missing rates of up to 30% and provides more stable networks with fewer false positive edges. From a biological point of view, hd-MI was also found relevant for inferring networks from RNA-seq data acquired in adipose tissue during a nutritional intervention in obese individuals. These networks highlighted novel links between genes and showed improved comparability between the two steps of the nutritional intervention.
Overview of hd-MI
The original dataset (X̃, left) is duplicated M times (second column). For every duplicate, each missing row is imputed by hot-deck (third column, X*,m). A network is inferred from each imputed dataset (fourth column) with a log-linear graphical model (LLGM); StARS is used to choose the regularization parameter, λ, of the method. Finally, the networks are combined into a single network using a threshold r0 on the edge frequency among the M networks (fifth column).
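The pipeline in the figure can be sketched in a few lines of Python. This is only an illustration of the overall logic, not the RNAseqNet implementation, and it rests on several assumptions: missing samples are marked with `None`, the donor pool is taken to be the `k` observed samples closest to the missing one in the auxiliary dataset (a simplification of the similarity-based donor selection), and the LLGM/StARS inference step is replaced by a plain correlation cutoff as a stand-in. All function names, parameter names other than M and r0, and the toy data are hypothetical.

```python
import math
import random


def hot_deck_impute(X, Y, k, rng):
    """Fill each missing row of X (marked None) with a copy of a donor row.

    Assumption: donors are the k observed samples closest to the missing
    sample in the auxiliary dataset Y (Euclidean distance); one donor is
    drawn at random from that pool.
    """
    donors = [i for i, row in enumerate(X) if row is not None]
    imputed = []
    for i, row in enumerate(X):
        if row is not None:
            imputed.append(list(row))
            continue
        pool = sorted(donors, key=lambda j: math.dist(Y[i], Y[j]))[:k]
        imputed.append(list(X[rng.choice(pool)]))
    return imputed


def pearson(a, b):
    """Pearson correlation; returns 0.0 when either variable is constant."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def infer_edges(X, cutoff):
    """Stand-in for LLGM + StARS: keep gene pairs with |correlation| > cutoff."""
    genes = list(zip(*X))  # one tuple of expression values per gene
    p = len(genes)
    return {
        (a, b)
        for a in range(p)
        for b in range(a + 1, p)
        if abs(pearson(genes[a], genes[b])) > cutoff
    }


def hd_mi_sketch(X, Y, M=20, r0=0.9, k=2, cutoff=0.8, seed=0):
    """Impute M times, infer M networks, keep edges with frequency >= r0."""
    rng = random.Random(seed)
    counts = {}
    for _ in range(M):
        for edge in infer_edges(hot_deck_impute(X, Y, k, rng), cutoff):
            counts[edge] = counts.get(edge, 0) + 1
    return {edge for edge, count in counts.items() if count / M >= r0}


# Toy data: 6 samples x 3 genes; samples 2 and 5 have no RNA-seq measurements.
X_demo = [[10, 20, 5], [12, 24, 9], None, [30, 61, 2], [28, 55, 7], None]
# Auxiliary dataset observed for all 6 samples.
Y_demo = [[1.0, 0.1], [1.1, 0.2], [1.05, 0.15], [3.0, 0.9], [2.9, 1.0], [3.1, 0.95]]

print(hd_mi_sketch(X_demo, Y_demo))  # {(0, 1)}
```

On this toy example only the edge between genes 0 and 1 is retained, because it appears in every one of the M inferred networks. The frequency threshold r0 is what makes the final network stable: an edge that shows up only under some imputations is discarded.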
Availability – Software and sample data are available as an R package, RNAseqNet, that can be downloaded from the Comprehensive R Archive Network (CRAN): https://cran.r-project.org/web/packages/RNAseqNet/index.html