Gene co-expression networks (GCNs) are powerful tools that enable biologists to examine associations between genes during different biological processes. With the arrival of new technologies such as single-cell RNA sequencing (scRNA-seq), there is a need for network methods suited to these new types of data.
A research team from the University of Louisville and the University of Florida has developed a novel sparse Bayesian factor model to explore the network structure associated with genes in scRNA-seq data. Latent factors influence the gene expression values in each cell and give the model the flexibility to account for common features of scRNA-seq data: high proportions of zero values, increased cell-to-cell variability, and overdispersion due to abnormally large expression counts. From this model, the researchers construct a GCN by analyzing the positive and negative associations of the factors that are shared between each pair of genes.
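The idea of linking genes through shared factors can be pictured with a small sketch. This is only an illustration and not the HBFM estimator itself: the loading matrix, the function name `shared_factor_scores`, and the scoring rule (count shared factors with matching signs, subtract shared factors with opposite signs) are all assumptions made for the example.

```python
def sign(x):
    """Return -1, 0, or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def shared_factor_scores(loadings):
    """Score each gene pair by the factors they share (hypothetical rule).

    loadings[g][f] is the loading of gene g on factor f; a zero means the
    gene is not associated with that factor (the "sparse" part).
    A shared factor adds +1 when both loadings have the same sign and
    -1 when the signs differ.
    """
    signs = [[sign(v) for v in row] for row in loadings]
    n = len(signs)
    return [
        [sum(a * b for a, b in zip(signs[i], signs[j])) for j in range(n)]
        for i in range(n)
    ]


# Toy loading matrix (made up for illustration): 4 genes, 3 factors.
loadings = [
    [0.9, 0.0, 0.0],
    [0.7, 0.0, -0.4],
    [-0.8, 0.5, 0.0],
    [0.0, 0.6, 0.0],
]

score = shared_factor_scores(loadings)

# Keep only gene pairs with a nonzero net association as network edges.
edges = [
    (i, j, score[i][j])
    for i in range(len(loadings))
    for j in range(i + 1, len(loadings))
    if score[i][j] != 0
]
print(edges)
```

In this toy example a positive score marks a positive association between two genes and a negative score a negative one. The actual model infers the loadings and their uncertainty from count data in a Bayesian framework, which this sketch does not attempt.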
Figure: (a) Heatmap of the “true” correlation structure in Sim 3 (F = 10, N = 500). (b) Heatmap of the correlation structure in Sim 3 estimated by HBFM with F = 25 factors. (c) Heatmap of the “true” correlation structure in Sim 4 (F = 15, N = 500). (d) Heatmap of the correlation structure in Sim 4 estimated by HBFM with F = 25 factors.
Simulation studies demonstrate that this methodology has high power in identifying gene-gene associations while maintaining a nominal false discovery rate. In real data analyses, this model identifies more known and predicted protein-protein interactions than other competing network models.
Availability – The R package for the HBFM model is available at: https://github.com/mnsekula/hbfm