Co-expression networks have been a useful tool for functional genomics, providing important clues about the cellular and biochemical mechanisms that are active in normal and disease processes. However, co-expression analysis is often treated as a black box whose results are hard to trace back to the underlying data. Here, researchers from the Cold Spring Harbor Laboratory use both published and novel single-cell RNA sequencing (RNA-seq) data to understand the fundamental drivers of gene-gene connectivity and replicability in co-expression networks.
The researchers performed the first major analysis of single-cell co-expression, sampling from 31 individual studies. Using neighbor voting in cross-validation, they found that single-cell network connectivity is less likely to overlap with known functions than co-expression derived from bulk data, and that functional variation within cell types strongly resembles the variation that occurs across cell types. To identify features and analysis practices that contribute to this connectivity, they performed their own single-cell RNA-seq experiment on 126 cortical interneurons, using an experimental design targeted to co-expression. By assessing network replicability, semantic similarity and overall functional connectivity, they identified technical factors that influence co-expression and suggested how these can be controlled for. Many of the technical effects they identified are expression-level dependent, making expression level itself highly predictive of network topology. Finally, by re-analyzing the BrainSpan RNA-seq data, the researchers show that this holds generally.
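As a rough illustration of what neighbor voting in cross-validation involves, the sketch below hides part of a known gene set, scores every gene by the fraction of its network connectivity that goes to the remaining set members, and measures how well the hidden members are recovered (AUROC). This is a simplified sketch of the general idea, not the authors' implementation; the function names, the fold count, and the toy network with a planted module are all assumptions made for illustration.

```python
import numpy as np

def auroc(scores, labels):
    """Rank-based AUROC (Mann-Whitney U). Assumes no tied scores."""
    order = np.argsort(scores)
    ranks = np.empty(len(scores))
    ranks[order] = np.arange(1, len(scores) + 1)
    n_pos = labels.sum()
    n_neg = len(labels) - n_pos
    rank_sum = ranks[labels].sum()
    return float((rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg))

def neighbor_voting(network, labels, n_folds=3, seed=0):
    """Mean cross-validated AUROC for recovering held-out gene-set members.

    network: symmetric (genes x genes) array of non-negative edge weights.
    labels:  boolean array, True for genes annotated to the function.
    """
    rng = np.random.default_rng(seed)
    positives = np.flatnonzero(labels)
    folds = np.array_split(rng.permutation(positives), n_folds)
    degree = network.sum(axis=1)  # total connectivity of each gene
    aurocs = []
    for held_out in folds:
        train = labels.astype(float)
        train[held_out] = 0.0  # hide this fold's annotations
        # Vote: weight of edges to known members, normalized by node degree.
        scores = network @ train / degree
        # Evaluate only on genes that were not used as training positives.
        eval_mask = train == 0
        aurocs.append(auroc(scores[eval_mask], labels[eval_mask]))
    return float(np.mean(aurocs))

# Toy network (hypothetical): 100 genes, the first 20 form a planted module.
rng = np.random.default_rng(1)
n_genes = 100
weights = rng.random((n_genes, n_genes))
network = (weights + weights.T) / 2
network[:20, :20] += 1.0  # stronger edges inside the module
np.fill_diagonal(network, 0.0)

labels = np.zeros(n_genes, dtype=bool)
labels[:20] = True
performance = neighbor_voting(network, labels)
```

An AUROC near 1 means the network connectivity recovers the hidden annotations well, while a value near 0.5 means it does no better than chance. Normalizing by node degree is one common design choice; it keeps highly connected genes from scoring well for every function.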
What lies beneath: co-expression can reflect different combinations of cell-state or compositional variation
Each panel shows a different scenario in which cell state and composition affect the expression of two genes (A and B), yielding different types of co-expression. The two cell types are colored red and blue. In the top panel, both cell types have state-dependent variation that causes co-expression within each type (r ~ 0.75), and there is also co-expression due to compositional variation (r ~ 0.75). In the bottom left panel, only compositional variation is apparent (r ~ 0.65); there is no relationship between gene A and gene B within the cell types (r ~ 0). The bottom right panel shows the opposite case: there is only variation within the cell types (r ~ 0.95) and no compositional effect across cell types (r ~ 0). The exact values of the compositional correlations would vary in real data, since combinations of the underlying cell types would fill in intermediate points, but the three cases would still occur as described. Other outcomes, due to noise or more complex scenarios (e.g. the Yule-Simpson effect), are also possible.
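The three scenarios in the caption can be reproduced with a small simulation. The sketch below is an illustration under stated assumptions rather than the procedure used to draw the figure: expression of genes A and B is drawn from a unit-variance bivariate normal within each cell type, the second cell type is shifted by the same amount in both genes, and the compositional correlation is modelled as the correlation across hypothetical mixtures in which only the proportion of the two cell types varies, plus independent noise. All function names and parameter values (shift of 3, noise standard deviation of 0.5) are hypothetical.

```python
import numpy as np

def simulate_cells(rng, n_cells, mean, within_r):
    """Draw (n_cells, 2) expression values for genes A and B in one cell type."""
    cov = np.array([[1.0, within_r], [within_r, 1.0]])
    return rng.multivariate_normal(mean, cov, size=n_cells)

def pearson(x, y):
    return float(np.corrcoef(x, y)[0, 1])

def compositional_r(rng, type1, type2, n_mixtures=2000, noise_sd=0.5):
    """Correlate A and B across mixtures that differ only in composition.

    Each mixture is a weighted average of the two cell-type mean profiles,
    with a random proportion p of type 2 and independent measurement noise.
    """
    p = rng.uniform(0.0, 1.0, size=(n_mixtures, 1))
    mix = (1.0 - p) * type1.mean(axis=0) + p * type2.mean(axis=0)
    mix = mix + rng.normal(0.0, noise_sd, size=mix.shape)
    return pearson(mix[:, 0], mix[:, 1])

def scenario(within_r, shift, n_cells=2000, seed=0):
    """Return within-type and compositional correlations for one scenario."""
    rng = np.random.default_rng(seed)
    type1 = simulate_cells(rng, n_cells, [0.0, 0.0], within_r)
    type2 = simulate_cells(rng, n_cells, [shift, shift], within_r)
    return {
        "within_type1": pearson(type1[:, 0], type1[:, 1]),
        "within_type2": pearson(type2[:, 0], type2[:, 1]),
        "compositional": compositional_r(rng, type1, type2),
    }

both = scenario(within_r=0.75, shift=3.0)             # top panel
composition_only = scenario(within_r=0.0, shift=3.0)  # bottom left panel
state_only = scenario(within_r=0.95, shift=0.0)       # bottom right panel
```

With these settings the shared signal across mixtures has variance 3² × 1/12 = 0.75 and the noise has variance 0.25, so the compositional correlation should come out near 0.75 whenever the cell types differ in mean, and near 0 when they do not. The within-type correlations are controlled separately by `within_r`, which is what lets the two sources of co-expression be switched on and off independently.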
Technical properties of single-cell RNA-seq data create confounds in co-expression networks that can be identified and explicitly controlled for in any supervised analysis. This is useful both for improving co-expression performance and for characterizing single-cell data in generally applicable terms, permitting cross-laboratory comparison within a common framework.