Guidance for RNA-seq co-expression network construction and analysis

RNA-seq co-expression analysis is in its infancy and reasonable practices remain poorly defined. Researchers at Cold Spring Harbor Laboratory assessed a variety of RNA-seq expression data to determine factors affecting functional connectivity and topology in co-expression networks.

The researchers examined RNA-seq co-expression data generated from 1,970 RNA-seq samples using a guilt-by-association framework, in which genes are assessed for the tendency of co-expression to reflect shared function. The minimal experimental criteria for performance on par with microarrays were >20 samples with a read depth of >10M reads per sample. While the aggregate network shows good performance (AUROC ~0.71), its dependency on the number of experiments used is nearly identical to that seen in microarrays, suggesting that thousands of samples are needed to obtain ‘gold-standard’ co-expression. The researchers also found a major topological difference between RNA-seq and microarray co-expression: hub-like genes from the two networks overlap poorly, owing to changes in the correlation of expression noise within each technology.
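
To make the guilt-by-association idea concrete, here is a minimal sketch of how a co-expression network might be scored with neighbor voting and AUROC. It is not the authors' pipeline: the function names (`coexpression_network`, `neighbor_voting_auroc`), the choice of Spearman correlation with rank standardization, the three-fold hold-out, and the synthetic expression matrix are all assumptions made for illustration.

```python
import numpy as np


def rank_rows(x):
    """Rank the values within each row (0 = smallest); ties are broken arbitrarily."""
    return np.argsort(np.argsort(x, axis=1), axis=1).astype(float)


def coexpression_network(expr):
    """Build a gene-by-gene network from a genes x samples expression matrix.

    Assumed recipe: Spearman correlation (Pearson on ranks), then the
    correlations of all gene pairs are ranked and scaled into (0, 1].
    """
    corr = np.corrcoef(rank_rows(expr))
    n_genes = corr.shape[0]
    upper = np.triu_indices(n_genes, k=1)
    pair_ranks = np.argsort(np.argsort(corr[upper])).astype(float)
    net = np.zeros((n_genes, n_genes))
    net[upper] = (pair_ranks + 1) / pair_ranks.size
    return net + net.T  # symmetric, zero diagonal


def auroc(scores, is_positive):
    """Probability that a positive outscores a negative (ties count as half)."""
    pos = scores[is_positive]
    neg = scores[~is_positive]
    wins = (pos[:, None] > neg[None, :]).mean()
    ties = (pos[:, None] == neg[None, :]).mean()
    return wins + 0.5 * ties


def neighbor_voting_auroc(net, labels, n_folds=3, seed=0):
    """Hide part of a gene set, score every gene by the fraction of its
    connection weight going to the still-labeled genes, and measure how well
    the hidden genes are recovered. Returns the mean AUROC over folds."""
    rng = np.random.default_rng(seed)
    positives = rng.permutation(np.flatnonzero(labels))
    degree = net.sum(axis=1)  # node degree: total connection weight per gene
    results = []
    for held_out in np.array_split(positives, n_folds):
        train = labels.copy()
        train[held_out] = False  # hide the held-out positives
        scores = net[:, train].sum(axis=1) / degree
        test = ~train  # held-out positives plus all negatives
        results.append(auroc(scores[test], labels[test]))
    return float(np.mean(results))


# Synthetic example (assumed data): 60 genes, 50 samples, and one
# 12-gene "functional set" whose members share a common signal.
rng = np.random.default_rng(0)
n_genes, n_samples, n_module = 60, 50, 12
expr = rng.normal(size=(n_genes, n_samples))
expr[:n_module] += 3.0 * rng.normal(size=n_samples)
labels = np.zeros(n_genes, dtype=bool)
labels[:n_module] = True

net = coexpression_network(expr)
score = neighbor_voting_auroc(net, labels)
print(f"neighbor-voting AUROC: {score:.2f}")
```

In real data the gene sets would come from functional annotations rather than a planted module, and the AUROC would be averaged over many sets. The `degree` vector in the sketch is also the quantity one would inspect to identify hub-like genes.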

Availability – Networks are available at: http://gillislab.labsites.cshl.edu/supplements/rna-seq-networks/

Contact: jgillis@cshl.edu

Ballouz S, Verleyen W, Gillis J. (2015) Guidance for RNA-seq co-expression network construction and analysis: safety in numbers. Bioinformatics [Epub ahead of print].

2 comments

  1. This paper overlooked two clearly identifiable culprits of differences in co-expression between arrays and RNA-seq and attributed them to “noise”:

    1) The manner by which a gene is quantified is fundamentally different between RNA-seq studies and between RNA-seq and arrays: union gene models, probabilistic FPKM estimation, array-based (semi)quantification of specific isoforms with probes.

    2) RNA-seq is more severely affected by RNA degradation. Many microarray-based platforms tag the 3′ end of transcripts, which reduces – but does not remove – the effects of RNA degradation. RNA-seq often uses the whole transcript, and poly(A)-selected RNA-seq is particularly sensitive to RNA degradation. This results in co-expression modules and module hubs that are highly related to – in effect, driven by – RNA quality.

  2. Hi Neel,

    Were either of those responsible, I’d probably call them ‘noise’. We basically define ‘noisy’ transcripts as those which needed to be filtered off in the SEQC technical replicability experiments to get replicable results when different labs measured the same samples. The fact that the lack of consistent measurement across labs is caused by underlying technological properties (or failures) doesn’t make them not ‘noise’ for us. On your points:

    1) This is not too relevant. It is trivial to observe that the method of measurement is not identical. We’re still hoping, when using RNA-seq and microarrays, to be measuring something vaguely related. We observed where the breakdown occurs and how it maps to an important property in co-expression. Because methods vary, it is always possible to note that this might drive differences or that some subset of methods might correct various problems, hypothetically. That’s just special pleading. Having sampled pretty widely from popular methods, the effect we described was pretty robust.

    2) Our initial speculation ran in this direction (and there’s a variety of related experiments in the supplement), but it doesn’t appear to be true; or, at least, it isn’t true when appropriate quality control is applied to the data. Instead, it was the microarray data in which correlated ‘noisy’ transcripts created hubs. Many folks will likely assume cross-hybridization issues in microarrays create these correlations in the ‘noisy’ set and this seems a reasonable default.

    Anyway, a quibble on tone aside, I do appreciate your interests. I think there’s a bit of a disconnect about what is signal and what is noise. This is always a difficult judgment and it is precisely for that reason we were happy to outsource the criterion we used to the MAQC experiments; I think our use of the word ‘noisy’ to describe the transcripts they had to remove (for good reasons) is not out of line.

    To put our finding a bit more simply: There are a lot of noisy transcripts in RNA-seq, but they have nice properties which let you remove them. They’re harder to pick out in microarray data and are often well correlated.
