In addition to detecting novel transcripts and offering a higher dynamic range, a principal claim for RNA-sequencing has been greater replicability, typically measured as sample-sample correlations of gene expression levels.
Through a re-analysis of ENCODE data, researchers at Cold Spring Harbor Laboratory show that the replicability of transcript abundances provides misleading estimates of the replicability of conditional variation in transcript abundances (i.e., the quantity of interest in most expression experiments). Heuristics that implicitly address this problem have emerged as quality control measures for obtaining ‘good’ differential expression results. However, these methods rely on strict filters, such as discarding low-expression genes or using technical replicates to remove discordant transcripts, and are costly or simply ad hoc.
As an alternative, the researchers model gene-level replicability of differential activity using co-expressing genes. They find that sets of housekeeping interactions provide a sensitive means of estimating the replicability of expression changes, where the two genes of a co-expressing pair can be regarded as pseudo-replicates of one another. They model the effects of noise that perturbs a gene’s expression within its usual distribution of values and show that perturbing expression by only 5% within that range is readily detectable (AUROC ~ 0.73).
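To make the reported AUROC easier to interpret, here is a minimal Python sketch of the statistic itself: the probability that a perturbed sample receives a higher disruption score than an unperturbed one, with ties counted as half. The function name `auroc` and the example scores are hypothetical and are not taken from the AuPairWise scripts.

```python
def auroc(scores, labels):
    """Probability that a random positive outscores a random negative (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical disruption scores for six samples; True marks the perturbed ones.
scores = [0.9, 0.1, 0.4, 0.35, 0.8, 0.2]
labels = [True, False, True, False, False, False]
print(auroc(scores, labels))  # 0.875
```

An AUROC of 0.5 means perturbed samples cannot be told apart from the rest, and 1.0 means they always rank highest, so a value near 0.73 indicates that the perturbation is detected well above chance.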
Schematic of the AuPairWise method
The input to the script is an expression matrix. Noise is added to a sample chosen at random, and an AUROC is calculated from how well the gene pairs detect the perturbation. This is repeated for multiple noise factors, which yields an estimate of the amount of noise required to significantly disrupt the experiment; this estimate serves as the metric for replicability. The outputs are summary files with the AUROCs and noise estimates, along with a summary plot.
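The steps described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the AuPairWise implementation (which is distributed as R scripts): the synthetic matrix, the noise model (mixing a value with a random draw from the same gene's values), the rank-disagreement score, and all function names are hypothetical choices made for this example.

```python
import random


def make_matrix(n_pairs, n_samples, rng):
    """Synthetic expression matrix: rows 2k and 2k+1 form a co-expressed pair."""
    rows = []
    for _ in range(n_pairs):
        base = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
        rows.append([b + rng.gauss(0.0, 0.1) for b in base])
        rows.append([b + rng.gauss(0.0, 0.1) for b in base])
    return rows


def perturb(matrix, sample, noise, rng):
    """Assumed noise model: mix each gene's value in one sample with a
    random draw from that gene's own observed values."""
    out = [row[:] for row in matrix]
    for g, row in enumerate(matrix):
        out[g][sample] = (1.0 - noise) * row[sample] + noise * rng.choice(row)
    return out


def ranks(values):
    """Rank of each value within its row (0 = smallest)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order):
        result[i] = rank
    return result


def disruption_scores(matrix):
    """Per-sample score: summed rank disagreement between the genes of each pair."""
    n_samples = len(matrix[0])
    scores = [0.0] * n_samples
    for k in range(0, len(matrix), 2):
        ra, rb = ranks(matrix[k]), ranks(matrix[k + 1])
        for s in range(n_samples):
            scores[s] += abs(ra[s] - rb[s])
    return scores


def detection_auroc(scores, perturbed):
    """AUROC with one positive: fraction of other samples scoring lower (ties half)."""
    target = scores[perturbed]
    others = [s for i, s in enumerate(scores) if i != perturbed]
    wins = sum(1.0 if target > s else 0.5 if target == s else 0.0 for s in others)
    return wins / len(others)


def noise_sweep(matrix, noise_factors, repeats, rng):
    """Mean detection AUROC for each noise factor, perturbing one random sample per repeat."""
    n_samples = len(matrix[0])
    result = {}
    for noise in noise_factors:
        total = 0.0
        for _ in range(repeats):
            sample = rng.randrange(n_samples)
            noisy = perturb(matrix, sample, noise, rng)
            total += detection_auroc(disruption_scores(noisy), sample)
        result[noise] = total / repeats
    return result


rng = random.Random(0)
matrix = make_matrix(n_pairs=50, n_samples=20, rng=rng)
sweep = noise_sweep(matrix, [0.0, 0.05, 0.25, 0.5, 1.0], repeats=30, rng=rng)
for noise, value in sweep.items():
    print(f"noise={noise:.2f}  mean AUROC={value:.2f}")
```

In a sketch like this, a replicability summary could be read off as the smallest noise factor whose mean AUROC crosses a chosen threshold; the threshold is left open here because the text above does not specify one.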
Availability – The developers have made their method available as a set of easily implemented R scripts: https://github.com/sarbal/AuPairWise