Tests for differential gene expression with RNA-seq data tend to identify certain types of transcripts as significant, e.g. longer and highly expressed transcripts. This tendency has been shown to bias gene set enrichment (GSE) testing, which is used to find over- or under-represented biological functions in the data. Yet there remains a surprising lack of GSE testing tools designed specifically for RNA-seq.
Researchers at the University of Michigan have developed a new GSE method for RNA-seq data, RNA-Enrich, that accounts for the above tendency empirically by adjusting for average read count per gene. RNA-Enrich is a quick, flexible method and web-based tool, with 16 available gene annotation databases. It does not require a p-value cut-off to define differential expression, and works well even for experiments with small sample sizes. They show that adjusting for read counts per gene improves both the type I error rate and detection power of the test.
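To make the idea of "adjusting for average read count per gene" concrete, the following is a minimal sketch on simulated data, not the RNA-Enrich implementation. It assumes a cut-off free logistic regression of gene set membership on a per-gene significance score, and it adjusts for read count with a plain linear term in log read count. The model form, the function names (`fit_logistic`, `solve`), and all simulated values are assumptions made for illustration. In the simulation, set membership depends only on read count, so any apparent enrichment signal is an artifact of the read count bias.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def solve(A, b):
    # Gaussian elimination with partial pivoting for a small dense system.
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x

def fit_logistic(X, y, iters=25):
    # Newton-Raphson fit; each row of X must start with 1.0 (intercept).
    p = len(X[0])
    beta = [0.0] * p
    for _ in range(iters):
        grad = [0.0] * p
        hess = [[0.0] * p for _ in range(p)]
        for row, yi in zip(X, y):
            mu = sigmoid(sum(b * v for b, v in zip(beta, row)))
            w = mu * (1.0 - mu)
            for j in range(p):
                grad[j] += (yi - mu) * row[j]
                for k in range(p):
                    hess[j][k] += w * row[j] * row[k]
        step = solve(hess, grad)
        beta = [b + s for b, s in zip(beta, step)]
    return beta

# Simulated genes (hypothetical values): the significance score rises with
# read count, and gene set membership depends on read count only.
random.seed(0)
genes = []
for _ in range(2000):
    log_count = random.uniform(0.0, 4.0)
    signif = 0.8 * log_count + abs(random.gauss(0.0, 1.0))  # stands in for -log10(p)
    in_set = 1 if random.random() < sigmoid(-3.0 + log_count) else 0
    genes.append((signif, log_count, in_set))

y = [g[2] for g in genes]
unadjusted = fit_logistic([[1.0, g[0]] for g in genes], y)
adjusted = fit_logistic([[1.0, g[0], g[1]] for g in genes], y)

print("significance coefficient, unadjusted:", round(unadjusted[1], 3))
print("significance coefficient, adjusted:  ", round(adjusted[1], 3))
```

Without the read count term, the significance coefficient comes out clearly positive, which would be read as enrichment. With the term included, the coefficient shrinks toward zero, reflecting that the gene set is not truly associated with differential expression in this simulation.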
(a) RNA-seq data from LNCaP cells treated with DHT compared to a control showed a relationship between average gene read count and –log10(p-values) from DE tests. (b-c) Histogram of permutation p-values (teal color) should be uniformly distributed for acceptable type I error rate. For RNA-Enrich, the type I error rate is approximately uniform (b), but for the random sets approach for which there is no correction, more p-values are significant than expected (c). With the original data, RNA-Enrich identifies more significant GO terms than the random sets method (pink color). (d) RNA-seq data from A549 cells treated with Dex compared to ethanol showed no relationship between read count and –log10(p-values). (e-f) With or without the read count bias correction, type I error rate is approximately uniform, indicating that no correction is needed and either test is valid. Additional permutations, enrichment testing methods, and dataset are provided in the supplement.
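The uniformity criterion used in panels (b-c) and (e-f) can be checked numerically as well as by eye. The sketch below is one simple way to do so, assuming a list of permutation p-values is already available. It reports the fraction of p-values below a threshold and the Kolmogorov-Smirnov distance from the uniform distribution. The function names and the example p-values are hypothetical and are not taken from the paper.

```python
def ks_uniform(pvalues):
    # Kolmogorov-Smirnov distance between the empirical distribution of the
    # p-values and the Uniform(0, 1) distribution.
    ps = sorted(pvalues)
    n = len(ps)
    return max(max((i + 1) / n - p, p - i / n) for i, p in enumerate(ps))

def fraction_significant(pvalues, alpha=0.05):
    # Share of p-values below alpha; close to alpha when the test is calibrated.
    return sum(p < alpha for p in pvalues) / len(pvalues)

# Hypothetical permutation p-values: one evenly spread set and one with an
# excess of small values, as expected from an uncorrected test under bias.
calibrated = [0.1, 0.3, 0.5, 0.7, 0.9]
inflated = [0.001, 0.01, 0.02, 0.03, 0.04, 0.2, 0.5, 0.6, 0.8, 0.9]

for name, ps in (("calibrated", calibrated), ("inflated", inflated)):
    print(name, round(ks_uniform(ps), 3), fraction_significant(ps))
```

A large distance or a fraction well above the chosen threshold suggests an inflated type I error rate, which is the pattern the caption describes for the uncorrected random sets approach on the biased dataset.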
Availability – RNA-Enrich is available at: http://lrpath.ncibi.org or from supplemental material as R code.