Deregulated pathways identified from transcriptome data of two sample groups have played a key role in many genomic studies. Gene-set enrichment analysis (GSEA) has been commonly used for pathway or functional analysis of microarray data, and it is also being applied to RNA-seq data. However, most RNA-seq datasets so far have only a small number of replicates. With so few samples, sample permutation cannot produce an adequate null distribution, which forces the use of the gene-permuting GSEA method (or preranked GSEA). That method yields a large number of false positives because of the inter-gene correlation within each gene-set.
Researchers from the Ulsan National Institute of Science and Technology, Korea, demonstrate that incorporating the absolute gene statistic in one-tailed GSEA considerably improves the false-positive control and the overall discriminatory ability of the gene-permuting GSEA methods for RNA-seq data. To test the performance, they developed a new simulation method that generates correlated read counts within a gene-set and compared a dozen currently available RNA-seq enrichment analysis methods. The proposed methods outperformed the others that do not account for the inter-gene correlation. Analysis of real RNA-seq data also supported the proposed methods in terms of false positive control, ranks of true positives and biological relevance.
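To make the idea concrete, here is a minimal Python sketch of a one-tailed, gene-permuting enrichment test that ranks genes by the absolute value of their statistic. It is an illustration only and not the AbsFilterGSEA implementation: the function names (enrichment_score, gene_permuting_pvalue), the weighting exponent, the number of permutations and the toy data are all assumptions made for this example.

```python
import numpy as np

def enrichment_score(stats, in_set, weight=1.0):
    """One-tailed score: maximum positive deviation of the weighted running sum."""
    order = np.argsort(-stats)              # rank genes from largest to smallest
    s = np.abs(stats[order]) ** weight      # weights for genes inside the set
    hits = in_set[order]
    hit_cum = np.cumsum(np.where(hits, s, 0.0))
    miss_cum = np.cumsum(~hits)
    p_hit = hit_cum / hit_cum[-1]           # assumes the set has a nonzero total weight
    p_miss = miss_cum / miss_cum[-1]        # assumes at least one gene is outside the set
    return float(np.max(p_hit - p_miss))

def gene_permuting_pvalue(stats, in_set, n_perm=1000, seed=0):
    """Score absolute gene statistics and build the null by shuffling gene labels."""
    rng = np.random.default_rng(seed)
    abs_stats = np.abs(stats)               # the direction of change is discarded here
    observed = enrichment_score(abs_stats, in_set)
    null = np.array([
        enrichment_score(abs_stats, rng.permutation(in_set))
        for _ in range(n_perm)
    ])
    p_value = (1 + np.sum(null >= observed)) / (1 + n_perm)
    return observed, float(p_value)

# Toy data (hypothetical): 200 genes, the first 20 form the gene-set,
# half of them up-regulated and half down-regulated.
rng = np.random.default_rng(1)
stats = rng.normal(size=200)
in_set = np.zeros(200, dtype=bool)
in_set[:20] = True
stats[:10] += 4
stats[10:20] -= 4
es, p = gene_permuting_pvalue(stats, in_set, n_perm=500)
print(es, p)
```

Because the statistic is made absolute before ranking, a gene-set whose members change in both directions still accumulates at the top of the list, and only the positive tail of the score is tested. Shuffling the gene labels is equivalent to drawing random gene-sets of the same size, which is what "gene-permuting" refers to in this sketch.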
Performance comparison of gene-permuting GSEA methods for simulated read counts
GSEA-GP methods combined with eight gene statistics (moderated t-statistic, signal-to-noise ratio (SNR), Ranksum, log fold-change (logFC) and their absolute versions), Camera combined with voom normalization, RNA-Enrich, and two preranked GSEA methods for edgeR p-values and fold changes (FCs) were compared for false positive rate, true positive rate and area under the receiver operating characteristic curve, using simulated read count data with three (A-C) and five (D-F) replicates.
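For readers who want to reproduce this kind of comparison on their own simulations, the three metrics can be computed from per-gene-set p-values and known truth labels roughly as follows. This is a hedged sketch: the function names, the 0.05 cutoff and the toy p-values are assumptions for illustration, and the study may have used a different significance criterion.

```python
import numpy as np

def fpr_tpr(pvalues, is_true, alpha=0.05):
    """False and true positive rates at an assumed significance cutoff alpha."""
    pvalues = np.asarray(pvalues, dtype=float)
    is_true = np.asarray(is_true, dtype=bool)
    called = pvalues < alpha
    return float(called[~is_true].mean()), float(called[is_true].mean())

def auc(pvalues, is_true):
    """Area under the ROC curve: the fraction of (true, null) pairs ranked correctly."""
    pvalues = np.asarray(pvalues, dtype=float)
    is_true = np.asarray(is_true, dtype=bool)
    pos = -pvalues[is_true]                 # smaller p-value means stronger evidence
    neg = -pvalues[~is_true]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return float((greater + 0.5 * ties) / (pos.size * neg.size))

# Toy example (hypothetical values): three truly enriched sets, three null sets.
pvals = [0.001, 0.02, 0.2, 0.03, 0.5, 0.9]
truth = [True, True, True, False, False, False]
fpr, tpr = fpr_tpr(pvals, truth)
area = auc(pvals, truth)
print(fpr, tpr, area)
```

The pairwise form of the AUC is quadratic in the number of gene-sets, which is fine for a simulation with a few hundred sets; a rank-based formula would be preferable for much larger collections.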
Availability – An efficient R package (AbsFilterGSEA) coded with C++ (Rcpp) is available from CRAN: https://cran.r-project.org/web/packages/AbsFilterGSEA/index.html