EPIG-Seq – extracting patterns and identifying co-expressed genes from RNA-Seq data

RNA sequencing (RNA-Seq) measures genome-wide gene expression. Because RNA-Seq data are count-based, analysis models that assume a normal distribution are inappropriate. Normalizing or transforming the counts has limitations that can adversely affect the analysis. A few count-based methods exist for RNA-Seq analysis, but they are designed essentially for pairwise or multiclass comparisons of treatment groups rather than for pattern-based identification of co-expressed genes.

Researchers from the National Institute of Environmental Health Sciences adapted their extracting patterns and identifying genes (EPIG) methodology to RNA-Seq count data, producing EPIG-Seq. The software uses count-based correlation to measure similarity between genes, quasi-Poisson modelling to estimate dispersion in the data, and a location parameter to indicate the magnitude of differential expression.
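To make the quasi-Poisson dispersion idea concrete, here is a minimal sketch. It is not EPIG-Seq's code. It assumes a one-factor design in which the fitted mean for each condition is simply the mean of its replicate counts, and it estimates the dispersion for a single gene as the Pearson chi-square statistic divided by the residual degrees of freedom. The function name `quasi_poisson_dispersion` and the example counts are hypothetical.

```python
from statistics import mean


def quasi_poisson_dispersion(groups):
    """Pearson-based dispersion estimate for one gene (illustrative only).

    groups: list of lists of replicate counts, one inner list per condition.
    Assumes a one-factor model, so each condition's fitted mean is the
    mean of its replicates.
    """
    n_obs = sum(len(g) for g in groups)
    n_params = len(groups)  # one mean per condition
    if n_obs <= n_params:
        raise ValueError("need more observations than conditions")

    chi_square = 0.0
    for counts in groups:
        mu = mean(counts)
        if mu == 0:
            # An all-zero condition has no Pearson residual to contribute.
            continue
        chi_square += sum((y - mu) ** 2 / mu for y in counts)

    # Pearson chi-square divided by residual degrees of freedom.
    return chi_square / (n_obs - n_params)


# Hypothetical counts for one gene: two conditions, two replicates each.
phi = quasi_poisson_dispersion([[8, 12], [18, 22]])
print(round(phi, 3))  # approximately 0.6
```

A value near 1 is what a plain Poisson model would predict; values well above 1 indicate overdispersion among replicates.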

EPIG-Seq differs from other software currently available for pattern analysis of RNA-Seq data in that it:

  1. uses count level data and supports cases of inflated zeros,
  2. identifies statistically significant clusters of genes that are co-expressed across experimental conditions,
  3. accounts for dispersion in the replicate data, and
  4. provides reliable results even with small sample sizes.

EPIG-Seq operates in two steps: (1) it extracts pattern profiles from the data to serve as seeds for clustering co-expressed genes, and (2) it clusters genes to the pattern seeds and computes the statistical significance of each pattern of co-expressed genes.
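The two-step shape of this workflow can be sketched in a few lines. This is a simplified stand-in, not the EPIG-Seq algorithm: it substitutes plain Pearson correlation for the software's count-based correlation, picks seeds greedily, and omits the significance computation entirely. The names `extract_seeds`, `assign_to_seeds`, the threshold `r_min`, and the toy profiles are all hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either profile is flat."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def extract_seeds(profiles, r_min=0.9):
    """Step 1 (sketch): greedily keep profiles that are not flat and are
    not already represented by an earlier seed."""
    seeds = []
    for gene, prof in profiles.items():
        if len(set(prof)) == 1:
            continue  # flat profile carries no pattern
        if all(pearson(prof, profiles[s]) < r_min for s in seeds):
            seeds.append(gene)
    return seeds


def assign_to_seeds(profiles, seeds, r_min=0.9):
    """Step 2 (sketch): attach each gene to its best-correlated seed,
    leaving genes below the threshold unassigned."""
    clusters = {s: [] for s in seeds}
    for gene, prof in profiles.items():
        scored = [(pearson(prof, profiles[s]), s) for s in seeds]
        if not scored:
            continue
        best_r, best_seed = max(scored)
        if best_r >= r_min:
            clusters[best_seed].append(gene)
    return clusters


# Toy per-condition profiles (hypothetical values).
profiles = {
    "g1": [1, 2, 3, 4],
    "g2": [2, 4, 6, 8.5],
    "g3": [4, 3, 2, 1],
    "g4": [5, 5, 5, 5],
    "g5": [8, 6, 4.5, 2],
}
seeds = extract_seeds(profiles)
clusters = assign_to_seeds(profiles, seeds)
print(seeds)
print(clusters)
```

In this toy run the rising and falling profiles each form a cluster, while the flat gene stays unassigned. A real analysis would also need the dispersion and significance steps that this sketch leaves out.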



The EPIG-Seq GUI contains a main panel that allows users to define parameters for steps 1 and 2 of the analysis process. A dialog box displays the processing status, and a command window displays the dependent processes running in the background.

EPIG-Seq provides a table of the genes with bootstrapped p-values and profile plots of the patterns of co-expressed genes. In addition, EPIG-Seq provides a heat map and principal component dimension reduction plot of the clustered genes as visual aids. The developers demonstrate the utility of EPIG-Seq through the analysis of toxicogenomics and cancer data sets to identify biologically relevant co-expressed genes.

EPIG-Seq is unlike any other software currently available for pattern analysis of RNA-Seq count-level data across experimental groups. Analyzing RNA-Seq count data across biological conditions with EPIG-Seq makes it possible to extract biologically meaningful co-expressed genes associated with coordinated regulation.

Availability – EPIG-Seq is available at: sourceforge.net/projects/epig-seq

Li J, Bushel PR. (2016) EPIG-Seq: extracting patterns and identifying co-expressed genes from RNA-Seq data. BMC Genomics 17(1):255.

