RNA sequencing (RNA-Seq) measures genome-wide gene expression. RNA-Seq data is count-based rendering normal distribution models for analysis inappropriate. Normalization of RNA-Seq data to transform the data has limitations which can adversely impact the analysis. Furthermore, there are a few count-based methods for analysis of RNA-Seq data but they are essentially for pairwise analysis of treatment groups or multiclasses but not pattern-based to identify co-expressed genes.
Researchers from the National Institute of Environmental Health Sciences adapted their extracting patterns and identifying genes methodology for RNA-Seq (EPIG-Seq) count data. The software uses count-based correlation to measure similarity between genes, quasi-Poisson modelling to estimate dispersion in the data and a location parameter to indicate magnitude of differential expression.
EPIG-Seq is different than any other software currently available for pattern analysis of RNA-Seq data in that EPIG-Seq:
- uses count level data and supports cases of inflated zeros,
- identifies statistically significant clusters of genes that are co-expressed across experimental conditions,
- takes into account dispersion in the replicate data and
- provides reliable results even with small sample sizes.
EPIG-Seq operates in two steps: 1) extract the pattern profiles from data as seeds for clustering co-expressed genes and 2) cluster the genes to the pattern seeds and compute statistical significance of the pattern of co-expressed genes.
The EPIG-Seq GUI contains a main panel which allows users to define parameters for steps 1 and 2 of the analysis process. A dialog box displays the processing status and a command window displays the dependent processes running in the background
EPIG-Seq provides a table of the genes with bootstrapped p-values and profile plots of the patterns of co-expressed genes. In addition, EPIG-Seq provides a heat map and principal component dimension reduction plot of the clustered genes as visual aids. The developers demonstrate the utility of EPIG-Seq through the analysis of toxicogenomics and cancer data sets to identify biologically relevant co-expressed genes.
EPIG-Seq is unlike any other software currently available for pattern analysis of RNA-Seq count level data across experimental groups. Using the EPIG-Seq software to analyze RNA-Seq count data across biological conditions permits the ability to extract biologically meaningful co-expressed genes associated with coordinated regulation.
Availability – EPIG-Seq is available at: sourceforge.net/projects/epig-seq