High-throughput sequencing technologies, including RNA-seq, have made it possible to move beyond gene expression analysis to study transcriptional events including alternative splicing and gene fusions. Furthermore, recent studies in cancer have suggested the importance of identifying transcriptionally altered loci as biomarkers for improved prognosis and therapy. While many statistical methods have been proposed for identifying novel transcriptional events with RNA-seq, nearly all rely on contrasting known classes of samples, such as tumor and normal. Few tools exist for the unsupervised discovery of such events without class labels.
In this paper, researchers at the University of North Carolina, Chapel Hill present SigFuge for identifying genomic loci exhibiting differential transcription patterns across many RNA-seq samples. SigFuge combines clustering with hypothesis testing to identify genes exhibiting alternative splicing, or differences in isoform expression. They apply SigFuge to RNA-seq cohorts of 177 lung and 279 head and neck squamous cell carcinoma samples from the Cancer Genome Atlas, and identify several cases of differential isoform usage including CDKN2A, a tumor suppressor gene known to be inactivated in a majority of lung squamous cell tumors. By not restricting attention to known sample stratifications, SigFuge offers a novel approach to unsupervised screening of genetic loci across RNA-seq cohorts.
Availability – SigFuge is available in R as a package through Bioconductor (http://www.bioconductor.org/packages/devel/bioc/html/SigFuge.html). In addition to methods for clustering and calculating significance, the package includes functions for creating plots of per-base expression (expression plots) corresponding to our analysis approach.