Because the majority of multi-exon genes generate diverse functional products, it is important to evaluate expression at the isoform level. Previous studies have demonstrated strong gene-level correlations between RNA sequencing (RNA-seq) and microarray platforms, but have not examined their concordance at the isoform level.
Researchers at Northwestern University performed transcript abundance estimation on raw RNA-seq and exon-array expression profiles available for common glioblastoma multiforme samples from The Cancer Genome Atlas using different analysis pipelines, and compared both the isoform- and gene-level expression estimates between programs and platforms.
The results showed better concordance between RNA-seq/exon-array and reverse transcription-quantitative polymerase chain reaction (RT-qPCR) platforms for fold change estimates than for raw abundance estimates, suggesting that fold change normalization against a control is an important step for integrating expression data across platforms. Based on RT-qPCR validations, the eXpress and Multi-Mapping Bayesian Gene eXpression (MMBGX) programs achieved the best performance for the RNA-seq and exon-array platforms, respectively, in deriving isoform-level fold change values. While eXpress achieved the highest correlation with the RT-qPCR and exon-array (MMBGX) results overall, RSEM was more highly correlated with MMBGX for the subset of transcripts that are highly variable across the samples. eXpress appears to be the most successful at discriminating lowly expressed transcripts, but IsoformEx and RSEM correlate more strongly with MMBGX for highly expressed transcripts. The results also reinforce that potentially important isoform-level expression changes can be masked by gene-level estimates, and demonstrate that exon arrays yield results comparable to RNA-seq for evaluating isoform-level expression changes.
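Why fold change normalization helps can be illustrated with a small, purely hypothetical example. The sketch below assumes a log2 ratio against the mean of the normal samples with a pseudocount of 1, and a hand-written Spearman correlation; none of these choices, names, or numbers come from the study. The toy exon-array values carry an isoform-specific multiplicative bias, which distorts raw abundances but largely cancels once each isoform is expressed relative to its own control.

```python
from math import log2, sqrt

def ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def log2_fold_change(tumor, normal_samples, pseudocount=1.0):
    """log2 ratio of each isoform against its mean across normal samples.

    The pseudocount and the use of the mean are illustrative assumptions.
    """
    n = len(normal_samples)
    baseline = [sum(s[i] for s in normal_samples) / n for i in range(len(tumor))]
    return [log2((t + pseudocount) / (b + pseudocount))
            for t, b in zip(tumor, baseline)]

# Hypothetical abundances for five isoforms in one tumor sample.
seq_tumor = [100, 300, 50, 400, 80]
seq_normals = [[40, 180, 90, 110, 100], [60, 220, 110, 90, 100]]

# Same biology, but each isoform is scaled by a platform-specific bias.
array_tumor = [10, 1500, 100, 200, 800]
array_normals = [[4, 900, 180, 55, 1000], [6, 1100, 220, 45, 1000]]

raw_rho = spearman(seq_tumor, array_tumor)
fc_rho = spearman(log2_fold_change(seq_tumor, seq_normals),
                  log2_fold_change(array_tumor, array_normals))
print(f"raw abundance: {raw_rho:.2f}, fold change: {fc_rho:.2f}")
```

In this constructed case the fold change correlation is much higher than the raw one because the bias is applied equally to tumor and normal values. Real platform differences are not purely multiplicative, so the improvement in practice would be smaller.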
Spearman correlation coefficients between MMBGX and different RNA-seq quantification methods
(A) Box plots summarize the distribution of individual sample correlations with MMBGX estimates for each RNA-seq tool tested. Median correlation values are shown (n = 102). For each method, correlations were calculated for both raw expression values and fold change values relative to the normal-tissue samples. (B) Average number of commonly resolved isoforms between MMBGX and each RNA-seq method. MMBGX-only transcripts (yellow) were included only if they fell in the top 50% of transcripts. (C) Correlations between MMBGX and each RNA-seq method for relatively highly expressed (75th–100th percentile) and lowly expressed (0–25th percentile) isoforms.
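The bookkeeping behind panels B and C can be sketched as follows: find the isoforms resolved by both methods, then split them into lowly and highly expressed strata before correlating each stratum separately. This is a minimal sketch under stated assumptions: the function names and transcript IDs are hypothetical, and ranking by the exon-array (reference) estimate is an illustrative choice, since the caption does not say which estimate defines the percentiles.

```python
def common_isoforms(reference, other):
    """IDs quantified by both methods, sorted for reproducibility."""
    return sorted(set(reference) & set(other))

def expression_strata(reference, ids, low=0.25, high=0.75):
    """Split ids into (lowly, highly) expressed by rank in `reference`.

    `low` and `high` are percentile cut-offs expressed as fractions.
    """
    ordered = sorted(ids, key=lambda t: reference[t])
    n = len(ordered)
    return ordered[:int(n * low)], ordered[int(n * high):]

# Hypothetical estimates keyed by transcript ID.
array_est = {"tx1": 5.0, "tx2": 1.0, "tx3": 9.0, "tx4": 3.0, "tx5": 7.0,
             "tx6": 2.0, "tx7": 8.0, "tx8": 4.0, "tx9": 6.0}
rnaseq_est = {"tx1": 4.1, "tx2": 0.2, "tx3": 9.9, "tx4": 2.5, "tx5": 6.0,
              "tx6": 3.3, "tx7": 7.2, "tx8": 4.8, "tx10": 1.0}

shared = common_isoforms(array_est, rnaseq_est)   # tx9 and tx10 drop out
lowly, highly = expression_strata(array_est, shared)
print(len(shared), lowly, highly)
```

Each stratum can then be passed to any rank correlation routine to compare the two sets of estimates within that expression range.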
- Gene expression studies should strive to evaluate expression at the isoform level or risk masking important expression dynamics.
- Exon-array expression analysis can yield comparable results to RNA-seq pipelines for evaluating isoform-level expression changes, suggesting that integrating isoform expression data across platforms and pipelines may improve the reliability of expression estimates. Accurate integration of isoform-level expression data across platforms and pipelines will, however, depend on the availability of normal (tissue or organ specific) control samples within each platform and tumor type.
- While eXpress achieved the highest correlation with the RT-qPCR and exon-array (MMBGX) results overall, RSEM was more highly correlated with MMBGX for changes in highly variable transcripts. eXpress appears to be most successful in discriminating lowly expressed transcripts, but IsoformEx and RSEM correlate more strongly with MMBGX for highly expressed transcripts.
- Isoform fold change values consistently demonstrate stronger agreement across platforms than raw expression estimates, suggesting that fold change normalization against a control is an important step for integrating expression data across platforms. Further, fold change correlations with RT-qPCR were consistent across different levels of RT-qPCR expression for both RNA-seq and exon-array platforms.
- eXpress demonstrated the highest overall concordance with exon-array and RT-qPCR estimates, but the preference for one RNA-seq quantification algorithm should depend on individual study parameters.