Gene expression profiling can uncover biological mechanisms underlying disease and is important in drug development. RNA sequencing (RNA-seq) is routinely used to assess gene expression, but its costs remain high. Sample multiplexing reduces RNA-seq costs; however, multiplexed samples have lower cDNA sequencing depth, which can hinder accurate detection of differentially expressed genes. It is not known how reduced sequencing depth affects downstream RNA-seq analyses such as gene expression connectivity mapping, a method used to identify existing compounds that could be repurposed as therapies.
In this study, researchers from Queen’s University Belfast and Johns Hopkins University assembled published RNA-seq profiles from patients with glioma, a type of brain tumor, into two gene signature contrasts representing disease progression in astrocytoma. Available treatments for glioma have limited effectiveness, and clinical outcomes remain poor. The gene signatures were subsampled to simulate reduced sequencing depth and then analyzed by connectivity mapping to test how robustly target compounds were identified.
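The article does not say how the subsampling was performed. One common way to simulate lower sequencing depth, assumed here purely for illustration, is binomial thinning: each read is kept independently with probability f, so f = 1 corresponds to the full data (as in the figure caption below). The function names `thin_counts` and `dropout_probability` are hypothetical, and this is a sketch rather than the authors' pipeline.

```python
import numpy as np


def thin_counts(counts: np.ndarray, f: float, seed: int = 0) -> np.ndarray:
    """Simulate reduced sequencing depth by binomially thinning read counts.

    Assumption: counts is a genes x samples matrix of non-negative integer
    read counts. Each read is kept independently with probability f, so a
    count c becomes a draw from Binomial(c, f); f = 1 returns the counts unchanged.
    """
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must lie in [0, 1]")
    rng = np.random.default_rng(seed)
    return rng.binomial(counts, f)


def dropout_probability(c: int, f: float) -> float:
    """Probability that a gene with c reads keeps none of them after thinning to f."""
    return (1.0 - f) ** c


if __name__ == "__main__":
    # Hypothetical counts: a low-abundance gene (row 0) and a highly expressed gene (row 1).
    counts = np.array([[5, 8], [1000, 1200]])
    for f in (1.0, 0.5, 0.1):
        print(f, thin_counts(counts, f).tolist())
    print(dropout_probability(5, 0.1))     # about 0.59
    print(dropout_probability(1000, 0.1))  # vanishingly small
```

Under this model a gene with c reads disappears entirely with probability (1 − f)^c, so at f = 0.1 a gene with 5 reads is lost more often than not, while a gene with 1000 reads is effectively never lost. This is one plausible mechanism for the sensitivity of low-abundance genes discussed below.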
Data loss from the gene signatures caused some significant connections to be lost, others to be gained, and others to be identified consistently. The most accurate gene signature contrast, whose patient gene expression profiles were consistent, was more resilient to data loss and identified robust target compounds. The target compounds that were lost included candidates of potential clinical utility in glioma (eg, suramin, dasatinib). Lost connections may have been linked to low-abundance genes in the gene signature that closely characterized the disease phenotype, whereas consistently identified connections may have been related to highly expressed genes that remained in the gene signatures despite data reduction. Potential noise in the findings included false-positive connections that were gained as the gene signatures changed with data loss.
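The article does not specify the connection scoring method. To illustrate how removing genes from a signature can shift a connection, the sketch below uses one simple rank-based formulation, assumed for illustration only: genes in a compound's reference profile are ranked by absolute log fold change (largest gets the highest rank) and given the sign of their regulation, and the score sums signature direction times signed rank, normalized to [-1, 1]. All names and values are hypothetical.

```python
def signed_ranks(profile: dict[str, float]) -> dict[str, int]:
    """Rank genes by |log fold change| (largest gets rank N), keeping the sign of regulation."""
    ordered = sorted(profile, key=lambda g: abs(profile[g]))
    return {g: (i + 1) * (1 if profile[g] >= 0 else -1) for i, g in enumerate(ordered)}


def connection_score(signature: dict[str, int], profile: dict[str, float]) -> float:
    """Normalized score in [-1, 1]; signature maps gene -> +1 (up) or -1 (down)."""
    ranks = signed_ranks(profile)
    genes = [g for g in signature if g in ranks]  # signature genes present in the profile
    if not genes:
        return 0.0
    raw = sum(signature[g] * ranks[g] for g in genes)
    n, m = len(ranks), len(genes)
    # Best possible raw score: the m signature genes match the m top-ranked genes.
    max_raw = sum(range(n - m + 1, n + 1))
    return raw / max_raw


if __name__ == "__main__":
    # Hypothetical compound profile (gene -> log fold change).
    profile = {"A": 3.0, "B": -2.5, "C": 0.4, "D": -0.2, "E": 1.1}
    full = {"A": 1, "B": -1, "C": -1}
    reduced = {"A": 1, "B": -1}  # gene C dropped, e.g. after data loss
    print(connection_score(full, profile), connection_score(reduced, profile))
```

Here dropping a single gene moves the score from 7/12 to 1.0. In a real analysis, such shifts would change which compounds pass a significance threshold, which is one way connections could be gained or lost.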
Effect of decreased cDNA library sequencing depth on the number of differentially expressed genes (DEGs) detected from (A) Dataset_I and (B) Dataset_II gene signatures. Visualization of the global stratification ability of (C) Dataset_I, low to high (L-H), and (D) Dataset_II, high to high (H-H) gene signatures. Dataset_I is composed of astrocytomas (ASTRO) and anaplastic astrocytomas (aASTRO). Dataset_II is composed of aASTRO and secondary glioblastomas (sGBM). Heatmaps were generated using unsupervised hierarchical clustering of the full RNA-seq data (f = 1) and depict the gene expression patterns of the top 100 differentially expressed genes identified between the gene signature contrast groups. The WHO disease grades of the samples, as determined by Bao et al, are overlaid.
These findings highlight the need for accurate gene signatures in connectivity mapping; improving signature accuracy should increase the clinical utility of future target compound discoveries.