Long intergenic noncoding RNAs (lincRNAs) have risen to prominence in cancer biology as new biomarkers of disease. LincRNAs transcribed from active cis-regulatory elements (enhancers) have provided mechanistic insight into cis-acting regulation; however, in the absence of enhancer hallmarks, computational prediction of cis-acting lincRNA transcription remains challenging. Here, University of Chicago researchers introduce a novel transcriptomic method: a cis-regulatory lincRNA–gene association metric, termed ‘CisPi’. CisPi quantifies the mutual information between a lincRNA and the expression of its local genes in their response to a perturbation, such as dependence on disease risk. To predict risk-dependent lincRNAs in neuroblastoma, an aggressive pediatric cancer, the researchers extend this scoring scheme with a novel side-by-side analytical pipeline that measures lincRNAs, which account for only a minority of reads in RNA-Seq libraries.
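The exact CisPi formula is not given in this section, so the following is only a minimal Python sketch of the underlying idea. It assumes that each lincRNA's and each neighboring gene's response to a perturbation is summarized as a log fold-change per sample and discretized into up (+1), down (−1), or unchanged (0) states. The helper names `discretize` and `mutual_information` and the 0.5 threshold are hypothetical. Plain mutual information is non-negative, whereas the CisPi scores in the figure caption take negative values (e.g., < −10), so this illustrates the information-theoretic core only, not the authors' scoring scheme.

```python
import math
from collections import Counter

def discretize(values, threshold=0.0):
    """Map each (hypothetical) log fold-change to +1 (up), -1 (down) or 0 (unchanged)."""
    return [1 if v > threshold else -1 if v < -threshold else 0 for v in values]

def mutual_information(x, y):
    """Mutual information, in bits, between two equal-length discrete sequences."""
    if len(x) != len(y) or not x:
        raise ValueError("sequences must be non-empty and equal in length")
    n = len(x)
    joint = Counter(zip(x, y))   # joint counts of (lincRNA state, gene state)
    px, py = Counter(x), Counter(y)  # marginal counts
    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        # Sum p(a,b) * log2( p(a,b) / (p(a) p(b)) ) over observed state pairs.
        mi += p_ab * math.log2(p_ab / ((px[a] / n) * (py[b] / n)))
    return mi

if __name__ == "__main__":
    # Hypothetical lincRNA and neighboring-gene responses across six samples.
    linc = discretize([1.2, -0.8, 0.9, -1.1, 1.5, -0.7], threshold=0.5)
    gene = discretize([0.9, -1.0, 1.1, -0.6, 1.3, -0.9], threshold=0.5)
    print(mutual_information(linc, gene))  # 1.0
```

Perfectly concordant up/down responses, as in the usage example, yield 1 bit, while statistically independent responses yield 0 bits.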
Altered lincRNA expression that stratifies tumor risk is an informative readout of oncogenic enhancer activity. The CisPi metric therefore provides a powerful computational model for identifying enhancer-templated RNAs (eRNAs), eRNA-like lincRNAs, or active enhancers that regulate the expression of local genes. First, risk-dependent lincRNAs revealed active enhancers, were over-represented at neuroblastoma susceptibility loci, and uncovered novel clinical biomarkers. Second, the prioritized lincRNAs were significantly prognostic. Third, their predicted target genes in turn inherited the prognostic significance of these lincRNAs. In sum, RNA-Seq alone is sufficient to identify disease-associated lincRNAs with our methods, enabling broader application in contexts where enhancer hallmarks are unavailable or show limited sensitivity.
The CisPi metric not only prioritized phenotype-associated lincRNAs but also predicted their target genes
(a1) Density plot of Spearman’s correlation coefficients for five types of lincRNA–gene pairs across 68 samples. Vertical dashed lines mark the significance cutoff at r = 0.6, beyond which the signatures depart from the random controls (dotted lines). Risk-dependent adjacent pairs show the highest proportion of positive correlations. (a2) Hexbin plot showing the coherent risk-dependence of the identified lincRNAs (x-axis) and genes (y-axis). (a3) Sparse scatterplot of two types of CisPi scores, one modeling adjacent targeting (y-axis) and the other modeling all potential targets within a TAD (x-axis). The orange diagonal line indicates y = x, and the two dashed lines mark standard deviations from it. Annotated lincRNAs are labeled with the names of their nearest ‘target’ gene(s). Horizontal dashed lines indicate the 10% cutoff (CisPi < −10 or > 40) used for prioritization. (b) Circos plot of the 34 prioritized lincRNAs across the human genome, annotated with 14 HR-upregulated genes (red) and 10 HR-downregulated genes (blue) located adjacent to them or within the same TAD. (c) Ingenuity Pathway Analysis (IPA, www.Ingenuity.com) of these 24 predicted target genes (P < 0.001), showing an enriched disease and function category (panel 1) and upstream regulatory molecules (panel 2).
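The caption describes two filtering steps: thresholding Spearman correlations of lincRNA–gene pairs across samples at r = 0.6, and prioritizing lincRNAs whose CisPi score falls below −10 or above 40. A minimal, stdlib-only Python sketch of those two steps follows; it is not the authors' pipeline (their source code is available on request). The function names and the dictionary layout (a feature ID mapped to a list of expression values, one per sample) are hypothetical. The sketch assumes the r = 0.6 cutoff applies to positive correlations only, and it treats the CisPi score as a precomputed input because its formula is not given here.

```python
import math

def average_ranks(values):
    """Rank values 1..n; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # i and j are 0-based positions
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation; NaN if either input has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return float("nan")
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman's rho: the Pearson correlation of average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length profiles with at least 2 samples")
    return pearson(average_ranks(x), average_ranks(y))

def correlated_pairs(pairs, expr, cutoff=0.6):
    """Keep (lincRNA, gene, rho) for pairs whose rho across samples is >= cutoff.

    Hypothetical layout: expr maps a feature ID to its per-sample expression list.
    NaN correlations fail the comparison and are dropped.
    """
    kept = []
    for linc, gene in pairs:
        rho = spearman(expr[linc], expr[gene])
        if rho >= cutoff:
            kept.append((linc, gene, rho))
    return kept

def prioritize(scores, low=-10.0, high=40.0):
    """IDs whose precomputed score is below `low` or above `high` (caption cutoffs)."""
    return sorted(k for k, s in scores.items() if s < low or s > high)

if __name__ == "__main__":
    expr = {
        "linc1": [1.0, 2.0, 3.0, 4.0, 5.0],
        "geneA": [2.0, 4.0, 6.0, 8.0, 10.0],
        "geneB": [5.0, 4.0, 3.0, 2.0, 1.0],
    }
    print(correlated_pairs([("linc1", "geneA"), ("linc1", "geneB")], expr))
    print(prioritize({"lincX": -12.0, "lincY": 5.0, "lincZ": 41.0}))  # ['lincX', 'lincZ']
```

Ties receive average ranks, as in the standard definition of Spearman's rho, and a constant expression profile yields NaN and is never kept.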
Availability – The source code is available on request. The prioritized lincRNAs and their target genes are in the Supplementary Material.