Understanding the normal state of human tissue transcriptome profiles is essential for recognizing tissue disease states and identifying disease markers. Recently, the Human Protein Atlas (HPA) and the FANTOM5 consortium have each published extensive transcriptome data for human samples, using Illumina-sequenced RNA-Seq and Heliscope-sequenced CAGE, respectively.
Here, a team led by researchers at the Karolinska Institute reports the first large-scale comparison of complex tissue transcriptomes between full-length and 5′-capped mRNA sequencing data. Overall gene expression correlation was high between the 22 corresponding tissues analyzed (R > 0.8). For genes ubiquitously expressed across all tissues, the two data sets showed high genome-wide correlation (91% agreement); the differences observed for a small number of individual genes indicate that their gene models need updating. Among the identified single-tissue enriched genes, up to 75% showed consensus 7-fold enrichment in the same tissue with both methods, while another 17% exhibited multiple tissue enrichment and/or high expression variety in the other data set, likely depending on the cell type proportions included in each tissue sample.
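To make the consensus criterion concrete, here is a minimal Python sketch of a 7-fold enrichment check applied to both data sets. It assumes that "enriched" means the top tissue is at least 7 times higher than the next-highest tissue; that definition, the function names, the `floor` parameter, and the toy values are illustrative assumptions, not the authors' pipeline.

```python
def enriched_tissue(levels, fold=7.0, floor=1.0):
    """Return the tissue whose level is >= `fold` times the runner-up, else None.

    `levels` maps tissue name -> expression level for one gene.
    `floor` (hypothetical) keeps near-zero runner-up values from
    producing meaningless fold changes.
    """
    ranked = sorted(levels.items(), key=lambda kv: kv[1], reverse=True)
    (top_tissue, top), (_, runner_up) = ranked[0], ranked[1]
    if top >= fold * max(runner_up, floor):
        return top_tissue
    return None


def consensus_tissue(cage_levels, rnaseq_levels, fold=7.0):
    """Return the tissue enriched in BOTH data sets, or None if they disagree."""
    a = enriched_tissue(cage_levels, fold)
    b = enriched_tissue(rnaseq_levels, fold)
    return a if a is not None and a == b else None


# Toy values for a single gene, invented for illustration only.
cage_gene = {"liver": 700.0, "heart": 50.0, "lung": 20.0}
rnaseq_gene = {"liver": 900.0, "heart": 100.0, "lung": 30.0}
agreed = consensus_tissue(cage_gene, rnaseq_gene)
```

A gene that passes in one data set but not the other would return `None` here, which corresponds loosely to the cases the text attributes to multiple tissue enrichment or differing cell type proportions.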
Comparison of overall correlation values between 22 tissue samples chosen from the FANTOM5 and HPA data sets. (A) The dotplot shows the ranges of correlation values between each of the 27 tissue samples in the FANTOM5 data set and all of the 75 HPA tissue samples (brain, colon, heart, lung, and testis each have two samples coming from the same tissue). (B) Hierarchical clustering shows tissue relationships within the 27 FANTOM5 samples. The heatmap shows subtle differences in the correlation relationship of HPA tissue samples to FANTOM5 tissue samples. All correlation scores were calculated as pair-wise Spearman correlation coefficients between the tissue samples.
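The pair-wise Spearman calculation mentioned in the caption can be sketched in plain Python as a Pearson correlation computed on tie-averaged ranks. This is only an illustration under stated assumptions: the helper names, the dictionary layout (sample name mapped to a list of expression values over the same ordered genes), and the toy numbers are hypothetical and are not taken from the study.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; assumes neither vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman coefficient = Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


def cross_correlation(set_a, set_b):
    """Spearman coefficient for every (sample in A, sample in B) pair."""
    return {
        (a, b): spearman(va, vb)
        for a, va in set_a.items()
        for b, vb in set_b.items()
    }


# Toy expression values over five genes, invented for illustration only.
cage = {"liver": [120.0, 3.0, 45.0, 0.5, 10.0], "heart": [2.0, 80.0, 40.0, 1.0, 9.0]}
rnaseq = {"liver": [300.0, 5.0, 90.0, 1.0, 30.0], "heart": [1.0, 200.0, 50.0, 4.0, 20.0]}
scores = cross_correlation(cage, rnaseq)
```

Because the coefficient depends only on ranks, it is insensitive to the different measurement scales of the two platforms, which is one plausible reason to prefer a rank-based score when comparing CAGE and RNA-Seq values.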
These results show that RNA-Seq and CAGE tissue transcriptome data sets are highly complementary for improving gene model annotations and highlight biological complexities within tissue transcriptomes. Furthermore, integration with image-based protein expression data is highly advantageous for understanding expression specificities for many genes.