Comparing RNA-seq data across studies reveals evolution of gene expression

Phenotypic differences among species are often driven by evolutionary adaptations in gene expression, yet many developmental programs and pathways are deeply conserved. Expression of homologous genes across vertebrate species and tissues has been explored in several studies using microarrays and RNA sequencing (RNA-seq). These studies concluded that gene expression is more similar between homologous organs of different species than between different organs of the same species. This result has been interpreted as reflecting evolutionarily conserved transcriptional programs that drive production of the major proteins defining specific organs, such as heart, lung, or liver. It also supports the accepted idea that non-human vertebrates, such as rodents, are useful models of the physiology of particular human organs despite tens of millions of years of evolutionary divergence. Recently, however, a study assessing 13 human and mouse tissues challenged this result, concluding that different organs within a species are more similar in gene expression than homologous organs in different species.

To better understand gene expression evolution, researchers from MIT reanalyzed data from these studies, which cover 6–13 tissues each across 11 vertebrate species, using standardized mapping, normalization, and clustering methods. An analysis of independent data showed that the tissues chosen by Lin et al., the study that challenged the earlier result, were more similar to one another than those analyzed in previous studies. Comparing expression in five tissues common to the four studies, the researchers observed that samples clustered exclusively by tissue rather than by species or study, supporting conservation of organ physiology in mammals. Furthermore, inter-study distances between homologous tissues were generally smaller than intra-study distances among different tissues, which makes informative meta-analyses possible. Notably, by comparing the expression divergence of tissues over evolutionary time with expression variation across 51 human GTEx tissues, the researchers could accurately predict whether arbitrary pairs of tissues and species would cluster by tissue or by species.
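The tissue-versus-species comparison described above can be illustrated with a small Python sketch. It uses the square root of the Jensen-Shannon divergence, the JSD^(1/2) distance named in the figure caption, and asks whether each sample's nearest neighbor shares its tissue. The four-gene expression profiles and the helper names (js_distance, nearest_neighbor, clusters_by_tissue) are hypothetical and chosen only for illustration; this is not the authors' pipeline.

```python
import math


def js_distance(p, q):
    """Square root of the Jensen-Shannon divergence (log base 2) between two
    non-negative expression vectors, each rescaled to sum to 1."""
    total_p, total_q = sum(p), sum(q)
    p = [x / total_p for x in p]
    q = [x / total_q for x in q]
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(a, b):
        # Kullback-Leibler divergence; terms with a == 0 contribute nothing.
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    # max() guards against a tiny negative value from floating-point rounding.
    return math.sqrt(max(0.0, (kl(p, m) + kl(q, m)) / 2))


# Hypothetical expression levels over four genes; keys are (species, tissue).
samples = {
    ("human", "liver"): [90, 5, 3, 2],
    ("mouse", "liver"): [80, 10, 5, 5],
    ("human", "heart"): [5, 85, 5, 5],
    ("mouse", "heart"): [10, 75, 10, 5],
}


def nearest_neighbor(key, data):
    """Return the key of the sample closest to `key` (excluding itself)."""
    others = [k for k in data if k != key]
    return min(others, key=lambda k: js_distance(data[key], data[k]))


def clusters_by_tissue(data):
    """True if every sample's nearest neighbor comes from the same tissue."""
    return all(nearest_neighbor(k, data)[1] == k[1] for k in data)


if __name__ == "__main__":
    for key in samples:
        print(key, "->", nearest_neighbor(key, samples))
    print("clusters by tissue:", clusters_by_tissue(samples))
```

A nearest-neighbor check is a simplification of full hierarchical clustering, but it captures the same question: whether matched tissues in different species sit closer together than different tissues in one species.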


Clustering by species or tissue is predictably dependent on the subset of tissues selected and the divergence times of the species analyzed. (a) Inter-tissue distance (JSD^(1/2), the square root of the Jensen-Shannon divergence) between Lin2 and GTEx human samples, overlaid with the line y = x. (b) The distance (JSD^(1/2)) between matched tissues among species, plotted as a function of evolutionary time for all tissues and species assessed in the Brawand and Merkin studies (n = 43). (c) Clustering heat map of 51 human tissues sequenced by GTEx. Distances represent the mean inter-tissue distance calculated among three individuals. Colored boxes indicate the flat clusters (groupings) formed for distance cutoffs corresponding to the mean interspecies tissue distance at specific divergence times. Tissues within a cluster have an inter-tissue distance lower than the mean interspecies distance between matched tissues. MYA, million years ago.
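The flat clusters in the heat map panel can be sketched in Python as follows. The sketch assumes complete linkage, so that every pair of tissues inside a cluster lies below the cutoff; the caption does not state which linkage the authors used. The tissue names, distances, and cutoffs are hypothetical values for illustration, not numbers from the paper.

```python
def flat_clusters(names, dist, cutoff):
    """Greedy complete-linkage agglomeration.

    Repeatedly merges the two clusters whose largest cross-pair distance is
    smallest, and stops once that distance reaches `cutoff`. Every pair of
    items inside a returned cluster is therefore closer than `cutoff`.
    """
    clusters = [[n] for n in names]

    def d(a, b):
        # Distances are stored once per unordered pair.
        return dist[(a, b)] if (a, b) in dist else dist[(b, a)]

    def linkage(c1, c2):
        return max(d(a, b) for a in c1 for b in c2)

    while len(clusters) > 1:
        height, i, j = min(
            (linkage(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        if height >= cutoff:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]


# Hypothetical mean inter-tissue distances (JSD^(1/2)) within one species.
tissues = ["heart", "skeletal muscle", "liver", "cortex", "cerebellum"]
distances = {
    ("heart", "skeletal muscle"): 0.30,
    ("cortex", "cerebellum"): 0.20,
    ("heart", "liver"): 0.55,
    ("skeletal muscle", "liver"): 0.55,
    ("heart", "cortex"): 0.60,
    ("heart", "cerebellum"): 0.60,
    ("skeletal muscle", "cortex"): 0.60,
    ("skeletal muscle", "cerebellum"): 0.60,
    ("liver", "cortex"): 0.65,
    ("liver", "cerebellum"): 0.65,
}

if __name__ == "__main__":
    # Each cutoff stands in for the mean interspecies distance between
    # matched tissues at some divergence time (hypothetical values).
    for cutoff in (0.25, 0.35, 0.70):
        print(cutoff, flat_clusters(tissues, distances, cutoff))
```

Under this reading, a larger cutoff corresponds to a more distant species pair, so more tissues fall into the same cluster. Tissues that share a cluster are closer to each other within a species than matched tissues are between species, and would be expected to group by species rather than by tissue.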

These results provide a framework for the design of future evolutionary studies of gene expression and demonstrate the utility of comparing RNA-seq data across studies.

Sudmant PH, Alexis MS, Burge CB. (2015) Meta-analysis of RNA-seq expression data across species, tissues and studies. Genome Biology 16:287.
