With a growing number of RNA-seq samples in the public domain, researchers at the University of Groningen studied to what extent expression quantitative trait loci (eQTLs) and allele-specific expression (ASE) can be identified in public RNA-seq data when the genotypes are also derived from the RNA-seq reads. A total of 4,978 human RNA-seq runs, representing many different tissues and cell types, passed quality control. Even though these data originated from many different laboratories, samples reflecting the same cell type clustered together, suggesting that technical biases due to different sequencing protocols were limited. The researchers derived genotypes from the RNA-seq reads and imputed non-coding variants. In a joint analysis of 1,262 samples, they identified cis-eQTL effects for 8,034 unique genes. Additionally, they observed strong ASE effects for 34 rare pathogenic variants, corroborating previously observed effects on the corresponding protein levels. Given the exponential growth in the number of publicly available RNA-seq samples, the researchers expect this approach to become relevant for studying tissue-specific effects of rare pathogenic genetic variants.
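To make the idea of allele-specific expression concrete, here is a minimal sketch of how an ASE effect could be tested at a single heterozygous site. It assumes a plain exact binomial test against an expected 50/50 allele ratio, which is a common simple approach and not necessarily the method used in the study. The function name `binomial_two_sided_p` and the read counts are hypothetical and chosen only for illustration.

```python
from math import comb


def binomial_two_sided_p(k: int, n: int, p: float = 0.5) -> float:
    """Exact two-sided binomial p-value.

    Sums the probability of every outcome that is no more likely than
    the observed count k out of n trials with success probability p.
    """
    if n == 0:
        return 1.0
    pmf = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = pmf[k]
    # A small relative tolerance guards against floating-point ties.
    tail = sum(q for q in pmf if q <= observed * (1 + 1e-7))
    return min(1.0, tail)


# Hypothetical read counts at one heterozygous variant (not from the study).
ref_reads, alt_reads = 45, 5
total = ref_reads + alt_reads

# Without ASE, each allele is expected to contribute about half of the reads.
alt_fraction = alt_reads / total
p_value = binomial_two_sided_p(alt_reads, total)

print(f"alternative allele fraction: {alt_fraction:.2f}")
print(f"two-sided binomial p-value: {p_value:.3g}")
```

A strongly skewed allele fraction with a small p-value would suggest that one allele is expressed less than the other. In practice, such a test would also need to account for reference mapping bias, overdispersion, and multiple testing, none of which this sketch handles.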
Availability – The pipeline and tools are freely available as open source software. The pipelines are implemented in Molgenis compute and can be downloaded at: http://github.com/molgenis/molgenis-pipelines
The eQTL/ASE mapping software is publicly available at: http://www.molgenis.org/systemsgenetics/QTL-mapping-pipeline