RNA-sequencing (RNA-seq) is a powerful technique for identifying genetic variants that affect gene-expression levels, either through expression quantitative trait locus (eQTL) mapping or through allele-specific expression (ASE) analysis. Given the increasing number of RNA-seq samples in the public domain, researchers from the University of Groningen studied to what extent eQTLs and ASE effects can be identified from public RNA-seq data when the genotypes are derived from the RNA-seq reads themselves.
The researchers downloaded the raw reads for all available human RNA-seq datasets and used them to quantify gene expression. All samples were jointly normalized and subjected to strict quality control. The researchers also called genotypes from the RNA-seq reads and used imputation to infer non-coding variants. This allowed them to perform eQTL mapping and ASE analyses jointly on all samples that passed quality control. The results were validated using samples for which DNA-seq genotypes were available.
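To make the two analyses concrete, here is a minimal Python sketch of what each one tests at a single variant. The summary above does not say which statistical tests the researchers used, so both choices are assumptions made for illustration: an exact binomial test for ASE and a simple least-squares regression for eQTL mapping. The function names `ase_binomial_pvalue` and `eqtl_slope` are hypothetical.

```python
from math import comb, sqrt


def ase_binomial_pvalue(ref_reads, alt_reads, p_ref=0.5):
    """Two-sided exact binomial test for allelic imbalance at one heterozygous site.

    Assumption: without ASE, each read comes from the reference allele
    with probability p_ref. Intended for modest read depths only, because
    the probabilities are computed directly in floating point.
    """
    n = ref_reads + alt_reads
    if n == 0:
        return 1.0  # no reads, so no evidence of imbalance
    pmf = [comb(n, k) * p_ref**k * (1 - p_ref) ** (n - k) for k in range(n + 1)]
    observed = pmf[ref_reads]
    # Sum the probability of every outcome at least as unlikely as the observed one.
    return min(1.0, sum(p for p in pmf if p <= observed * (1 + 1e-9)))


def eqtl_slope(dosages, expression):
    """Least-squares slope and Pearson r of expression on genotype dosage.

    dosages: alternative-allele count per sample (0, 1 or 2).
    expression: normalized expression of one gene in the same samples.
    """
    n = len(dosages)
    if n != len(expression) or n < 3:
        raise ValueError("need at least three matched samples")
    mean_g = sum(dosages) / n
    mean_e = sum(expression) / n
    sxx = sum((g - mean_g) ** 2 for g in dosages)
    syy = sum((e - mean_e) ** 2 for e in expression)
    sxy = sum((g - mean_g) * (e - mean_e) for g, e in zip(dosages, expression))
    if sxx == 0 or syy == 0:
        raise ValueError("genotype or expression has no variation")
    return sxy / sxx, sxy / sqrt(sxx * syy)


# Toy inputs, invented for illustration only.
p_value = ase_binomial_pvalue(ref_reads=20, alt_reads=0)
slope, r = eqtl_slope([0, 1, 2, 0, 1, 2], [1.0, 2.0, 3.0, 1.0, 2.0, 3.0])
```

The two tests use different evidence: the eQTL regression compares expression between samples with different genotypes, whereas the ASE test compares the two alleles within a single heterozygous sample. A real analysis would also need covariate correction and multiple-testing control, which this sketch leaves out.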
In total, 4,978 public human RNA-seq runs, representing many different tissues and cell types, passed quality control.