The Geuvadis project aims to bring together knowledge and resources on medical genome sequencing at a European level, to allow researchers to develop and test new hypotheses on the genetic basis of disease, and to develop standards for sequencing data processing, storage and submission. The analysis of samples from the medical field using RNA and DNA sequencing will allow the project to establish standards for operating procedures and for the biological and medical interpretation of sequence data in relation to clinical phenotypes.
In the RNA-sequencing work package of the Geuvadis project (Lappalainen et al. Nature 2013), we have combined transcriptome and genome sequencing data by performing mRNA and small RNA sequencing on 465 lymphoblastoid cell line (LCL) samples from five populations of the 1000 Genomes Project: the CEPH (CEU), Finns (FIN), British (GBR), Toscani (TSI) and Yoruba (YRI). Of these samples, 423 were part of the 1000 Genomes Phase 1 dataset (Abecasis et al. Nature 2012), with low-coverage whole genome and high-coverage exome sequencing data. The remaining 42 are part of the later phases of 1000 Genomes, with Omni 2.5M SNP array data available at the time of this study; genotypes for these 42 samples were imputed from the array data using Phase 1 as the reference.
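As a quick consistency check on the sample composition described above, the following is a minimal Python sketch. The class and function names (GenotypeSource, total_samples, fraction_imputed) are hypothetical and chosen only for illustration; the sample counts and population codes are the ones stated in the text, and no per-population breakdown is assumed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GenotypeSource:
    """One genotype data source and the number of samples it covers.

    Hypothetical helper type, used only for this illustration.
    """
    description: str
    n_samples: int
    imputed: bool


# Population codes as listed in the text.
POPULATIONS = ("CEU", "FIN", "GBR", "TSI", "YRI")

# Sample counts as stated in the text.
GENOTYPE_SOURCES = (
    GenotypeSource(
        "1000 Genomes Phase 1 (low-coverage whole genome, high-coverage exome)",
        423,
        imputed=False,
    ),
    GenotypeSource(
        "Omni 2.5M SNP array, imputed using Phase 1 as the reference",
        42,
        imputed=True,
    ),
)


def total_samples(sources):
    """Sum the sample counts over all genotype sources."""
    return sum(source.n_samples for source in sources)


def fraction_imputed(sources):
    """Fraction of samples whose genotypes come from imputation."""
    imputed = sum(source.n_samples for source in sources if source.imputed)
    return imputed / total_samples(sources)


if __name__ == "__main__":
    print("populations:", len(POPULATIONS))
    print("total samples:", total_samples(GENOTYPE_SOURCES))
    print("fraction imputed: %.3f" % fraction_imputed(GENOTYPE_SOURCES))
```

The two sources add up to the 465 samples reported for the study, and roughly 9% of the samples rely on imputed genotypes.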
The main paper, which presents the data set and summarizes the key findings with a focus on transcriptome variation and its genetic component, was published in Nature in September 2013 by Lappalainen et al.: Transcriptome and genome sequencing uncovers functional variation in humans, http://dx.doi.org/10.1038/nature12531. A companion paper on reproducibility and technical variation in RNA-seq was published at the same time in Nature Biotechnology by ’t Hoen et al.: Reproducibility of high-throughput mRNA and small RNA sequencing across laboratories, http://dx.doi.org/10.1038/nbt.2702. Additionally, there will be future companion papers on splicing variation (Ferreira et al. submitted) and loss-of-function variation (Rivas et al. in preparation), as well as on other aspects of the data.
The Geuvadis RNA-sequencing data are freely and openly available. The main portal for accessing the data is EBI ArrayExpress (accessions E-GEUV-1, E-GEUV-2, E-GEUV-3). For visualisation of the results, we created the Geuvadis Data Browser (www.ebi.ac.uk/Tools/geuvadis-das), where quantifications and QTLs can be viewed, searched, and downloaded.