by Jeffrey M. Perkel at The Scientist
September was a monumental month for genome aficionados. The National Human Genome Research Institute (NHGRI)–funded Encyclopedia of DNA Elements (ENCODE) Project released 30 papers in the pages of Nature, Genome Biology, and Genome Research, plus another nine in Science, Cell, and the Journal of Biological Chemistry, detailing functional features across the human genome. In all, ENCODE researchers performed nearly 1,650 experiments on 147 cell lines, assessing transcription, transcription factor binding, chromatin topology, histone modifications, DNA methylation, and more.
The term that encompasses such myriad functional elements is epigenomics, and researchers are now well aware of the importance of such features in development and disease. So much so, in fact, that in 2008, five years after NHGRI launched ENCODE, the NIH funded a second large-scale mapping project. The NIH Roadmap Epigenomics Program had compiled some 61 “complete” epigenomes (genome-wide epigenetic profiles of a variety of cell types) as of May 2012, with more scheduled for inclusion in the project’s upcoming release number 8 of the Human Epigenome Atlas.
There’s a lot researchers can do with these data sets. In an early demonstration, the University of Washington’s John Stamatoyannopoulos, a member of both the ENCODE and Roadmap consortia, and colleagues mined these data to address the puzzling fact that the vast majority of trait- and disease-associated sequence variants (SNPs) identified in genome-wide scans lie outside of any protein-coding sequence. By comparing those variant positions with accessible chromatin regions identified in the two epigenomics projects, Stamatoyannopoulos and his team found that these variants often overlap with regulatory elements. They then identified the genes upon which those regulatory elements might act, some located hundreds of thousands of bases away (Science, 337:1190-95, 2012).
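For readers who want a feel for what that kind of comparison involves, here is a minimal Python sketch of a position-versus-interval overlap test. It is not the method used in the Science paper; it only illustrates the basic idea of asking whether a variant falls inside an accessible region. The function names, the SNP identifiers, and the coordinates are all made up for illustration, and the sketch assumes 0-based, half-open intervals that do not overlap one another on the same chromosome.

```python
from bisect import bisect_right

def index_regions(regions):
    """Group (chrom, start, end) regions by chromosome, sorted by start.

    Assumption: regions on the same chromosome do not overlap each other.
    """
    grouped = {}
    for chrom, start, end in regions:
        grouped.setdefault(chrom, []).append((start, end))
    index = {}
    for chrom, intervals in grouped.items():
        intervals.sort()
        starts = [s for s, _ in intervals]
        ends = [e for _, e in intervals]
        index[chrom] = (starts, ends)
    return index

def overlapping_snps(snps, regions):
    """Return IDs of SNPs whose position lies inside any region."""
    index = index_regions(regions)
    hits = []
    for snp_id, chrom, pos in snps:
        if chrom not in index:
            continue
        starts, ends = index[chrom]
        # Find the last region that starts at or before this position.
        i = bisect_right(starts, pos) - 1
        if i >= 0 and pos < ends[i]:
            hits.append(snp_id)
    return hits

# Hypothetical accessible-chromatin regions: (chrom, start, end), half-open.
regions = [("chr1", 100, 200), ("chr1", 500, 650), ("chr2", 50, 80)]

# Hypothetical variants: (id, chrom, 0-based position).
snps = [
    ("rsA", "chr1", 150),
    ("rsB", "chr1", 300),
    ("rsC", "chr2", 79),
    ("rsD", "chr2", 80),
    ("rsE", "chr3", 10),
]

print(overlapping_snps(snps, regions))
```

With real data, the regions would typically come from a peak file downloaded from one of the projects and the variants from a list of association hits; dedicated interval tools handle overlapping regions and genome-scale files far better than this toy version.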
Both projects have made their data freely available to the research community, many of whom may want to see what these data sets have to say about their own particular gene, tissue, or pathway of interest. Yet for many researchers, handling, parsing, and visualizing so much information can be intimidating. The ENCODE data set alone weighs in at 15 terabytes.
The best advice, says John Satterlee, a Health Scientist Administrator at the National Institute on Drug Abuse and a co-coordinator of the NIH Roadmap Epigenomics Program, is just to jump in and see what’s there. “It’s not like you’re wasting reagents—this is just an in silico experiment,” he says.
We asked Satterlee and fellow experts to show us how to make use of the visualization tools available for these data sets. Here is what they said…