Allele-specific transcriptional regulation, including that of imprinted genes, is essential for normal mammalian development. While the regulatory regions controlling imprinted genes are associated with DNA methylation (DNAme) and specific histone modifications, the interplay between transcription and these epigenetic marks is typically not investigated genome-wide at allelic resolution, because few bioinformatic packages can process and integrate multiple epigenomic datasets at that resolution. In addition, existing ad-hoc software considers only SNVs for allele-specific read discovery. This limitation omits potentially informative INDELs, which in mice number about one fifth as many as SNVs, and introduces a systematic reference bias in allele-specific analyses.
Here, researchers from the University of British Columbia describe MEA, an INDEL-aware Methylomic and Epigenomic Allele-specific analysis pipeline which enables user-friendly data exploration, visualization and interpretation of allelic imbalance. Applying MEA to mouse embryonic datasets yields robust allele-specific DNAme maps and low reference bias. The researchers validate allele-specific DNAme at known differentially methylated regions and show that automated integration of such methylation data with RNA- and ChIP-seq datasets yields an intuitive, multidimensional view of allelic gene regulation. MEA uncovers numerous novel dynamically methylated loci, highlighting the sensitivity of the pipeline. Furthermore, processing and visualization of epigenomic datasets from human brain reveals the expected allele-specific enrichment of H3K27ac and DNAme at imprinted as well as novel monoallelically expressed genes, highlighting MEA’s utility for integrating human datasets of distinct provenance for genome-wide analysis of allelic phenomena.
A bioinformatics toolkit for allele-specific epigenomic analysis
(a) MEA pipeline flow chart. Supplied with a reference genome assembly and relevant genetic variants, MEA first reconstructs a diploid pseudogenome. Subsequently, allele-specific analysis is performed on the input gene expression (RNA-seq), histone PTM (ChIP-seq) or DNAme (WGBS) data in FASTQ format. MEA calculates allelic imbalance values using the resulting allele-specific genomic coverage files and generates a tab-delimited table for the user-defined regions of interest. Mouse and human exon, gene body and transcription start site coordinates are provided to facilitate analyses of such regions. (b) Venn diagram showing the theoretical number of CpG dinucleotides for which allele-specific DNAme levels can be calculated using C57BL/6J and DBA/2J SNVs (blue) or INDELs (green) alone. CpGs for which allelic information can theoretically be extracted are defined as those that fall within 200 bp (an insert size typical of WGBS libraries) of a genetic variant. (c) Venn diagram showing the observed number of C57BL/6J-specific CpG dinucleotides for which allele-specific DNAme levels were calculated using MEA (yellow) versus an INDEL-agnostic contemporary allele-specific DNAme script using the same dataset (red).
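The 200 bp criterion used in the Venn diagram can be illustrated with a short Python sketch. This is not MEA's code: the function name `cpgs_near_variants`, the inclusive treatment of the 200 bp boundary, and the toy coordinates are all assumptions made for illustration, and positions are taken to lie on a single chromosome.

```python
from bisect import bisect_left


def cpgs_near_variants(cpg_positions, variant_positions, max_distance=200):
    """Return the CpG positions lying within max_distance bp of any variant.

    Hypothetical helper, not part of MEA. Positions are integer coordinates
    on one chromosome; the distance boundary is treated as inclusive.
    """
    variants = sorted(variant_positions)
    informative = []
    for cpg in cpg_positions:
        # Find the first variant at or after the left edge of the window.
        i = bisect_left(variants, cpg - max_distance)
        # The CpG is informative if that variant is still inside the window.
        if i < len(variants) and variants[i] <= cpg + max_distance:
            informative.append(cpg)
    return informative


# Toy coordinates (made up for illustration only).
cpgs = [100, 500, 1000, 5000]
snvs = [250, 1200]
indels = [4900]

snv_cpgs = cpgs_near_variants(cpgs, snvs)
indel_cpgs = cpgs_near_variants(cpgs, indels)
# CpGs covered by either variant class, as in the union of the Venn diagram.
all_cpgs = sorted(set(snv_cpgs) | set(indel_cpgs))
```

In this toy example the CpG at position 5000 is reachable only through the INDEL, which is the kind of site an SNV-only approach would miss.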
This novel pipeline for standardized allele-specific processing and visualization of disparate epigenomic and methylomic datasets enables rapid analysis and navigation with allelic resolution.
Availability – MEA is freely available as a Docker container at https://github.com/julienrichardalbert/MEA