Despite their apparent diversity, many problems in the analysis of high-throughput sequencing data are special cases of two general problems: signal detection and signal estimation. Researchers at the Genome Institute of Singapore have adapted formally optimal solutions from signal processing theory to analyze the signal formed by DNA sequence reads mapped to a genome. They describe DFilter, a detection algorithm that identifies regulatory features in ChIP-seq, DNase-seq and FAIRE-seq data more accurately than assay-specific algorithms. They also describe EFilter, an estimation algorithm that accurately predicts mRNA levels from as few as 1–2 histone profiles (R ~0.9). Notably, the presence of regulatory motifs in promoters correlates more strongly with histone modifications than with mRNA levels, suggesting that histone profiles are more predictive of cis-regulatory mechanisms. By applying DFilter and EFilter to embryonic forebrain ChIP-seq data, they show that regulatory protein identification and functional annotation are feasible despite tissue heterogeneity. The mathematical formalism underlying their tools facilitates integrative analysis of data from virtually any sequencing-based functional profile.
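To make the idea of "signal detection" on mapped reads concrete, here is a minimal sketch of a textbook matched filter: binned read counts are correlated with a peak-shaped template and the response is converted to z-scores, so that bins where the read pile-up resembles the template stand out from the background. This is a generic illustration, not DFilter itself; the Gaussian template, the function names `matched_filter_scores` and `call_peaks`, the `z_threshold` parameter and the synthetic Poisson coverage are all hypothetical choices made for the example.

```python
import numpy as np


def matched_filter_scores(counts, template):
    """Slide a peak-shaped template along binned read counts and return
    z-scores of the filter response relative to all bins."""
    counts = np.asarray(counts, dtype=float)
    t = np.asarray(template, dtype=float)
    # Zero-mean, unit-norm template: a flat background gives ~0 response
    t = t - t.mean()
    t = t / np.linalg.norm(t)
    # Sliding correlation; mode="same" keeps the output aligned with the
    # input bins (centred for an odd-length template)
    response = np.correlate(counts, t, mode="same")
    # Standardize against the empirical distribution of responses
    return (response - response.mean()) / response.std()


def call_peaks(counts, template, z_threshold=3.0):
    """Return indices of bins whose filter z-score exceeds the threshold."""
    z = matched_filter_scores(counts, template)
    return np.flatnonzero(z > z_threshold)


if __name__ == "__main__":
    # Synthetic coverage: Poisson background plus one Gaussian-shaped peak
    rng = np.random.default_rng(0)
    x = np.arange(-5, 6)
    template = np.exp(-x**2 / (2 * 2.0**2))  # hypothetical peak shape, sigma = 2 bins
    counts = rng.poisson(2.0, size=1000).astype(float)
    counts[495:506] += 20 * template  # plant a peak centred on bin 500
    peaks = call_peaks(counts, template)
    z = matched_filter_scores(counts, template)
    print("strongest bin:", int(np.argmax(z)))
    print("bins above threshold near the peak:", peaks[(peaks > 480) & (peaks < 520)])
```

Zero-centring the template is the key design choice: it makes the filter insensitive to a uniform background level, so only local enrichment with the expected shape produces a large response. A real detector would also need a principled background model and multiple-testing control, which this sketch omits.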
Availability – DFilter and EFilter are available at http://collaborations.gis.a-star.edu.sg/~cmb6/kumarv1/dfilter/
- Kumar V, Muratani M, Rayan NA, Kraus P, Lufkin T, Ng HH, Prabhakar S. (2013) Uniform, optimal signal processing of mapped deep-sequencing data. Nat Biotechnol 31(7), 615-622.