Next-generation sequencing allows researchers to efficiently determine the sequences of hundreds of millions of short DNA fragments from an experiment. Many experiments use next-generation sequencing to count nucleic acid molecules in a population by sequencing small fragments of them and assigning them to different genomic features. To find the origins of those fragments, the corresponding sequences are aligned to the genome; these alignments can then be used in downstream analyses. However, this alignment process is complicated by the fact that the genome has many highly similar and repetitive sequences, making it difficult or impossible to unambiguously assign some sequences to a single genomic location. The common “solution” to this problem is to discard those sequencing reads that do not align to a single site; however, this can lead to significant biases and will hide an important part of the genome. To address this problem, researchers from the University of Chicago have developed SmartMap, which serves to process and appropriately weight the alignments of reads that map to more than one genomic location. This enables us to examine many genomic regions that were previously “invisible” to analysis and helps us draw new insights into the regulation and function of repetitive elements of the genome.
Summary of the SmartMap analysis workflow and algorithm
(A) Flowchart outlining the workflow for traditional ChIP-seq (or ICeChIP-seq) analysis utilizing only unireads (left, green) vs. the workflow for SmartMap analysis utilizing multireads with an iterative Bayesian reweighting algorithm (right, blue). (B) Schematic showing the Bayesian reweighting algorithm utilized in the SmartMap analysis. Each mapping associated with a read is assigned a weight such that the weight is greater for those mappings associated with loci of greater map weight density.
Availability – Software written as part of this work are available at https://github.com/shah-rohan/SmartMap for download