Because of ever-increasing throughput requirements of sequencing data, most existing short-read aligners have been designed to focus on speed at the expense of accuracy.
Now, researchers at the Centro Nacional de Análisis Genómico, Spain, have developed the Genome Multitool (GEM) mapper, which leverages string matching by filtration to search the alignment space more efficiently. It delivers both precision (fully tunable exhaustive searches that return all existing matches, including gapped ones) and speed (several times faster than comparable state-of-the-art tools).
Unlike most other mappers, GEM adopts a filtration-based approach to approximate string matching: all relevant candidate matches are extracted from a Ferragina-Manzini index by suitable pigeonhole-like rules and refined by dynamic programming in bit-compressed representation. This strategy to prune the search space without missing matches, primed with careful optimizations, confers several advantages to the GEM method.
- First, regardless of alignment parameters, the mapper always performs complete searches: they respect interstrata boundaries and exhaustively find all matches that exist within the search space.
- Second, the speed of GEM is comparable to or faster than that of several currently used state-of-the-art aligners; in addition, filtering-based pruning scales well to the range of longer reads targeted by the latest sequencers.
- Third, because of the flexibility of the algorithmic setup, the authors implemented an innovative, versatile design that allows the user to accurately specify complex alignment models tuned to a specific biological problem.
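
The filtration idea described above can be illustrated with a small sketch. This is not GEM's implementation: it assumes the simplest pigeonhole rule (a read with at most k errors, cut into k + 1 pieces, must contain at least one piece that matches exactly), it uses Python's `str.find` as a stand-in for the Ferragina-Manzini index, and it verifies candidates with a plain dynamic-programming edit distance rather than a bit-compressed one. All function names are hypothetical and chosen only for illustration.

```python
def split_into_pieces(pattern, n_pieces):
    """Cut pattern into n_pieces contiguous chunks; return (offset, chunk) pairs."""
    base, extra = divmod(len(pattern), n_pieces)
    pieces, start = [], 0
    for i in range(n_pieces):
        length = base + (1 if i < extra else 0)
        pieces.append((start, pattern[start:start + length]))
        start += length
    return pieces


def best_edit_distance_in_window(pattern, window):
    """Minimum edit distance between pattern and any substring of window."""
    # Row 0 is all zeros: a match may start at any position of the window.
    prev = [0] * (len(window) + 1)
    for i, pc in enumerate(pattern, 1):
        curr = [i] + [0] * len(window)
        for j, wc in enumerate(window, 1):
            curr[j] = min(prev[j - 1] + (pc != wc),  # match or substitution
                          prev[j] + 1,               # pattern char unmatched
                          curr[j - 1] + 1)           # extra window char
        prev = curr
    # A match may also end at any position of the window.
    return min(prev)


def filtration_search(text, pattern, max_errors):
    """Return sorted (window_start, distance) pairs for verified candidates."""
    if len(pattern) <= max_errors:
        raise ValueError("pattern must be longer than max_errors")
    hits = {}
    # Filtration: every exact occurrence of a piece yields a candidate window.
    for offset, piece in split_into_pieces(pattern, max_errors + 1):
        pos = text.find(piece)  # stand-in for an index lookup
        while pos != -1:
            # Pad the window by max_errors on each side to allow for indels.
            start = max(0, pos - offset - max_errors)
            end = min(len(text), pos - offset + len(pattern) + max_errors)
            if start not in hits:
                # Verification: keep the candidate only if it really matches.
                distance = best_edit_distance_in_window(pattern, text[start:end])
                if distance <= max_errors:
                    hits[start] = distance
            pos = text.find(piece, pos + 1)
    return sorted(hits.items())
```

For example, `filtration_search("ACGTACGTTTGACCAGTACGGATCCA", "GACCTGTACG", 1)` inspects only two candidate windows instead of every position in the text, and keeps the one that contains a match with a single mismatch. Because no match within the error budget can avoid having at least one exact piece, the pruning in this sketch does not lose matches.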
The GEM programs are free for academic noncommercial use and can be downloaded from http://gemlibrary.sourceforge.net.
- Marco-Sola S, Sammeth M, Guigó R, Ribeca P. (2012) The GEM mapper: fast, accurate and versatile alignment by filtration. Nat Methods [Epub ahead of print].