Stampy – A statistical algorithm for sensitive and fast mapping of Illumina sequence reads

Stampy is a package for mapping short reads from Illumina sequencing machines onto a reference genome. It is recommended for most workflows, including genomic resequencing, RNA-Seq and ChIP-seq. Stampy excels at mapping reads that contain sequence variation relative to the reference, in particular insertions or deletions. It can, for instance, map reads from a highly divergent species to a reference genome. Stampy achieves high sensitivity and speed by using a fast hashing algorithm and a detailed statistical model.

High-volume sequencing of DNA and RNA is now within reach of any research laboratory and is quickly becoming established as a key research tool. In many workflows, each of the short sequences (“reads”) resulting from a sequencing run is first “mapped” (aligned) to a reference sequence to infer the genomic location from which the read derived, a challenging task because of the high data volumes and often large genomes. Existing read mapping software excels in either speed (e.g., BWA, Bowtie, ELAND) or sensitivity (e.g., Novoalign), but not in both. In addition, performance often deteriorates in the presence of sequence variation, particularly so for short insertions and deletions (indels).
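To make the idea of "mapping" a read concrete, here is a minimal Python sketch of a generic hash-based seed-and-verify lookup. It is a toy illustration only and is not Stampy's algorithm or statistical model: the function names (`build_kmer_index`, `map_read`), the seed length `k`, and the `max_mismatches` cutoff are all assumptions made for this example, and it handles substitutions only, not indels.

```python
from collections import defaultdict


def build_kmer_index(reference, k):
    """Hash every k-mer of the reference to the list of positions where it starts."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def map_read(read, reference, index, k, max_mismatches=2):
    """Return (start, mismatches) for the best ungapped placement, or None.

    Hypothetical helper: seeds with exact k-mer hits, then verifies each
    candidate start by counting mismatches over the full read length.
    """
    candidates = set()
    for offset in range(len(read) - k + 1):
        seed = read[offset:offset + k]
        for pos in index.get(seed, ()):
            start = pos - offset  # implied start of the whole read
            if 0 <= start <= len(reference) - len(read):
                candidates.add(start)

    best = None
    for start in sorted(candidates):
        window = reference[start:start + len(read)]
        mismatches = sum(a != b for a, b in zip(read, window))
        if mismatches <= max_mismatches and (best is None or mismatches < best[1]):
            best = (start, mismatches)
    return best


if __name__ == "__main__":
    reference = "ACGTTGCAAGGCTTACGATCGGATCCTAGA"
    k = 4
    index = build_kmer_index(reference, k)
    read = reference[10:22]
    print(map_read(read, reference, index, k))
```

A read with a substitution still maps as long as at least one seed of length `k` matches exactly. A read with an indel shifts the implied start position partway through, which this ungapped check cannot handle; that is the kind of case the passage above describes as difficult for existing mappers.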

Here, we present a read mapper, Stampy, which uses a hybrid mapping algorithm and a detailed statistical model to achieve both speed and sensitivity, particularly when reads include sequence variation. This results in a higher useable sequence yield and improved accuracy compared to that of existing software.

Stampy is available at:

Lunter G, Goodson M. (2010) Stampy: A statistical algorithm for sensitive and fast mapping of Illumina sequence reads. Genome Res [Epub ahead of print]. [abstract]
