BBMap (aligner for DNA/RNA-Seq) is now open-source and available for download

BBMap is now available here:

BBMap/BBTools are now open source. Please try them out – it’s a 3 MB download written in pure Java, so installation is trivial – just unzip and run. It handles all platforms (Illumina, PacBio, 454, etc.) except SOLiD colorspace, which I removed to simplify the code.

A PowerPoint comparison of performance (speed, memory, sensitivity, specificity) on various genomes, compared to bwa, bowtie2, gsnap, and smalt: …it?usp=sharing

…but in summary, BBMap is similar in speed to bwa (usually faster), with much better sensitivity and specificity than any other aligner I’ve compared it to. It uses more memory than Burrows-Wheeler-based aligners, but in exchange, the indexing speed is many times faster.

How to use

There is documentation in the docs folder, and each shellscript displays its usage when run with no arguments. But for example, running the BBMap shellscript with these arguments:

ref=ecoli.fa
…will build an index and write it to the present directory.

in=reads.fq out=mapped.sam
…will map to the indexed reference.

in1=reads1.fq in2=reads2.fq out=mapped.sam ref=ecoli.fa nodisk
…will build an index in memory and map paired reads to it in a single command.

If your OS does not support shellscripts, replace the shellscript name with a direct call to Java, like this:
java -Xmx30g -cp /path/to/current align2.BBMap in=reads.fq out=mapped.sam

…where /path/to/current is the location of the ‘current’ directory, and -Xmx30g specifies the amount of memory to use. This should be set to about 85% of physical memory (the suffixes ‘m’ and ‘g’ specify megs and gigs). The human reference requires around 23 GB; generally, a reference needs around 8 bytes per base pair, with a minimum of 1 GB at default settings. The shellscripts are just wrappers that display usage information and set the -Xmx parameter.
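If you want to sanity-check the memory numbers before launching, here is a rough Python sketch of the rules of thumb above: about 8 bytes per base with a 1 GB floor, and -Xmx set to about 85% of physical memory. The function and constant names are made up for illustration and are not part of BBMap. The sketch treats a "GB" as 2^30 bytes, which is an assumption, and it only estimates; real usage depends on the settings you run with.

```python
BYTES_PER_BASE = 8          # rule of thumb quoted in the post
MIN_INDEX_BYTES = 1 << 30   # 1 GB floor at default settings (GB taken as 2**30 bytes)
HEAP_FRACTION = 0.85        # share of physical memory to hand to the JVM


def count_bases(fasta_lines):
    """Count sequence characters in FASTA text, skipping '>' header lines."""
    return sum(len(line.strip()) for line in fasta_lines if not line.startswith(">"))


def index_bytes(num_bases):
    """Approximate memory needed for a reference of num_bases, in bytes."""
    return max(MIN_INDEX_BYTES, num_bases * BYTES_PER_BASE)


def xmx_flag(physical_bytes):
    """Build an -Xmx flag at about 85% of physical memory, in whole megabytes."""
    megs = int(physical_bytes * HEAP_FRACTION) // (1 << 20)
    return f"-Xmx{megs}m"


def fits(num_bases, physical_bytes):
    """True if the estimated index fits inside the suggested heap size."""
    return index_bytes(num_bases) <= int(physical_bytes * HEAP_FRACTION)


def java_command(physical_bytes, current_dir, *args):
    """Assemble the direct Java invocation from the post as an argument list."""
    return ["java", xmx_flag(physical_bytes), "-cp", current_dir, "align2.BBMap", *args]


if __name__ == "__main__":
    machine = 32 * 2**30  # hypothetical machine with 32 GB of RAM
    print(" ".join(java_command(machine, "/path/to/current",
                                "in=reads.fq", "out=mapped.sam")))
    # Assuming roughly 3.1 billion bases for a human reference:
    print(index_bytes(3_100_000_000) / 2**30, "GB estimated")
```

With a human-sized reference of roughly 3.1 billion bases (my assumption), the estimate works out to about 23 GB, in line with the figure above, so by this estimate it fits on a 32 GB machine but not on a 16 GB one.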

Please ask if you encounter any problems or need help! And there are other neat tools too, for error correction, normalization, depth-binning, reference-based binning, contaminant filtering, adapter trimming, optimal quality trimming, reformatting files, paired-read merging, deduplication of assemblies, and histogram generation for things like kmer depth and insert size.