STAR: ultrafast universal RNA-seq aligner

To align their large (>80 billion reads) ENCODE Transcriptome RNA-seq dataset, a team of researchers at Cold Spring Harbor Laboratory developed the Spliced Transcripts Alignment to a Reference (STAR) software. It is based on a previously undescribed RNA-seq alignment algorithm that uses sequential maximum mappable seed search in uncompressed suffix arrays, followed by a seed clustering and stitching procedure.
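The seed search idea can be illustrated with a toy sketch. This is not STAR's implementation (STAR is written in C++ and adds clustering, stitching and scoring on top); it only shows, under simplifying assumptions, how a suffix array lets you find the longest prefix of a read that matches the reference exactly, and then repeat the search from the first unmapped base. The function names, the naive suffix array construction, and the tiny "genome" are all hypothetical and chosen for illustration.

```python
def build_suffix_array(text):
    # Naive construction by sorting suffixes; fine for toy inputs only.
    return sorted(range(len(text)), key=lambda i: text[i:])


def _char_at(text, pos):
    # Past the end of the text we return "", which sorts before any base.
    return text[pos] if pos < len(text) else ""


def _bound(text, sa, lo, hi, depth, c, strict):
    # Binary search inside sa[lo:hi], where all suffixes share the same
    # first `depth` characters, for the first suffix whose character at
    # offset `depth` is >= c (strict=False) or > c (strict=True).
    while lo < hi:
        mid = (lo + hi) // 2
        ch = _char_at(text, sa[mid] + depth)
        if ch < c or (strict and ch == c):
            lo = mid + 1
        else:
            hi = mid
    return lo


def longest_prefix_match(text, sa, query):
    """Return (length, positions) of the longest prefix of `query` found in `text`."""
    lo, hi, depth = 0, len(sa), 0
    while depth < len(query):
        c = query[depth]
        new_lo = _bound(text, sa, lo, hi, depth, c, strict=False)
        new_hi = _bound(text, sa, lo, hi, depth, c, strict=True)
        if new_lo == new_hi:  # no suffix extends the match by this base
            break
        lo, hi, depth = new_lo, new_hi, depth + 1
    positions = sorted(sa[lo:hi]) if depth > 0 else []
    return depth, positions


def sequential_seeds(text, sa, read):
    """Split `read` into consecutive exact-match seeds: (read_start, length, ref_positions)."""
    seeds, start = [], 0
    while start < len(read):
        length, positions = longest_prefix_match(text, sa, read[start:])
        if length == 0:
            start += 1  # base absent from the reference; skip it
            continue
        seeds.append((start, length, positions))
        start += length
    return seeds


# Toy reference: exon1 + intron + exon2 (hypothetical sequences).
genome = "AAACCCGGG" + "TTTTTT" + "CATCATGAG"
read = "CCCGGG" + "CATCAT"  # a read spanning the exon1/exon2 junction
sa = build_suffix_array(genome)
seeds = sequential_seeds(genome, sa, read)
```

In this example the read splits into two seeds, one ending at the last base of the first exon and one starting in the second exon. A later stitching step, not shown here, would join such seeds into a single spliced alignment.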

  • Very high mapping speed:
    on a modest 12-core cluster, STAR maps 400 million pairs per hour for human 2×100 Illumina reads (>50 times faster than TopHat).
  • Accurate alignment of contiguous and spliced reads:
    in our tests on real and simulated data STAR showed better sensitivity and precision than TopHat.
  • Detection of polyA-tails, non-canonical splices and chimeric (fusion) junctions.
  • Mapping reads of any length:
    STAR can efficiently map reads of any length generated by current or emerging sequencing platforms, starting from ~15 bases (small RNA) and up to full length transcripts several kilobases long.
  • Thorough testing on large ENCODE datasets:
    STAR was used to map 64 billion reads of long RNA-seq and 16 billion reads of short RNA-seq, and will be used to map RNA-seq data in the next ENCODE phase.
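To put the quoted mapping speed in perspective, here is a small back-of-the-envelope calculation. It uses only the figure stated above (400 million pairs per hour on 12 cores); the 50-million-pair sample size is a hypothetical example, and real run times will depend on hardware, read length and parameters.

```python
PAIRS_PER_HOUR = 400_000_000  # speed quoted above for human 2x100 reads
CORES = 12                    # cores quoted above

pairs_per_second = PAIRS_PER_HOUR / 3600
pairs_per_second_per_core = pairs_per_second / CORES

sample_pairs = 50_000_000     # hypothetical sample size, for illustration only
minutes_for_sample = sample_pairs / PAIRS_PER_HOUR * 60

print(f"{pairs_per_second:,.0f} pairs/s overall")
print(f"{pairs_per_second_per_core:,.0f} pairs/s per core")
print(f"{minutes_for_sample:.1f} minutes for the hypothetical sample")
```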

STAR requires ~30 GB of RAM for mapping to the human genome (this can be reduced to 16 GB in the “sparse” mode, with some loss of speed).

Availability and implementation: STAR is implemented as standalone C++ code. STAR is free, open-source software distributed under the GPLv3 license and can be downloaded from http://code.google.com/p/rna-star/.

Contact: dobin@cshl.edu

I will be happy to answer any questions via SEQanswers or the STAR discussion forum.

  • Dobin A, Davis CA, Schlesinger F, Drenkow J, Zaleski C, Jha S, Batut P, Chaisson M, Gingeras TR. (2013) STAR: ultrafast universal RNA-seq aligner. Bioinformatics. 29(1), 15-21.