To align their large (>80 billion reads) ENCODE Transcriptome RNA-seq dataset, a team of researchers at Cold Spring Harbor Laboratory developed the Spliced Transcripts Alignment to a Reference (STAR) software. STAR is based on a previously undescribed RNA-seq alignment algorithm that uses sequential maximum mappable seed search in uncompressed suffix arrays, followed by a seed clustering and stitching procedure.
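The seed search step can be illustrated with a toy example. The following is a minimal Python sketch, not STAR's actual C++ implementation: it builds a naive uncompressed suffix array, finds the longest prefix of a read that matches the genome exactly, and then repeats the search on the unmapped remainder. All function names and the tiny "genome" are hypothetical, and the sketch ignores mismatches, reverse-strand search, and the clustering and stitching stage.

```python
def build_suffix_array(genome):
    # Naive construction by sorting all suffixes; fine for a toy genome,
    # far too slow for a real one.
    return sorted(range(len(genome)), key=lambda i: genome[i:])


def common_prefix_len(a, b):
    # Number of leading characters shared by a and b.
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def maximal_mappable_prefix(genome, sa, query):
    """Return (length, positions) for the longest prefix of `query`
    that occurs exactly in `genome`."""
    # Binary search for where `query` would be inserted among the sorted suffixes.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if genome[sa[mid]:] < query:
            lo = mid + 1
        else:
            hi = mid
    # The longest shared prefix is always with a neighbour of the insertion point.
    best = 0
    for idx in (lo - 1, lo):
        if 0 <= idx < len(sa):
            best = max(best, common_prefix_len(genome[sa[idx]:], query))
    if best == 0:
        return 0, []
    # Suffixes sharing that prefix form a contiguous run around the insertion point.
    prefix = query[:best]
    positions = []
    i = lo - 1
    while i >= 0 and genome.startswith(prefix, sa[i]):
        positions.append(sa[i])
        i -= 1
    i = lo
    while i < len(sa) and genome.startswith(prefix, sa[i]):
        positions.append(sa[i])
        i += 1
    return best, sorted(positions)


def seed_search(genome, sa, read):
    """Sequentially search for maximal mappable prefixes along the read.
    Returns a list of (read_start, length, genome_positions) seeds."""
    seeds = []
    start = 0
    while start < len(read):
        length, positions = maximal_mappable_prefix(genome, sa, read[start:])
        if length == 0:
            start += 1  # base not present in the genome at all; skip it
            continue
        seeds.append((start, length, positions))
        start += length
    return seeds


# Hypothetical toy genome: two "exons" separated by an 8-base "intron".
genome = "ACGTTGCA" + "GTAAAAAG" + "CCTGATCG"
sa = build_suffix_array(genome)
read = "GTTGCA" + "CCTGAT"  # spans the exon-exon junction
seeds = seed_search(genome, sa, read)
print(seeds)  # [(0, 6, [2]), (6, 6, [16])]
```

In this toy case the two seeds are adjacent on the read but 8 bases apart in the genome, which is the pattern a splice junction produces. A later stitching stage would have to join such seeds into a single spliced alignment.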
- Very high mapping speed:
on a modest 12-core cluster STAR maps 400 million read pairs per hour for human 2×100 Illumina reads (>50 times faster than TopHat).
- Accurate alignment of contiguous and spliced reads:
in our tests on real and simulated data STAR showed better sensitivity and precision than TopHat.
- Detection of polyA-tails, non-canonical splices and chimeric (fusion) junctions.
- Mapping reads of any length:
STAR can efficiently map reads of any length generated by current or emerging sequencing platforms, starting from ~15 bases (small RNA) and up to full length transcripts several kilobases long.
- Thorough testing on large ENCODE datasets:
STAR was used to map 64 billion reads of long RNA-seq and 16 billion reads of short RNA-seq, and will be used to map RNA-seq data in the next ENCODE phase.
STAR requires ~30 GB of RAM for mapping to the human genome (this can be reduced to 16 GB in the “sparse” mode, at some cost in speed).
Availability and implementation: STAR is implemented as standalone C++ code. STAR is free, open-source software distributed under the GPLv3 license and can be downloaded from http://code.google.com/p/rna-star/.
I will be happy to answer any questions via SEQanswers or the STAR discussion forum.
- Dobin A, Davis CA, Schlesinger F, Drenkow J, Zaleski C, Jha S, Batut P, Chaisson M, Gingeras TR. (2013) STAR: ultrafast universal RNA-seq aligner. Bioinformatics. 29(1), 15-21.