Spliced Transcripts Alignment to a Reference (STAR)
Accurate alignment of high-throughput RNA-Seq data is a challenging and yet unsolved problem because of the non-contiguous transcript structure, relatively short read lengths and constantly increasing throughput of the sequencing technologies. Currently available RNA-Seq aligners suffer from high mapping error rates, low mapping speed, read length limitation and mapping biases.
Now, researchers at Cold Spring Harbor Laboratory, NY have developed the Spliced Transcripts Alignment to a Reference (STAR) software, based on a previously undescribed RNA-Seq alignment algorithm that uses sequential maximum mappable seed search in uncompressed suffix arrays, followed by a seed clustering and stitching procedure.
They have used STAR to align their large (exceeding 80 billion reads) ENCODE Transcriptome RNA-Seq dataset.
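To make the idea of a sequential maximum mappable seed search more concrete, here is a minimal Python sketch on a toy sequence. It is an illustration only and not STAR's actual C++ implementation: the function names (build_suffix_array, maximal_mappable_prefix, sequential_seeds), the naive suffix-array construction, and the toy genome are all assumptions made for this example, and the seed clustering and stitching step is not implemented.

```python
def build_suffix_array(genome):
    # Naive construction for illustration only; real aligners use far
    # more efficient methods on genome-scale sequences.
    return sorted(range(len(genome)), key=lambda i: genome[i:])


def common_prefix_len(a, b):
    # Number of leading characters shared by a and b.
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def maximal_mappable_prefix(genome, sa, query):
    # Binary search for the position where `query` would be inserted
    # among the sorted suffixes.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if genome[sa[mid]:] < query:
            lo = mid + 1
        else:
            hi = mid
    # The suffix sharing the longest prefix with `query` is one of the
    # two neighbours of the insertion point.
    best = 0
    for k in (lo - 1, lo):
        if 0 <= k < len(sa):
            best = max(best, common_prefix_len(genome[sa[k]:], query))
    if best == 0:
        return 0, []
    prefix = query[:best]
    # All suffixes starting with `prefix` are contiguous around `lo`.
    hits = []
    k = lo - 1
    while k >= 0 and genome.startswith(prefix, sa[k]):
        hits.append(sa[k])
        k -= 1
    k = lo
    while k < len(sa) and genome.startswith(prefix, sa[k]):
        hits.append(sa[k])
        k += 1
    return best, sorted(hits)


def sequential_seeds(genome, sa, read):
    # Repeatedly take the longest mappable prefix of the unmapped part
    # of the read, then continue from where that seed ended.
    seeds = []
    pos = 0
    while pos < len(read):
        length, hits = maximal_mappable_prefix(genome, sa, read[pos:])
        if length == 0:
            pos += 1  # skip a base that maps nowhere
            continue
        seeds.append((pos, length, hits))
        pos += length
    return seeds


# Toy "genome": exon, intron, exon (hypothetical sequences).
genome = "ACGGTCCATA" + "GTTTTTTTAG" + "CGCAAGGCTT"
sa = build_suffix_array(genome)
read = "TCCATA" + "CGCAAG"  # a read spanning the splice junction
print(sequential_seeds(genome, sa, read))
```

On this toy input the read splits into two seeds of six bases each, mapping to genome positions 4 and 20. The 10-base gap between the end of the first seed and the start of the second corresponds to the toy intron, which is the kind of evidence a subsequent stitching step could use to call a splice junction without prior annotation.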
STAR outperforms other aligners by more than a factor of 50 in mapping speed, aligning 550 million 2 × 76 bp paired-end reads per hour to the human genome on a modest 12-core server, while at the same time improving alignment sensitivity and precision. In addition to unbiased de novo detection of canonical junctions, STAR can discover non-canonical splices and chimeric (fusion) transcripts, and is also capable of mapping full-length RNA sequences.
Implementation and Availability: STAR is implemented as standalone C++ code. It is free, open-source software distributed under the GPLv3 license and can be downloaded from http://code.google.com/p/rna-star/
- Dobin A, Davis CA, Schlesinger F, Drenkow J, Zaleski C, Jha S, Batut P, Chaisson M, Gingeras TR. (2012) STAR: ultrafast universal RNA-seq aligner. Bioinformatics [Epub ahead of print].