Genome and transcriptome sequencing have been renewed by the advent of Next Generation Sequencing (NGS) technologies, which also bring new challenges. Notably, short mRNA sequences produced by RNA-Seq enhance transcriptome analysis and promise great opportunities for the discovery of new genes and the identification of alternative transcripts. One way to analyze these data is to align the reads against a reference genome. However, the sheer amount of NGS data requires highly efficient methods for accurate spliced alignment, a task made harder by the length and quality of the sequence reads.
We propose a combination of the spliced alignment method QPALMA with the short-read alignment tool GenomeMapper. The resulting method, called PALMapper, efficiently computes both spliced and unspliced alignments at high accuracy while taking advantage of base quality information and splice site predictions. QPALMA, which relies on a machine learning strategy, is highly sensitive, but its alignment step is time-consuming, which can make it impractical for large genomes or extremely large introns. To improve efficiency, we combined it with GenomeMapper, which quickly carries out an initial read mapping. This mapping then guides a banded semi-global spliced alignment algorithm that allows for long gaps corresponding to introns. PALMapper considerably reduces running time without decreasing accuracy compared to QPALMA. It runs around 50 times faster and can therefore align around 7 million reads per hour on a single AMD CPU core (a speed similar to that of TopHat). Our study on C. elegans furthermore shows that PALMapper predicts introns with very high sensitivity (72%) and specificity (82%) when the annotation is used as ground truth, which is considerably more accurate than TopHat (47% and 81%, respectively).
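To make the idea of a seed-guided banded alignment more concrete, the following is a minimal sketch in Python. It is not PALMapper's implementation. It assumes unit costs for mismatches and gaps instead of QPALMA's learned, quality-aware scoring, and it leaves out intron (long gap) handling and splice site predictions entirely. The names `banded_semiglobal`, `seed_pos`, and `band` are hypothetical and chosen for illustration only. The sketch shows how an initial mapping position restricts the dynamic program to a narrow diagonal band, which is where the speed-up comes from.

```python
from math import inf


def banded_semiglobal(read, ref, seed_pos, band):
    """Align the whole read to ref near seed_pos (hypothetical helper).

    Semi-global: the read must be aligned end to end, while the start and
    end positions in the reference are free. Only cells whose reference
    index j satisfies |j - (seed_pos + i)| <= band are evaluated.
    Returns (edit cost, end position in ref), or (inf, None) if the band
    lies entirely outside the reference.
    """
    m = len(ref)
    # Row 0: the alignment may start at any reference position in the band.
    prev = {j: 0 for j in range(max(0, seed_pos - band),
                                min(m, seed_pos + band) + 1)}
    for i in range(1, len(read) + 1):
        cur = {}
        lo = max(0, seed_pos + i - band)
        hi = min(m, seed_pos + i + band)
        for j in range(lo, hi + 1):
            # Read base aligned to a gap in the reference.
            best = prev.get(j, inf) + 1
            if j > 0:
                # Match or mismatch on the diagonal.
                sub = 0 if read[i - 1] == ref[j - 1] else 1
                best = min(best, prev.get(j - 1, inf) + sub)
                # Reference base aligned to a gap in the read.
                best = min(best, cur.get(j - 1, inf) + 1)
            cur[j] = best
        prev = cur
    if not prev:
        return inf, None
    end = min(prev, key=prev.get)
    return prev[end], end


if __name__ == "__main__":
    ref = "TTTACGTACGTTT"
    # The seed (here assumed to come from an initial mapper) points at index 3.
    print(banded_semiglobal("ACGTACG", ref, seed_pos=3, band=2))
```

With a band of width `band`, the work per read grows with `len(read) * (2 * band + 1)` rather than with the product of read and reference length. A spliced variant would additionally need a transition that skips a long reference stretch at an intron-specific cost, which this sketch does not attempt.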
PALMapper is open access and the code is available here.
PALMapper can also be used on a Galaxy server.
Jean G, Kahles A, Sreedharan VT, De Bona F, Rätsch G. (2010) RNA-Seq read alignments with PALMapper. Curr Protoc Bioinformatics 11(11.6). [abstract]