RNA-seq is becoming a preferred tool for genomics studies of model and non-model organisms. However, DNA-based analysis of organisms lacking sequenced genomes cannot rely on RNA-seq data alone to isolate most genes of interest, because genomic DNA contains both exons and introns, whereas mature transcripts contain only exons and do not show where the introns lie.
With this in mind, researchers at Ben Gurion University of the Negev designed a novel tool, LEMONS, that exploits the evolutionary conservation of both exon/intron boundary positions and splice-junction recognition signals to produce high-throughput splice-junction predictions in the absence of a reference genome. When tested on mRNA data from multiple annotated vertebrates, LEMONS accurately identified an average of 87% of the splice-junctions. LEMONS was then applied to the team's updated Mediterranean chameleon transcriptome, which lacks a reference genome, and predicted a total of 90,820 exon-exon junctions. The researchers experimentally verified these splice-junction predictions by amplifying and sequencing twenty randomly selected genes from chameleon DNA templates. Exons and introns were detected at 19 of the 20 positions predicted by LEMONS. To the best of their knowledge, LEMONS is currently the only experimentally verified tool that can accurately predict splice-junctions in organisms that lack a reference genome.
Flow chart of the steps performed by LEMONS. (A) The default and primary LEMONS database, taken from the UCSC Genome Browser (hg19), encompasses all non-redundant human RefSeq proteins, together with their known splice-junction locations. Arrowhead-like gaps correspond to splice-junctions. (B) LEMONS employs BLASTX pairwise alignment to compare each of the identified transcripts to its orthologous protein in the reference database and predicts splice-junctions based on the conserved gene structure. (C) LEMONS uses all predicted exons that do not split codons to establish the 3′ motif of the exon. (D) The identified motif assists in choosing between adjacent potential splice-junctions and between the two potential splice-junctions that split codons. (E) Using more than one reference database enhances the accuracy of splice-junction prediction (again, while implementing a motif search).
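To make step (B) more concrete, here is a minimal Python sketch of how a splice-junction known on a reference protein could be projected onto a transcript through a BLASTX-style alignment. It is an illustration only, not the LEMONS source code: the function name `project_junction`, its parameters, and the toy alignment are hypothetical, and it assumes a single forward-frame alignment with no frameshifts. A junction is described by the reference residue whose codon starts the downstream exon, plus a phase (0 means the junction falls between codons, 1 or 2 means it splits the codon).

```python
def project_junction(q_start, s_start, q_aln, s_aln, junction_aa, phase=0):
    """Project a reference splice-junction onto transcript coordinates.

    Hypothetical helper, not part of LEMONS.
    q_start     -- 1-based transcript nucleotide where the alignment begins
    s_start     -- 1-based reference-protein residue where the alignment begins
    q_aln/s_aln -- aligned translated transcript / reference protein ('-' = gap)
    junction_aa -- reference residue whose codon begins the downstream exon
    phase       -- 0, 1 or 2 nucleotides of that codon left in the upstream exon
    Returns the 1-based transcript position of the first downstream-exon
    nucleotide, or None if the junction cannot be placed.
    """
    if len(q_aln) != len(s_aln):
        raise ValueError("aligned strings must have equal length")
    if phase not in (0, 1, 2):
        raise ValueError("phase must be 0, 1 or 2")

    q_pos = q_start  # nucleotide coordinate on the transcript
    s_pos = s_start  # residue coordinate on the reference protein
    for q_char, s_char in zip(q_aln, s_aln):
        if s_char != "-" and s_pos == junction_aa:
            if q_char == "-":
                # The reference residue has no counterpart in the transcript.
                return None
            return q_pos + phase
        if q_char != "-":
            q_pos += 3  # one translated residue spans three nucleotides
        if s_char != "-":
            s_pos += 1
    return None  # junction lies outside the aligned region


# Toy alignment: the transcript lacks the reference residue 'A' (position 4).
print(project_junction(1, 1, "MKT-LV", "MKTALV", junction_aa=5))
```

In this toy case the downstream exon would begin at transcript nucleotide 10, the first base of the codon aligned to the reference leucine. Junctions with phase 1 or 2 are the codon-splitting cases for which, according to the caption, the motif search in steps (C) and (D) helps choose between the candidate positions.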
Availability – The executable files, source code, graphical user interface (GUI) and a Linux version of the program, as well as a user manual, are available at: http://dx.doi.org/10.6084/m9.figshare.1599765.