JAFFA – High sensitivity transcriptome-focused fusion gene detection

Genomic instability is a hallmark of cancer and, as such, structural alterations and fusion genes are common events in the cancer landscape. RNA sequencing (RNA-Seq) is a powerful method for profiling cancers, but current methods for identifying fusion genes are optimised for short reads. JAFFA is a sensitive fusion detection method that outperforms other methods with reads of 100 bp or greater. JAFFA compares a cancer transcriptome to the reference transcriptome, rather than the genome, where the cancer transcriptome is inferred using long reads directly or by de novo assembling short reads.

JAFFA compares a sequenced transcriptome against a reference transcriptome. By default, JAFFA uses transcripts from GENCODE as the reference. In all JAFFA modes, reads aligning to intronic or intergenic regions are removed first to improve computational performance. Sequences are then converted into a common form – tumour sequences – consisting of either assembled contigs or the reads themselves. These sequences are processed by a core set of fusion-finding steps. First, sequences are aligned to the reference transcriptome and those that align to multiple genes are selected. Second, read support is determined. Third, putative candidates are aligned to the genome to check the genomic position of breakpoints. Finally, JAFFA calculates characteristics of each fusion and uses these to prioritise candidates for validation.
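The first of these core steps, selecting sequences that align to multiple genes, can be sketched in a few lines of Python. This is a minimal illustration rather than JAFFA's own implementation: the `Hit` record, the `select_multi_gene` function, the `max_overlap` threshold and the toy gene names are all hypothetical, and the input is assumed to be transcriptome alignments that have already been parsed.

```python
from collections import defaultdict
from typing import NamedTuple


class Hit(NamedTuple):
    """One alignment of a tumour sequence to a reference transcript (hypothetical record)."""
    seq_id: str  # contig or read name
    gene: str    # gene that the matched transcript belongs to
    start: int   # start of the aligned block on the tumour sequence (0-based)
    end: int     # end of the aligned block on the tumour sequence (exclusive)


def select_multi_gene(hits, max_overlap=15):
    """Return {seq_id: (gene_a, gene_b)} for sequences whose alignments
    cover two different genes in largely distinct parts of the sequence.

    max_overlap is an illustrative threshold, not a JAFFA default.
    """
    by_seq = defaultdict(list)
    for hit in hits:
        by_seq[hit.seq_id].append(hit)

    candidates = {}
    for seq_id, seq_hits in by_seq.items():
        # Sort so that `right` never starts before `left`.
        seq_hits.sort(key=lambda h: (h.start, h.end))
        for i, left in enumerate(seq_hits):
            for right in seq_hits[i + 1:]:
                if left.gene == right.gene:
                    continue
                # Bases of the tumour sequence shared by both aligned blocks
                # (negative when there is a gap between them).
                overlap = min(left.end, right.end) - right.start
                if overlap <= max_overlap:
                    # Keep the first qualifying gene pair for this sequence.
                    candidates.setdefault(seq_id, (left.gene, right.gene))
    return candidates


# Toy alignments with invented coordinates, for illustration only.
hits = [
    Hit("contig1", "GENE_A", 0, 120),
    Hit("contig1", "GENE_B", 118, 260),
    Hit("contig2", "GENE_B", 0, 200),
]
print(select_multi_gene(hits))
```

Requiring the two aligned blocks to occupy mostly separate parts of the sequence is one simple way to avoid flagging a sequence that merely matches two similar genes over the same region.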


The JAFFA pipeline. The pipeline is illustrated in detail using the RPS6KB1-VMP1 fusion from the MCF-7 breast cancer cell line dataset. Step 1: RNA-Seq reads are filtered to remove intronic and intergenic reads. Reads of 50 bp are then assembled into contigs using Oases; for longer reads, this assembly step is not necessary. Step 2: The resulting tumour sequences are aligned to the reference transcriptome and those that align to multiple genes are selected. These sequences make up the set of initial candidate fusions. Step 3: The pipeline counts the number of reads and read pairs that span the breakpoint. Step 4: Candidates are aligned to the human genome and the genomic coordinates of the breakpoint are determined. Step 5: Further selection and candidate classification are carried out using quantities such as genomic gap size, supporting reads and alignment of breakpoints to exon-exon boundaries. Step 6: A final list of candidates is reported along with their sequences.
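The selection and classification in Step 5 can be pictured as a small rule-based ranking over the three quantities named above. The sketch below is an assumption-laden illustration, not JAFFA's actual rules: the `Candidate` fields, the tier names, the `min_gap` threshold and the toy values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str               # fusion label, e.g. "GENE_A-GENE_B"
    spanning_reads: int     # reads that cross the breakpoint
    spanning_pairs: int     # read pairs that straddle the breakpoint
    genomic_gap: float      # bases between breakpoints; inf if on different chromosomes
    on_exon_boundary: bool  # breakpoint coincides with exon-exon boundaries


def classify(c, min_gap=10_000):
    """Assign an illustrative confidence tier (hypothetical rules and threshold)."""
    if c.spanning_reads + c.spanning_pairs == 0:
        return "unsupported"
    if c.genomic_gap < min_gap:
        # Neighbouring genes: may be read-through rather than a rearrangement.
        return "possible_read_through"
    if c.on_exon_boundary and c.spanning_reads > 0 and c.spanning_pairs > 0:
        return "high"
    if c.on_exon_boundary:
        return "medium"
    return "low"


TIER_ORDER = {"high": 0, "medium": 1, "low": 2,
              "possible_read_through": 3, "unsupported": 4}


def prioritise(candidates):
    """Sort by tier first, then by total supporting evidence (descending)."""
    return sorted(
        candidates,
        key=lambda c: (TIER_ORDER[classify(c)],
                       -(c.spanning_reads + c.spanning_pairs)),
    )


# Invented values, for illustration only.
candidates = [
    Candidate("GENE_C-GENE_D", 3, 0, 2_000, True),
    Candidate("GENE_A-GENE_B", 12, 8, 1_500_000, True),
    Candidate("GENE_E-GENE_F", 5, 0, float("inf"), False),
]
for c in prioritise(candidates):
    print(c.name, classify(c))
```

Ranking on tier before read support means that a well-supported candidate with a very small genomic gap still falls below candidates that look like genuine rearrangements, which is one plausible way to prioritise candidates for validation.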

Availability: https://github.com/Oshlack/JAFFA/wiki

Davidson NM, Majewski IJ, Oshlack A. (2015) JAFFA: High sensitivity transcriptome-focused fusion gene detection. Genome Med 7(1):43.
