Gene fusions occur across a wide range of cancer types, at varying frequencies. Long-read transcriptome sequencing technologies, such as PacBio Iso-Seq and Nanopore direct RNA sequencing, produce full-length transcript reads, which can facilitate the detection of gene fusions. Researchers at the University of Alabama at Birmingham have developed a method, FusionSeeker, to comprehensively characterize gene fusions in long-read cancer transcriptome data and reconstruct accurate fused transcripts from raw reads. FusionSeeker identifies gene fusions in both exonic and intronic regions, allowing comprehensive characterization of gene fusions in cancer transcriptomes. It reconstructs fused transcript sequences by correcting sequencing errors in the raw reads with a partial order alignment algorithm. Using these accurate transcript sequences, FusionSeeker refines gene fusion breakpoint positions and predicts breakpoints at single-base-pair resolution. Overall, FusionSeeker will enable users to discover gene fusions accurately from long-read data, which can facilitate downstream functional analysis as well as improved cancer diagnosis and treatment.
Workflow of FusionSeeker
FusionSeeker scans the input file of read alignments for split-read alignments and records a candidate gene fusion whenever two segments of one read are aligned to two distinct genes. It then clusters the candidates into gene fusion calls and removes noise calls supported by only a few reads. For each fusion call, FusionSeeker generates a consensus transcript sequence by performing a partial order alignment of the fusion-containing reads. The final output of FusionSeeker includes a list of confident gene fusion events and their corresponding transcript sequences.
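The first two steps of this workflow (collecting candidates from split reads, then clustering and filtering them) can be illustrated with a minimal Python sketch. This is not FusionSeeker's code: the `Segment` record, the function names, the greedy breakpoint clustering, and the `window` and `min_support` values are all assumptions made for illustration, and the sketch presumes that each aligned segment has already been assigned to a gene. The consensus step based on partial order alignment is not shown.

```python
from collections import defaultdict
from typing import NamedTuple


class Segment(NamedTuple):
    """One aligned piece of a read (hypothetical record, not FusionSeeker's format)."""
    read_id: str
    gene: str        # gene overlapped by this segment (annotation assumed done upstream)
    breakpoint: int  # genomic coordinate of the segment end next to the junction


def find_candidates(segments):
    """Emit a candidate whenever one read has segments in two distinct genes."""
    by_read = defaultdict(list)
    for seg in segments:
        by_read[seg.read_id].append(seg)

    candidates = []
    for read_id, segs in by_read.items():
        for i in range(len(segs)):
            for j in range(i + 1, len(segs)):
                if segs[i].gene != segs[j].gene:
                    # Order the pair by gene name so A-B and B-A group together.
                    first, second = sorted((segs[i], segs[j]), key=lambda s: s.gene)
                    candidates.append((read_id, first, second))
    return candidates


def cluster_candidates(candidates, window=20):
    """Group candidates by gene pair, then greedily by breakpoint proximity."""
    by_pair = defaultdict(list)
    for cand in candidates:
        by_pair[(cand[1].gene, cand[2].gene)].append(cand)

    clusters = []
    for pair, cands in by_pair.items():
        cands.sort(key=lambda c: (c[1].breakpoint, c[2].breakpoint))
        current = [cands[0]]
        for cand in cands[1:]:
            prev = current[-1]
            close = (abs(cand[1].breakpoint - prev[1].breakpoint) <= window
                     and abs(cand[2].breakpoint - prev[2].breakpoint) <= window)
            if close:
                current.append(cand)
            else:
                clusters.append((pair, current))
                current = [cand]
        clusters.append((pair, current))
    return clusters


def call_fusions(segments, min_support=3, window=20):
    """Keep only clusters supported by at least `min_support` distinct reads."""
    calls = []
    for pair, members in cluster_candidates(find_candidates(segments), window):
        reads = sorted({m[0] for m in members})
        if len(reads) >= min_support:
            calls.append({"genes": pair, "support": len(reads), "reads": reads})
    return calls
```

In this sketch, the reads collected for each surviving call are what a consensus step would take as input. The support threshold and clustering window shown here are placeholders rather than the tool's defaults.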