Gene fusions often occur in cancer cells and in some cases are the main driver of oncogenesis. Correct identification of oncogenic gene fusions thus has implications for targeted cancer therapy. Recognition of this potential has led to the development of a myriad of sequencing-based fusion detection tools. However, given the same input, many of these detectors will find different fusion points or claim different sets of supporting data. Furthermore, the rate at which these tools falsely detect fusion events varies greatly. This discrepancy between tools underscores the fact that computational algorithms still cannot perfectly evaluate evidence, especially when provided with small amounts of supporting data, as is typical in fusion detection. We assert that when evidence is provided in an easily digestible form, humans are more proficient at distinguishing true positives from false positives.
Researchers from The Ohio State University have developed a web tool that takes the genomic coordinates of a candidate fusion breakpoint and extracts the fusion and non-fusion reads adjacent to the fusion point from the partner transcripts. The tool color-codes reads by transcript origin and read orientation so that the user can inspect them intuitively. Read alignments against the fusion partner transcripts are performed using a novel variant of the Smith-Waterman algorithm.
Combined with dynamic filtering parameters, the visualization provided by this tool introduces a powerful new investigative step that allows researchers to comprehensively evaluate fusion evidence. The visualization also allows quick identification of false positives that may deceive most fusion detectors, thus eliminating unnecessary gene fusion validation. The developers apply their visualization tool to publicly available datasets and provide examples of true as well as false positives reported by open-source fusion detection tools.
FuSpot Fusion Alignment Algorithm Visualized
A conceptual representation of the FuSpot alignment algorithm. Each colored line represents an overhead view of a 2D Smith-Waterman score matrix for a given read and one of eight reference sequences (four on the 5′ side of the breakpoint and four on the 3′ side). The central black circle marks the fusion breakpoint. Since fusion analysis is typically conducted with RNA data, the ability to align against many references simultaneously is crucial. Here the chosen references exemplify how FuSpot can be used to align a set of reads, each of which may be fusion or non-fusion as well as genomic or exonic. During FuSpot realignment, if backtracking begins in a matrix on the 3′ side of the breakpoint, the trace may reach the breakpoint. In that case, FuSpot’s realignment algorithm searches the rightmost column of each of the four matrices on the 5′ side for the appropriate next step. Once that step is determined, the trace follows the dotted line to that matrix and continues through it for the remainder of the backtrack. Subsequently, during FuSpot’s visualization step, the traced 5′ and 3′ matrices and their associated references are assigned colors, and the aligned read takes on the color of the reference to which it aligned on each side of the breakpoint.
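The breakpoint-crossing idea in the caption can be illustrated with a small Python sketch. This is not FuSpot's implementation: it uses a plain Smith-Waterman recurrence with a linear gap penalty, and the scoring constants, the function names (`sw_fill`, `align_across_breakpoint`), and the reference names used in the example are all hypothetical. The sketch assumes that each 5′ reference ends at the breakpoint and each 3′ reference starts there, and it only reports alignments that end on the 3′ side.

```python
# Hypothetical scoring scheme; FuSpot's actual parameters are not given in the text.
MATCH, MISMATCH, GAP = 2, -1, -2


def sw_fill(read, ref, first_col=None):
    """Fill a Smith-Waterman score matrix (rows: read bases, columns: reference bases).

    If first_col is given, it seeds column 0 with scores carried over the breakpoint.
    """
    rows, cols = len(read) + 1, len(ref) + 1
    H = [[0] * cols for _ in range(rows)]
    if first_col is not None:
        for i in range(rows):
            H[i][0] = first_col[i]
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (MATCH if read[i - 1] == ref[j - 1] else MISMATCH)
            H[i][j] = max(0, diag, H[i - 1][j] + GAP, H[i][j - 1] + GAP)
    return H


def align_across_breakpoint(read, five_refs, three_refs):
    """Return (score, 5' reference name or None, 3' reference name or None, split).

    split is the number of read bases placed before the breakpoint when the
    alignment crosses it, otherwise the read offset where the 3' alignment starts.
    """
    rows = len(read) + 1
    five = {name: sw_fill(read, ref) for name, ref in five_refs.items()}
    # The rightmost column of every 5' matrix holds alignments ending at the breakpoint.
    seed = [max(H[i][-1] for H in five.values()) for i in range(rows)]
    three = {name: sw_fill(read, ref, seed) for name, ref in three_refs.items()}

    # Pick the best-scoring end cell over all 3' matrices.
    best = (0, None, 0, 0)
    for name, H in three.items():
        for i in range(rows):
            for j in range(1, len(H[i])):
                if H[i][j] > best[0]:
                    best = (H[i][j], name, i, j)
    score, three_name, i, j = best
    if three_name is None:
        return 0, None, None, None

    # Backtrack through the chosen 3' matrix until the breakpoint (column 0) or a zero cell.
    H, ref = three[three_name], three_refs[three_name]
    while j > 0 and H[i][j] > 0:
        diag = H[i - 1][j - 1] + (MATCH if read[i - 1] == ref[j - 1] else MISMATCH)
        if H[i][j] == diag:
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + GAP:
            i -= 1
        else:
            j -= 1

    # At the breakpoint, find the 5' matrix whose rightmost column supplied the score.
    five_name = None
    if j == 0 and H[i][0] > 0:
        five_name = next(n for n, H5 in five.items() if H5[i][-1] == H[i][0])
    return score, five_name, three_name, i


if __name__ == "__main__":
    # Toy references: two candidates on each side of the breakpoint.
    five_refs = {"geneA_exon": "ACGTACGTAC", "geneA_intron": "TTTTGGGGCC"}
    three_refs = {"geneB_exon": "GGATCCGGAT", "geneB_intron": "CACACACACA"}
    print(align_across_breakpoint("CGTACGGATC", five_refs, three_refs))
```

Seeding the first column of every 3′ matrix with the best rightmost-column score from the 5′ matrices is one simple way to let a single traceback pass from one side of the breakpoint to the other. The pair of reference names returned is the kind of information a viewer could use to color a read by the reference it matched on each side.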