Genomic rearrangements can modify gene function by altering transcript sequences, and have been shown to be drivers in both cancer and rare diseases. Although there are now many methods to detect structural variants from whole genome sequencing (WGS), RNA sequencing (RNA-seq) remains under-utilised as a technology for detecting gene-altering structural variants. Calling fusion genes from RNA-seq data is well established, but other transcriptional variants, such as fusions with novel sequence, tandem duplications, large insertions and deletions, and novel splicing, are difficult to detect using existing approaches.
To identify all types of variants in transcriptomes, researchers from the Peter MacCallum Cancer Centre developed MINTIE, an integrated pipeline for RNA-seq data. MINTIE takes a reference-free approach that combines de novo assembly of transcripts with differential expression analysis to identify up-regulated novel variants in a case sample.
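The idea of pairing assembly with differential expression can be illustrated with a toy filter. The sketch below is not MINTIE's implementation: it assumes the assembled contigs have already been quantified, that a set of control samples is available for comparison, and it swaps in a simple fold-change threshold in place of a proper differential expression test. The function name, parameters, and thresholds are all hypothetical.

```python
from statistics import mean


def candidate_novel_contigs(case_counts, control_counts, reference_ids,
                            min_case_count=10, min_fold_change=4.0,
                            pseudocount=1.0):
    """Return (contig_id, fold_change) pairs for contigs that are absent
    from the reference and over-expressed in the case sample.

    case_counts:    dict mapping contig id -> read count in the case sample
    control_counts: dict mapping contig id -> list of counts in controls
                    (assumed input; a missing contig is treated as zero)
    reference_ids:  set of contig ids that match known reference transcripts
    """
    candidates = []
    for contig_id, case_count in case_counts.items():
        # Keep only novel sequence: skip contigs that match the reference.
        if contig_id in reference_ids:
            continue
        # Ignore contigs with too little support in the case sample.
        if case_count < min_case_count:
            continue
        controls = control_counts.get(contig_id, [])
        control_mean = mean(controls) if controls else 0.0
        # Illustrative stand-in for a differential expression test:
        # a pseudocount-stabilised fold change of case over controls.
        fold_change = (case_count + pseudocount) / (control_mean + pseudocount)
        if fold_change >= min_fold_change:
            candidates.append((contig_id, fold_change))
    # Most strongly up-regulated candidates first.
    return sorted(candidates, key=lambda item: item[1], reverse=True)


# Hypothetical counts for four assembled contigs.
case_counts = {"contig_A": 120, "contig_B": 50, "contig_C": 5, "contig_D": 80}
control_counts = {"contig_A": [0, 1, 0], "contig_B": [40, 60, 50], "contig_D": [0, 0, 0]}
reference_ids = {"contig_D"}

result = candidate_novel_contigs(case_counts, control_counts, reference_ids)
```

In this made-up example only contig_A survives: contig_B is expressed at a similar level in the controls, contig_C has too few reads, and contig_D matches a reference transcript. A real pipeline would use a statistical model that accounts for library size and variance rather than a fixed threshold.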
The researchers validated MINTIE on simulated and real data sets and compared it with eight other approaches for finding novel transcriptional variants. MINTIE detected all defined variant classes at high rates (>70%), which no other method achieved.
The researchers applied MINTIE to RNA-seq data from a cohort of acute lymphoblastic leukemia (ALL) patient samples and identified several novel clinically relevant variants, including a recurrent unpartnered fusion involving the tumour suppressor gene RB1, and variants in ALL-associated genes: tandem duplications in IKZF1 and PAX5, and novel splicing in ETV6. They further demonstrated the utility of MINTIE for identifying rare disease variants from RNA-seq, including the discovery of an inter-chromosomal translocation in the DMD gene in a patient with muscular dystrophy. They posit that MINTIE will be able to identify new disease variants across a range of cancers and other disease types.
Examples of variant types detected by MINTIE
(NTS = non-templated sequence.)
a. Canonical fusions. Typically defined as a transcriptional product of two genes, joined at exon-exon boundaries.
b. Non-canonical fusions. Fusions that join exon boundaries to non-gene regions, as well as fusions without a second gene partner.
c. Transcribed structural variants (TSVs). May include internal tandem duplications (ITDs), partial tandem duplications (PTDs), inversions, deletions and insertions.
d. Novel splice variants (NSVs). Include extended exons, novel exons, retained introns, truncated exons and skipped exons.