Genomic profiling with short-read sequencing is useful for detecting disease-associated variation in both DNA and RNA. However, because structural variation occurs frequently in cancer, molecular profiling with long-read sequencing improves the resolution of such events. For example, the Pacific Biosciences (PacBio) Iso-Seq transcriptome protocol provides full-length isoform characterization, allelic phasing, and isoform discovery, and it identifies expressed fusion partners. Researchers at Nationwide Children’s Hospital have developed the PacBio Fusion and Long Isoform Pipeline (PB_FLIP), which incorporates a suite of RNA-Seq analysis tools and scripts to identify expressed fusion partners and isoforms. To benchmark their protocol and analysis performance, the researchers also sequenced a commercial reference with known isoform complexity (Spike-In RNA Variants; SIRV) and demonstrated high recall for the combined Iso-Seq and PB_FLIP workflow. Herein, they describe how Iso-Seq and PB_FLIP analysis improves the deconvolution of complex structural variants and isoform detection within an institutional pediatric and adolescent/young adult (AYA) translational cancer research cohort. Using exemplar case studies, the researchers demonstrate that Iso-Seq and PB_FLIP discover novel expressed fusion partners, resolve complex intragenic alterations, and discriminate between allele-specific expression profiles.
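In the usual sense, the recall of a SIRV benchmark is the fraction of known reference isoforms that the workflow recovers, recall = TP / (TP + FN), where the known set size equals TP + FN. The minimal Python sketch below illustrates that arithmetic only. It assumes that each detected isoform has already been matched to a SIRV isoform identifier (the matching criterion is not specified here), it is not the PB_FLIP implementation, and the identifiers are hypothetical. Precision is included for contrast.

```python
def isoform_recall(known: set[str], detected: set[str]) -> float:
    """Recall = TP / (TP + FN): share of known reference isoforms that were detected."""
    if not known:
        raise ValueError("the reference isoform set must not be empty")
    true_positives = len(known & detected)
    return true_positives / len(known)  # len(known) == TP + FN


def isoform_precision(known: set[str], detected: set[str]) -> float:
    """Precision = TP / (TP + FP): share of detected isoforms that match the reference."""
    if not detected:
        return 0.0  # convention used here: nothing detected, nothing correct
    return len(known & detected) / len(detected)


# Hypothetical SIRV-style identifiers, not real benchmark data.
known = {"SIRV101", "SIRV102", "SIRV103", "SIRV105"}
detected = {"SIRV101", "SIRV102", "SIRV105", "novel_1"}
print(isoform_recall(known, detected))     # 0.75
print(isoform_precision(known, detected))  # 0.75
```

Recall alone rewards over-calling (reporting every candidate raises it), which is why it is normally read alongside precision.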
PacBio Fusion and Long Isoform Pipeline (PB_FLIP) Overview
Samples are collected from patients who consented to IRB protocols for subsequent nucleic acid extraction. RNA from disease-involved tissue is prepared with the Iso-Seq protocol and sequenced on PacBio instruments to generate long-read RNA-Seq data. These reads are first processed with the PacBio Iso-Seq Analysis software, which removes primer sequences and artificial concatemers, trims 3′ poly(A) tails, and collapses similar reads into high-quality isoforms (HQ_Isoforms). The mapped HQ_Isoforms are then processed by the two arms of PB_FLIP: the Fusion Pipeline (tan boxes) and the Isoform Pipeline (purple boxes).
The Fusion Pipeline (1) detects gene fusion events, (2) classifies fusion isoforms using SQANTI3 structural categories, (3) filters fusion isoforms to ensure unique gene partners, and (4) sorts the fusion gene partners detected by PB_FLIP according to previously reported findings.
The Isoform Pipeline (1) collapses HQ_Isoforms, (2) classifies isoforms using SQANTI3 splice-junction categories, (2.1) quantifies isoform expression using RNA-Seq data, (3) evaluates isoforms for structural variants (SVs), (4) merges the classification, expression, and SV results into a final transcript of interest (TOI) list, and (5) annotates TOI genes with disease associations from the DisGeNET database.
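The Fusion Pipeline's filtering and sorting steps (steps 3 and 4) can be pictured with the minimal Python sketch below. It rests on several assumptions that are not taken from PB_FLIP itself. The `FusionCall` record layout and its `supporting_reads` field are hypothetical. "Unique gene partners" is interpreted here as two distinct genes, with one call kept per unordered gene pair. "Previously reported findings" are modeled as a plain set of gene pairs. Gene and isoform names are placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FusionCall:
    """One fusion isoform call (hypothetical record layout)."""
    isoform_id: str
    gene_a: str
    gene_b: str
    supporting_reads: int


def filter_unique_partners(calls: list[FusionCall]) -> list[FusionCall]:
    """Keep calls whose partners are two different genes, one call per gene pair.

    Pairs are unordered, so A--B and B--A collapse together; the call with the
    most supporting reads is kept for each pair (the first one seen on ties).
    """
    best: dict[frozenset[str], FusionCall] = {}
    for call in calls:
        if call.gene_a == call.gene_b:
            continue  # same gene on both sides: not an intergenic fusion
        key = frozenset((call.gene_a, call.gene_b))
        current = best.get(key)
        if current is None or call.supporting_reads > current.supporting_reads:
            best[key] = call
    return list(best.values())


def sort_by_prior_reports(
    calls: list[FusionCall], reported_pairs: set[frozenset[str]]
) -> list[FusionCall]:
    """Put previously reported gene pairs first, then order by read support (descending)."""
    return sorted(
        calls,
        key=lambda c: (
            frozenset((c.gene_a, c.gene_b)) not in reported_pairs,  # False sorts first
            -c.supporting_reads,
        ),
    )


# Placeholder calls, not real data.
calls = [
    FusionCall("PB.1.1", "GENE1", "GENE2", 40),
    FusionCall("PB.1.2", "GENE2", "GENE1", 12),  # same pair, reversed order, fewer reads
    FusionCall("PB.2.1", "GENE3", "GENE3", 30),  # same gene on both sides
    FusionCall("PB.3.1", "GENE4", "GENE5", 55),
]
reported = {frozenset(("GENE1", "GENE2"))}
ranked = sort_by_prior_reports(filter_unique_partners(calls), reported)
print([c.isoform_id for c in ranked])  # ['PB.1.1', 'PB.3.1']
```

Ranking previously reported pairs ahead of read support keeps known partners at the top for review, while novel pairs remain in the list rather than being discarded.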