SplicingCodes.com’s scientists, in collaboration with colleagues from the USA and China, have identified the KANSARL fusion gene as the first familially inherited cancer-susceptibility fusion gene specific to the population of European ancestry. KANSARL is also the cancer gene discovered so far that affects the largest number of people and families. KANSARL is a fusion of KANSL1 and ARL17A on the negative strand of 17q21.31 and is likely to encode a truncated KANSL1 protein. KANSL1 is a regulatory subunit of the MLL1 and NSL1 complexes, which are involved in chromatin histone H4 and p53 acetylation. The research paper, entitled “Identification of KANSARL as the First Cancer Predisposition Fusion Gene Specific to the Population of European Ancestry Origin”, was published online in Oncotarget on March 24, 2017.
In this publication, the SplicingCodes model was used to develop the SCIF (SplicingCodes Identify Fusion Transcripts) system, which identified six KANSARL fusion isoforms in cancer cell lines. The scientists then systematically analyzed RNA-seq data from many cancer types, including glioblastoma, prostate cancer, lung cancer, breast cancer, and lymphoma, from different geographical regions of the world. The KANSARL fusion transcripts were rarely detected in tumor samples from patients in Asia or Africa, but they were present in 30 – 52% of the tumors from North American cancer patients. Analysis of CEPH/Utah Pedigree 1463 revealed that KANSARL is a familially inherited fusion gene. Further analysis of RNA-seq datasets from the 1000 Genomes Project found that the KANSARL fusion gene occurred specifically in the population of European ancestry, at a frequency of 28.9%.
This is potentially one of the most significant discoveries in cancer research made by using the “Big Data” approach followed by experimental validation. To date, no other group has been able to identify a familially inherited fusion gene based on RNA-seq data. The scientists at SplicingCodes.com have spent many years developing the SplicingCode theory and have subsequently transformed the theory into technologies to predict novel alternatively spliced isoforms, to characterize differential gene expression patterns, and to identify novel alternatively spliced isoforms and fusion transcripts. The SCIF system can identify fusion transcripts quickly, accurately and reproducibly, with a sensitivity high enough to distinguish lowly expressed fusion transcripts from “spurious” ones. To date, SplicingCodes.com has identified over 1.1 million novel fusion transcripts, such as those of the KANSARL fusion gene, many of which are likely biomarkers of early alterations in cytogenetics, biochemistry and physiology. These newly discovered cancer biomarkers will not only be powerful tools in basic and clinical cancer research, but can also potentially be used for early cancer detection and novel therapy development to save patients’ lives.
Identification and characterization of KANSARL (KANSL1 – ARL17A) fusion transcripts.
a) A schematic diagram showing the steps of genetic rearrangement from the normal genomic structures of the ARL17A → KANSL1 genes to the inverted genomic structures of the KANSL1 → ARL17A genes on the chromosomal band 17q21.31. The dashed white horizontal arrow and the solid white vertical arrow represent genomic rearrangements and potential fusion gene structures. Solid red and black horizontal arrows indicate the ARL17A and KANSL1 genes, respectively. Solid blue arrows represent the LRRC37A and MAPT genes. The dashed horizontal black arrow indicates undetermined genomic regions. Black and red squares represent KANSL1 and ARL17A exons, respectively. b) A schematic diagram showing the KANSARL fusion transcripts identified so far. Black and red squares represent KANSL1 and ARL17A exons, respectively. Dashed lines indicate omitted regions. The numbers above the black and red squares are exon numbers. The numbers within the sequences indicate the numbers of omitted nucleotides. c) Validation of KANSARL isoform 1 in the A549, HeLa, K562, 786-O and OS-RC-2 cell lines. d) Validation of KANSARL isoform 2 in the A549, HeLa, K562, 786-O and OS-RC-2 cell lines. e) Detection of KANSL1 gene expression in the A549, HeLa, K562, 786-O and OS-RC-2 cell lines. f) Detection of ARL17A gene expression in the A549, HeLa, K562, 786-O and OS-RC-2 cell lines. g) Detection of GAPDH gene expression as a positive control in the A549, HeLa, K562, 786-O and OS-RC-2 cell lines. h) Sanger sequencing validation of KANSARL isoform 2. The black and red letters represent KANSL1 exon 3 and ARL17A exon 3 sequences, respectively. i) Sanger sequencing validation of the KANSARL genomic breakpoint in the HeLa-3 cell line. The black and red letters indicate KANSL1 and ARL17A intronic sequences, respectively. Vertical arrows indicate the fusion junctions. The black and red lines indicate KANSL1 and ARL17A sequences, respectively. All markers are 100 bp DNA markers.
The identification of over 1.1 million fusion transcripts using the SCIF system, together with the system's validity, indicates that it is much more accurate, efficient and reliable than the existing fusion gene detection software. For additional information about the SCIF system, please visit the website: http://splicingcodes.com.