St. Jude researchers integrate WGS and RNA-Seq to create an analytic tool that opens a new frontier of cancer discovery

St. Jude Children’s Research Hospital scientists have developed software to identify cancer-causing mutations lurking in vast regions of the human genome

Gene coding regions constitute 2% of the human genome. St. Jude Children’s Research Hospital scientists have developed a computational tool to identify alterations that drive tumor formation in the remaining 98% of the genome. The method will aid discovery of oncogenes and advances in precision medicine for children and adults with cancer.

The approach, detailed today in the journal Nature Genetics, is called cis-expression or cis-X. Researchers developed the innovative analytic method to identify novel pathogenic variants and oncogenes activated by such variants in regulatory noncoding DNA of patient tumors. Cis-X works by identifying abnormal expression of tumor RNA. Investigators analyzed leukemia and solid tumors and demonstrated the power of the approach.

Noncoding DNA, which does not encode genes, makes up 98% of the human genome. However, growing evidence suggests that more than 80% of the noncoding genome is functional and may regulate gene expression. Population studies have identified variants in noncoding DNA that are associated with an elevated cancer risk. But only a small number of noncoding variants in tumor genomes that contribute to tumor initiation have been discovered. Finding these variants required whole genome sequencing analysis of a large number of tumor samples.

“Cis-X is a fundamental change from existing approaches that require thousands of tumor samples and only identify noncoding variants that happen recurrently,” said Jinghui Zhang, Ph.D., chair of the St. Jude Department of Computational Biology. She and Yu Liu, Ph.D., previously of St. Jude and now of Shanghai Children’s Medical Center, are the corresponding authors. Liu is also a first author.

“By using aberrant gene transcription to reveal the function of noncoding variants, we developed cis-X to enable discovery of noncoding variants driving cancer in individual tumor genomes,” Zhang said. “Identifying variants that lead to dysregulation of oncogenes can expand the scope of the precision medicine to noncoding regions for identifying therapeutic options to suppress aberrantly activated oncogenes in tumors.”


Cis-X was inspired by a 2014 Science paper from Thomas Look, M.D., of Dana-Farber Cancer Institute and his colleagues. Look is a co-author of the current paper. Working in cell lines, Look’s team identified noncoding DNA variants responsible for abnormal activation of an oncogene (TAL1) that led to T-cell acute lymphoblastic leukemia (T-ALL). The research prompted Zhang to pursue her long-standing interest in examining variations in expression of each copy of a gene.

Cis-X works by searching for genes with altered expression in two ways. Researchers used whole genome and RNA sequencing to find genes that are expressed on just one chromosome and expressed at aberrantly high levels.

“It can be noisy when analyzing the imbalance of gene expression between alleles,” Liu said. “This analysis used a novel mathematical model that makes cis-X a robust tool for discovery.”
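To make the two signals concrete, here is a minimal Python sketch that flags a gene when reads at a heterozygous site are skewed toward one allele and the gene's expression is far above a reference cohort. It uses a plain exact binomial test and a z-score cutoff as simple stand-ins; this is not the mathematical model cis-X uses, and the function names, thresholds and numbers are hypothetical.

```python
from math import comb
from statistics import mean, stdev


def binomial_two_sided_p(k, n, p=0.5):
    """Exact two-sided binomial p-value: total probability of all
    outcomes that are no more likely than the observed count k."""
    pmf = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = pmf[k]
    # Small tolerance so ties are not lost to floating-point rounding.
    return min(1.0, sum(x for x in pmf if x <= observed * (1 + 1e-9)))


def is_allele_imbalanced(ref_reads, alt_reads, alpha=0.05):
    """True if RNA reads at a heterozygous site deviate from a 50/50 split.
    alpha is an illustrative threshold, not a cis-X parameter."""
    n = ref_reads + alt_reads
    if n == 0:
        return False  # no coverage, no evidence
    return binomial_two_sided_p(alt_reads, n) < alpha


def is_outlier_high(expr, reference_exprs, z_cutoff=3.0):
    """True if expression sits more than z_cutoff standard deviations
    above the mean of a reference cohort (assumed cutoff)."""
    mu = mean(reference_exprs)
    sd = stdev(reference_exprs)
    if sd == 0:
        return expr > mu
    return (expr - mu) / sd > z_cutoff


def is_cis_activated_candidate(ref_reads, alt_reads, expr, reference_exprs):
    """A gene is a candidate only when both signals are present."""
    return is_allele_imbalanced(ref_reads, alt_reads) and is_outlier_high(
        expr, reference_exprs
    )


# Made-up example: 38 of 40 reads come from one allele, and expression
# is far above five reference samples.
reference = [1.0, 1.2, 0.8, 1.1, 0.9]
print(is_cis_activated_candidate(2, 38, 50.0, reference))
print(is_cis_activated_candidate(20, 20, 50.0, reference))
```

Requiring both signals reflects the description above: high expression alone could have many causes, while expression from just one chromosome points to a cause located on that chromosome.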

Cis-X then searches for the cause of the abnormal expression by looking for alterations in regulatory regions of noncoding DNA within a 3D genome architecture. “The approach mimics the way the variants work in living cells,” Liu said. The alterations include changes such as chromosomal rearrangements and point mutations.
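The search step can be pictured with a short Python sketch that keeps only the variants falling in the same chromatin contact domain as a candidate gene. It assumes the 3D architecture is supplied as a list of domain intervals, which the article does not specify; the class names, function names and coordinates are hypothetical, and the real tool may use the 3D data differently.

```python
from typing import List, NamedTuple, Optional


class Domain(NamedTuple):
    """A contact domain as a half-open interval [start, end)."""
    chrom: str
    start: int
    end: int


class Variant(NamedTuple):
    """A somatic variant reduced to one position. A rearrangement has
    two breakpoints; each could be listed as its own entry."""
    chrom: str
    pos: int
    kind: str


def domain_containing(chrom: str, pos: int, domains: List[Domain]) -> Optional[Domain]:
    """Return the first domain that covers the position, if any."""
    for d in domains:
        if d.chrom == chrom and d.start <= pos < d.end:
            return d
    return None


def variants_near_gene(
    gene_chrom: str, gene_start: int, variants: List[Variant], domains: List[Domain]
) -> List[Variant]:
    """Keep variants that share a contact domain with the gene start."""
    home = domain_containing(gene_chrom, gene_start, domains)
    if home is None:
        return []
    return [
        v
        for v in variants
        if v.chrom == home.chrom and home.start <= v.pos < home.end
    ]


# Made-up coordinates for illustration only.
domains = [Domain("chr1", 0, 1_000_000), Domain("chr1", 1_000_000, 2_500_000)]
variants = [
    Variant("chr1", 1_200_000, "point mutation"),
    Variant("chr1", 400_000, "point mutation"),
    Variant("chr2", 1_300_000, "rearrangement breakpoint"),
]
for v in variants_near_gene("chr1", 1_500_000, variants, domains):
    print(v)
```

Restricting the search to a shared domain is one simple way to encode the idea that a regulatory variant must be able to physically contact the gene it activates.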

“Few functional noncoding variations happen at high recurrence, but they are important drivers of tumor initiation and progression,” Zhang said. “Without identifying the noncoding variant, we may not have the full picture of what caused the cancer.”

Fig. 1. cis-X workflow

cis-X was designed to perform integrated analysis of whole genome sequencing (WGS) and RNA-seq data generated from an individual tumor genome. It integrates allele-specific expression (ASE) and outlier high expression as key signatures of cis-activated genes to seed the discovery of regulatory noncoding variants in the context of the 3D architecture of the genome. Functional genomics data such as ChIP–seq generated from samples with matching tissue of origin and variant context can be provided by the user to enhance candidate variant annotation.


Researchers used cis-X to analyze the cancer genomes of 13 T-ALL patients, using data generated through a collaboration between St. Jude and Shanghai Children’s Medical Center. The algorithm identified known and novel oncogene-activating noncoding variants as well as a possible new T-ALL oncogene, PRLR.

Investigators also showed the method worked in adult and pediatric solid tumors, including neuroblastoma, a childhood cancer of immature nerve cells. Solid tumors posed a greater analytic challenge. Unlike leukemia, solid tumors often have an abnormal number of chromosomes that are not uniformly distributed in the tumor.

“Cis-X offers a powerful new approach for investigating the functional role of noncoding variants in cancer, which may expand the scope of precision medicine to treat cancer caused by such variants,” Zhang said.

Cis-X software is publicly available at no cost to researchers through the GitHub software repository, St. Jude Cloud and Zhang’s laboratory page.

Source – St. Jude Children’s Research Hospital

Liu Y, Li C, Shen S, et al. (2020) Discovery of regulatory noncoding variants in individual cancer genomes by using cis-X. Nat Genet [online ahead of print].
