Variant allele frequencies (VAF) are an important measure of genetic variation that can be estimated at single-nucleotide variant (SNV) sites. RNA and DNA VAFs are used as indicators of a wide range of biological traits, including tumor purity and ploidy changes, allele-specific expression and gene-dosage transcriptional response. Researchers at The George Washington University have developed a novel methodology to assess gene and chromosomal allele asymmetries and to aid in identifying genomic alterations in RNA and DNA datasets. Their approach is based on analysis of the VAF distributions in chromosomal segments (continuous multi-SNV genomic regions). In each segment they estimate variant probability, a parameter of a random process that can generate synthetic VAF samples that closely resemble the observed data. The researchers show that variant probability is a biologically interpretable quantitative descriptor of the VAF distribution in chromosomal segments and is consistent with other approaches. To this end, they apply the proposed methodology to data from 72 samples obtained from patients with breast invasive carcinoma (BRCA) from The Cancer Genome Atlas (TCGA). They compare DNA and RNA VAF distributions from matched RNA and whole exome sequencing (WES) datasets and find that both genomic signals give very similar segmentation and estimated variant probability profiles. The researchers also find a correlation between variant probability and copy number alterations (CNA). To demonstrate a practical application of variant probabilities, they use them to estimate tumor purity. Tumor purity estimates based on variant probabilities demonstrate good concordance with other approaches (Pearson’s correlation between 0.44 and 0.76). Their evaluation suggests that variant probabilities can serve as a dependable descriptor of VAF distribution, further enabling the statistical comparison of matched DNA and RNA datasets.
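The idea of a variant probability as "a parameter of a random process that generates synthetic VAF samples" can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the toolbox: the function names are hypothetical, the random process is taken to be binomial sampling of variant reads at a given read depth, sites are split evenly between the modes v and 1 - v, and the fit is a plain grid search that minimizes the mean absolute difference between sorted samples (which equals the one-dimensional earth mover's distance for equal-sized samples).

```python
import random


def synthetic_vafs(v_pr, depths, rng):
    """Draw one synthetic VAF per site as (binomial variant reads) / depth.

    Assumption: even-indexed sites use probability v_pr and odd-indexed
    sites use 1 - v_pr, so the sample is a symmetric two-mode mixture.
    """
    vafs = []
    for i, depth in enumerate(depths):
        p = v_pr if i % 2 == 0 else 1.0 - v_pr
        variant_reads = sum(rng.random() < p for _ in range(depth))
        vafs.append(variant_reads / depth)
    return vafs


def quantile_distance(a, b):
    """Mean absolute difference between two sorted, equal-sized samples."""
    return sum(abs(x - y) for x, y in zip(sorted(a), sorted(b))) / len(a)


def fit_variant_probability(observed, depths, grid_steps=51, seed=0):
    """Grid-search v in [0.5, 1] for the synthetic sample closest to the data."""
    rng = random.Random(seed)
    best_v, best_d = None, float("inf")
    for i in range(grid_steps):
        v = 0.5 + 0.5 * i / (grid_steps - 1)
        d = quantile_distance(observed, synthetic_vafs(v, depths, rng))
        if d < best_d:
            best_v, best_d = v, d
    return best_v


if __name__ == "__main__":
    # Simulated segment: 400 SNV sites at read depth 60, true value 0.8.
    depths = [60] * 400
    observed = synthetic_vafs(0.8, depths, random.Random(1))
    print("estimated variant probability:", fit_variant_probability(observed, depths))
```

Because each candidate value is scored against a single random synthetic sample, the estimate is itself noisy; averaging several synthetic samples per candidate would be a natural refinement.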
They also provide conceptual and mechanistic insights into the relations between the structure of VAF distributions and genetic events. The methodology is implemented in GeTallele, a MATLAB toolbox that provides a suite of functions for analysis, statistical assessment and visualization of Genome and Transcriptome allele frequency distributions.
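As a rough illustration of how a genetic event can shape the VAF distribution and tie it to tumor purity, consider an idealized mixture of tumor and normal cells. The relations below are standard mixture arithmetic under stated assumptions (a heterozygous SNV site, normal cells carrying one variant and one reference copy, every tumor cell carrying the same event); they are not necessarily the formulas used in the toolbox, and the symbols p and v are introduced here only for illustration.

```latex
% p : tumor purity (fraction of tumor cells), 0 <= p <= 1
% v : expected variant allele fraction at the upper mode of the segment

% Tumor cells lose the reference allele (hemizygous deletion):
% tumor cells contribute 1 variant copy out of 1, normal cells 1 out of 2.
v_{\mathrm{del}} = \frac{p \cdot 1 + (1-p) \cdot 1}{p \cdot 1 + (1-p) \cdot 2}
                 = \frac{1}{2-p}
\quad\Longrightarrow\quad
p = 2 - \frac{1}{v_{\mathrm{del}}}

% Copy-neutral loss of heterozygosity:
% tumor cells contribute 2 variant copies out of 2, normal cells 1 out of 2.
v_{\mathrm{cnLOH}} = \frac{2p + (1-p)}{2} = \frac{1+p}{2}
\quad\Longrightarrow\quad
p = 2\,v_{\mathrm{cnLOH}} - 1
```

For example, an upper mode at v = 0.75 would correspond to p ≈ 0.67 under the deletion model and p = 0.5 under the copy-neutral model, which is why the type of event has to be known or assumed before a purity value can be read off.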
GeTallele and visualization of VAF data
(A) Toolbox description. (B) Visualization of the whole dataset at the genome level using a Circos plot (blue, normal exome; cyan, normal transcriptome; orange, tumor exome; yellow, tumor transcriptome). (C) CNA values for chromosome 1. (D–F) Visualization of the VAF values with fitted variant probability (vPR): tumor exome (VAF_TEX) and tumor transcriptome (VAF_TTR) values at the level of a chromosome (chromosome 1) (D), a custom genome region (E), and a gene (F). Panel D shows two chromosomal segments with different VAF distributions, likely representing a region of copy-neutral loss of heterozygosity. Panel C shows that the large-scale change in CNA is concurrent with the change in the VAF distributions. In panel titles: Tex, vPR estimate for VAF distributions of tumor exome (orange); Ttr, vPR estimate for VAF distributions of tumor transcriptome (yellow).
Availability – GeTallele is available at: https://github.com/SlowinskiPiotr/GeTallele.