Calling copy number alterations (CNAs) from RNA sequencing (RNA-Seq) is challenging, because of the marked variability in coverage across genes and paucity of single nucleotide polymorphisms (SNPs). We have adapted SuperFreq to call absolute and allele sensitive CNAs from RNA-Seq. SuperFreq uses an error-propagation framework to combine and maximise information from read counts and B-allele frequencies (BAFs).
Researchers from the Walter and Eliza Hall Institute of Medical Research used datasets from The Cancer Genome Atlas (TCGA) to assess the validity of CNA calls from RNA-Seq. When ploidy estimates were consistent, the researchers found agreement with DNA SNP-arrays for over 98% of the genome for acute myeloid leukaemia (TCGA-AML, n = 116) and 87% for colorectal cancer (TCGA-CRC, n = 377). The sensitivity of CNA calling from RNA-Seq was dependent on gene density. Using RNA-Seq, SuperFreq detected 78% of CNA calls covering 100 or more genes with a precision of 94%. Recall dropped for focal events, but this also depended on signal intensity. For example, in the CRC cohort SuperFreq identified all cases (7/7) with high-level amplification of ERBB2, where the copy number was typically >20, but identified only 6% of cases (1/17) with moderate amplification of IGF2, which occurs over a smaller interval. SuperFreq offers an integrated platform for identification of CNAs and point mutations. As evidence of how SuperFreq can be applied, the researchers used it to reproduce the established relationship between somatic mutation load and CNA profile in CRC using RNA-Seq alone.
Availability: SuperFreq is implemented in R and the code is available through GitHub: https://github.com/ChristofferFlensburg/SuperFreq.