Advances in next-generation DNA sequencing technologies are now enabling detailed characterization of sequence variations in cancer genomes. Whole genome sequencing can uncover variations in both coding and non-coding sequences, but its cost currently limits its general use in research. Whole exome sequencing characterizes sequence variations in coding regions, but the cost of capture reagents and biases in capture rate limit its full use in research. A further limitation is the uncertainty in assigning functional significance to mutations that are observed in non-coding regions or in genes that are not expressed in cancer tissue.
University of Kansas Medical Center researchers investigated the feasibility of uncovering mutations in expressed genes from RNA sequencing datasets, using a method called “VaDiR: Variant Detection in RNA” that integrates three variant callers: SNPiR, RVBoost and MuTect2. Variants called by all three methods, which they termed Tier1 variants, had the highest precision, yielding true positive mutations from RNA-seq that could be validated at the DNA level. They also found that integrating Tier1 variants with those called by MuTect2 and SNPiR produced the highest recall with acceptable precision. Finally, the researchers observed a higher rate of mutation discovery in genes that are expressed at higher levels.
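The caller-integration idea can be illustrated with a minimal sketch. It is not the VaDiR implementation: the `(chrom, pos, ref, alt)` variant key, the function names and the example calls are all hypothetical, and reading "those called by MuTect2 and SNPiR" as variants supported by both of those callers is an assumption.

```python
from typing import Dict, FrozenSet, Iterable, Set, Tuple

# Hypothetical variant key: (chromosome, position, reference allele, alternate allele).
Variant = Tuple[str, int, str, str]

ALL_CALLERS: FrozenSet[str] = frozenset({"SNPiR", "RVBoost", "MuTect2"})


def caller_support(calls: Dict[str, Iterable[Variant]]) -> Dict[Variant, FrozenSet[str]]:
    """Map each variant to the set of callers that reported it."""
    support: Dict[Variant, Set[str]] = {}
    for caller, variants in calls.items():
        for variant in variants:
            support.setdefault(variant, set()).add(caller)
    return {variant: frozenset(callers) for variant, callers in support.items()}


def tier1_variants(support: Dict[Variant, FrozenSet[str]]) -> Set[Variant]:
    """Variants reported by all three callers (the high-precision set)."""
    return {v for v, callers in support.items() if callers >= ALL_CALLERS}


def extended_variants(support: Dict[Variant, FrozenSet[str]]) -> Set[Variant]:
    """Assumed higher-recall set: variants reported by both MuTect2 and SNPiR.

    Under this assumption every Tier1 variant is included automatically.
    """
    return {v for v, callers in support.items() if {"MuTect2", "SNPiR"} <= callers}


# Made-up calls, for illustration only.
example_calls = {
    "SNPiR": [("chr1", 100, "A", "G"), ("chr2", 200, "C", "T")],
    "RVBoost": [("chr1", 100, "A", "G"), ("chr3", 300, "G", "A")],
    "MuTect2": [("chr1", 100, "A", "G"), ("chr2", 200, "C", "T")],
}
support = caller_support(example_calls)
tier1 = tier1_variants(support)
extended = extended_variants(support)
```

In this toy input only the chr1 variant is reported by all three callers, so it alone lands in the Tier1 set, while the chr2 variant joins it in the extended set because MuTect2 and SNPiR both report it.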
VaDiR workflow for processing somatic variant calls from RNA-seq
Sequence alignment is done by STAR for RNA and by BWA MEM for DNA. The refined mapping follows GATK Best Practices. Variant calling is done by UnifiedGenotyper (GATK) and MuTect2 (GATK). Subsequent filtering is done by RVBoost and SNPiR. Additional filters such as MAQ>40, germline read depth (DP)>5 and germline variant fraction (VAF)<0.03 are applied to remove germline variants.
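The additional germline filters listed above can be expressed as a simple predicate. This is a sketch under stated assumptions rather than the authors' code: the `CandidateCall` record and its field names are hypothetical, and the thresholds are taken directly from the text with strict inequalities.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CandidateCall:
    """Hypothetical per-variant summary; field names are illustrative only."""
    maq: float           # the MAQ value, as named in the workflow description
    germline_dp: int     # read depth at the site in the germline sample
    germline_vaf: float  # variant fraction at the site in the germline sample


def passes_additional_filters(
    call: CandidateCall,
    min_maq: float = 40,
    min_germline_dp: int = 5,
    max_germline_vaf: float = 0.03,
) -> bool:
    """Keep a call only if MAQ>40, germline DP>5 and germline VAF<0.03."""
    return (
        call.maq > min_maq
        and call.germline_dp > min_germline_dp
        and call.germline_vaf < max_germline_vaf
    )


# Made-up calls, for illustration only.
likely_somatic = CandidateCall(maq=60, germline_dp=30, germline_vaf=0.0)
likely_germline = CandidateCall(maq=60, germline_dp=30, germline_vaf=0.48)
low_coverage = CandidateCall(maq=60, germline_dp=3, germline_vaf=0.0)
```

Requiring a minimum germline depth guards against the case where a variant appears absent from the germline sample only because too few reads cover the site.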
VaDiR makes it possible to uncover mutations from RNA sequencing datasets that could be useful in further functional analysis. In addition, this approach allows orthogonal validation of DNA-based mutation discovery by providing complementary sequence variation analysis from paired RNA/DNA sequencing datasets.
Availability – Project home page: http://dx.doi.org/10.5524/100360