About 15% of human cancer cases are attributed to viral infections. To date, virus expression in tumor tissues has mostly been studied by aligning tumor RNA sequencing (RNAseq) reads to databases of known viruses, an approach that can miss divergent viruses. To enable identification of divergent viruses and rapid characterization of the tumor virome, researchers at Carnegie Mellon University developed viRNAtrap, an alignment-free pipeline that identifies viral reads and assembles viral contigs. viRNAtrap is based on a deep learning model trained to discriminate viral RNAseq reads, and the researchers applied it to 14 cancer types from The Cancer Genome Atlas (TCGA). They found that expression of exogenous cancer viruses is associated with better overall survival, whereas expression of human endogenous viruses is associated with worse overall survival. Using viRNAtrap, the researchers also uncovered expression of unexpected and divergent viruses that had not previously been implicated in cancer. The viRNAtrap pipeline provides a way forward for studying viral infections associated with different clinical conditions.
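As a rough illustration of the alignment-free idea, the sketch below keeps reads whose classifier score passes a cutoff and then greedily extends a seed read into a contig using suffix-prefix overlaps. Everything here is an assumption for illustration only: the scorer is a stand-in for the trained network (a lookup table of made-up scores), and the 0.5 threshold, `min_overlap`, and the helper names `filter_viral_reads`, `overlap`, and `greedy_extend` are hypothetical. viRNAtrap's actual model and assembler may work differently.

```python
from typing import Callable, List, Sequence


def filter_viral_reads(reads: Sequence[str],
                       score: Callable[[str], float],
                       threshold: float = 0.5) -> List[str]:
    """Keep reads whose classifier score is at or above a (hypothetical) threshold."""
    return [r for r in reads if score(r) >= threshold]


def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of `a` equal to a prefix of `b`; 0 if none reaches min_len."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_extend(seed: str, reads: Sequence[str], min_overlap: int = 10) -> str:
    """Extend `seed` to the right, each time appending the read with the largest overlap.

    Hypothetical, simplified assembler: right-extension only, exact matches only.
    """
    contig = seed
    unused = set(reads) - {seed}
    while True:
        best, best_k = None, 0
        for r in sorted(unused):  # sorted for deterministic tie-breaking
            k = overlap(contig, r, min_overlap)
            if best_k < k < len(r):  # read must overlap AND contribute new bases
                best, best_k = r, k
        if best is None:
            return contig
        contig += best[best_k:]
        unused.discard(best)


# Toy example: made-up scores standing in for the neural network's output.
toy_scores = {
    "ACGTACGTAAGG": 0.92,
    "ACGTAAGGTTCA": 0.88,
    "AAGGTTCAGGCT": 0.81,
    "TTTTTTTTTTTT": 0.07,
}
viral_reads = filter_viral_reads(list(toy_scores), toy_scores.__getitem__)
contig = greedy_extend(viral_reads[0], viral_reads, min_overlap=6)
print(viral_reads)
print(contig)  # ACGTACGTAAGGTTCAGGCT
```

The low-scoring poly-T read is dropped before assembly, so only reads the (stand-in) model considers viral contribute to the contig. A real assembler would also need to extend leftward and tolerate sequencing errors.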
Training and evaluation of the viRNAtrap framework
(a) A schematic overview of the viRNAtrap framework. Unmapped reads were extracted and given as input to the neural network, which identifies the viral reads; these reads were then assembled into viral contigs, which were compared against three viral databases using blastn. (b) Receiver operating characteristic (ROC) and precision-recall curves showing model performance on the test set. (c) Bar plots of different metrics evaluating model performance on the test set. (d) A phylogenetic tree showing the model scores for sequences from different human viruses, with the respective virus classification (using the average assigned score for each virus).
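Panels (b) to (d) summarize the model's scores on a labeled test set. The sketch below shows, under stated assumptions, how such summaries can be computed from per-read labels (1 = viral, 0 = non-viral) and scores: ROC AUC via the pairwise ranking formulation, threshold-based metrics, and the per-virus average score used for the tree in panel (d). The caption does not list which metrics appear in panel (c), so precision, recall, F1, and accuracy are assumed common choices; the 0.5 threshold, function names, and toy data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive outscores a random negative (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else (0.5 if p == n else 0.0)
    return wins / (len(pos) * len(neg))


def threshold_metrics(labels: Sequence[int], scores: Sequence[float],
                      threshold: float = 0.5) -> Dict[str, float]:
    """Precision, recall, F1, and accuracy at a single (hypothetical) score cutoff."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    tn = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall,
            "f1": f1, "accuracy": (tp + tn) / len(labels)}


def mean_score_per_virus(virus_ids: Sequence[str],
                         scores: Sequence[float]) -> Dict[str, float]:
    """Average model score per virus, as used to annotate a tree like panel (d)."""
    totals: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for v, s in zip(virus_ids, scores):
        totals[v] += s
        counts[v] += 1
    return {v: totals[v] / counts[v] for v in totals}


# Toy test set with made-up labels and scores (not results from the study).
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.2, 0.1]
print(round(roc_auc(labels, scores), 3))  # 0.889 (8 of 9 pairs ranked correctly)
print(threshold_metrics(labels, scores))
print(mean_score_per_virus(["HPV16", "HPV16", "HBV"], [0.9, 0.7, 0.5]))
```

A precision-recall curve like the one in panel (b) would repeat `threshold_metrics` over a sweep of thresholds and plot precision against recall.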
Availability – The viRNAtrap package is available through GitHub: https://github.com/AuslanderLab/virnatrap