High-throughput RNA-Sequencing followed by computational analysis has vastly accelerated the identification of viral and other pathogenic sequences in clinical samples, but cross-contamination during sample processing remains a major problem that can lead to erroneous conclusions.
Researchers at the National Heart, Lung, and Blood Institute were surprised to find sequences of HPV38, a virus not previously associated with endometrial cancer, specifically present in RNA-Seq samples of endometrial cancer patients from TCGA. However, multiple lines of evidence suggested possible cross-contamination in these samples, which were processed together in the same batch. Despite this potential cross-contamination, their data indicate that they have detected a new isolate of HPV38 that appears to be integrated into the human genome. The researchers also provide a general guideline for computational detection and interpretation of pathogen-disease associations.