Researchers from the Royal Institute of Technology, Stockholm, have developed a novel analysis method that can interrogate the authenticity of the biological samples used to generate transcriptome profiles in public data repositories. The method uses RNA sequencing information to reveal mutations in expressed transcripts and then confirms the identity of the analysed cells by comparison with publicly available cell-specific mutational profiles. Cell lines are key model systems widely used in cancer research, but their identity needs to be confirmed in order to minimise the influence of cell contamination and genetic drift on the analysis.
Using both public and novel data, the researchers demonstrate the use of RNA-sequencing data analysis for cell line authentication by examining the validity of the COLO205, DLD1, HCT15, HCT116, HKE3, HT29 and RKO colorectal cancer cell lines. They successfully authenticate the studied cell lines and validate previous reports indicating that DLD1 and HCT15 are synonymous. They also show that the analysed HKE3 cells harbour an unexpected KRAS-G13D mutation and confirm that this cell line is a genuine KRAS dosage mutant, rather than a true isogenic derivative of HCT116 expressing only the wild-type KRAS. This authentication method could be used to revisit the numerous cell line-based RNA sequencing experiments available in public data repositories, to analyse new experiments where whole genome sequencing is not available, and to facilitate comparisons of data from different experiments, platforms and laboratories.
RNA-seq cell line authentication pipeline
Raw RNA-seq data is aligned to the human genome using STAR, followed by processing and variant calling steps using GATK tools. Non-SNV variants (insertions, deletions, etc.) are removed, and the resulting SNVs are annotated using SnpEff and SnpSift. SNVs that pass the GATK variant filtration step and have a total allelic depth of at least 10 are compared to COSMIC SNV profiles, filtered to include only unique SNV positions, as well as to an SNV genotyping panel.
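
The final filtering and comparison step can be illustrated with a minimal Python sketch. It is not the researchers' implementation: the `Variant` record, the function names and the `match_fraction` score are hypothetical, the coordinates are toy values, and the sketch assumes that variants passing GATK filtration carry the standard VCF `PASS` flag. It only shows the logic described above: keep SNVs, require a passing filter and a total allelic depth of at least 10, then compare against a reference SNV profile.

```python
from typing import Iterable, NamedTuple, Set, Tuple

SnvKey = Tuple[str, int, str, str]  # (chromosome, position, ref, alt)


class Variant(NamedTuple):
    """Hypothetical, simplified view of one called variant."""
    chrom: str
    pos: int
    ref: str
    alt: str
    filter_status: str              # assumed "PASS" when filtration is passed
    allelic_depths: Tuple[int, ...]  # per-allele read counts (VCF AD field)


def is_snv(v: Variant) -> bool:
    """True for single-nucleotide variants; indels and others are excluded."""
    return len(v.ref) == 1 and len(v.alt) == 1


def usable_snvs(variants: Iterable[Variant], min_depth: int = 10) -> Set[SnvKey]:
    """Keep SNVs that pass filtration and have total allelic depth >= min_depth."""
    return {
        (v.chrom, v.pos, v.ref, v.alt)
        for v in variants
        if is_snv(v)
        and v.filter_status == "PASS"
        and sum(v.allelic_depths) >= min_depth
    }


def match_fraction(sample: Set[SnvKey], profile: Set[SnvKey]) -> float:
    """Illustrative score: share of profile SNVs also found in the sample."""
    if not profile:
        return 0.0
    return len(sample & profile) / len(profile)


# Toy data only; these are not real genomic coordinates.
calls = [
    Variant("chrA", 100, "G", "A", "PASS", (4, 8)),     # kept (depth 12)
    Variant("chrA", 200, "AT", "A", "PASS", (10, 10)),  # deletion, removed
    Variant("chrB", 300, "C", "T", "PASS", (2, 3)),     # depth 5, removed
    Variant("chrB", 400, "T", "G", "LowQual", (9, 9)),  # fails filtration
]
profile = {
    ("chrA", 100, "G", "A"),
    ("chrB", 300, "C", "T"),
    ("chrC", 500, "A", "C"),
}

kept = usable_snvs(calls)
score = match_fraction(kept, profile)
```

In this toy case only the first call survives filtering, so one of the three profile SNVs is matched. A real comparison would also need to account for profile positions that are not expressed, and therefore not covered, in the RNA-seq data.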