Whole-exome sequencing (WES) and RNA sequencing (RNA-seq) are key components of cancer immunogenomic analyses. To evaluate the consistency of tumor WES and RNA-seq profiling platforms across different centers, the Cancer Immune Monitoring and Analysis Centers (CIMACs) and the Cancer Immunologic Data Commons (CIDC) conducted a systematic harmonization study.
DNA and RNA were centrally extracted from fresh frozen (FF) and formalin-fixed paraffin-embedded (FFPE) non-small cell lung carcinoma (NSCLC) tumors and distributed to three centers for WES and RNA-seq profiling. In addition, two 10-plex HapMap cell-line pools with known mutations were used to evaluate the accuracy of the WES platforms.
The WES platforms achieved high precision (> 0.98) and recall (> 0.87) on the HapMap pools when evaluated on loci with > 50X common coverage. Non-synonymous mutations clustered by tumor sample, achieving an Index of Specific Agreement above 0.67 among replicates, centers, and sample processing. A DV200 > 24% for RNA, as a putative pre-sequencing RNA quality control (QC) metric, was found to be a reliable threshold for generating consistent expression readouts in RNA-seq and NanoString data. MedTIN > 30 was likewise assessed as a reliable RNA-seq QC metric, above which samples from the same tumor across replicates, centers, and sample processing runs could be robustly clustered, and HLA typing, immune infiltration, and immune repertoire inference could be performed.
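The accuracy and agreement metrics above can be illustrated with a minimal Python sketch. It assumes that variant calls are represented as sets of (chromosome, position, ref, alt) tuples and that the Index of Specific Agreement takes the common positive-agreement form 2|A∩B| / (|A| + |B|); the study's exact definition and its restriction to commonly covered loci are not reproduced here, and all function names and example variants are hypothetical.

```python
from typing import Set, Tuple

# A variant is identified by (chromosome, position, reference allele, alternate allele).
Variant = Tuple[str, int, str, str]


def precision_recall(called: Set[Variant], truth: Set[Variant]) -> Tuple[float, float]:
    """Precision and recall of a call set against a known truth set."""
    true_positives = len(called & truth)
    precision = true_positives / len(called) if called else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


def index_of_specific_agreement(a: Set[Variant], b: Set[Variant]) -> float:
    """Positive specific agreement between two call sets: 2|A∩B| / (|A| + |B|).

    Assumption: this is the standard positive-agreement formula; it is
    undefined when both sets are empty, and 0.0 is returned in that case
    purely as a convention of this sketch.
    """
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 0.0


# Hypothetical example data (not taken from the study).
truth = {("chr1", 100, "A", "T"), ("chr1", 200, "G", "C"),
         ("chr2", 50, "C", "T"), ("chr3", 75, "T", "G")}
center_a = {("chr1", 100, "A", "T"), ("chr1", 200, "G", "C"), ("chr2", 50, "C", "T")}
center_b = {("chr1", 100, "A", "T"), ("chr1", 200, "G", "C"), ("chr5", 10, "A", "G")}

print(precision_recall(center_a, truth))
print(index_of_specific_agreement(center_a, center_b))
```

In this form the agreement index ignores loci that neither center called, which is one reason a specific-agreement measure can suit sparse mutation calls better than overall concordance.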
CIDC common pipelines for RNA-seq data processing
BAM files aligned to hg19 were converted to FASTQ files using Bam2fastq. For the raw reads, Salmon was used for transcript quantification using gencode.v.22 as the reference. STAR was used for alignment using hg38 (GRCh38.d1.vd1) as the reference genome. RSeQC was used to check RNA-seq quality and to generate the medTIN score, which measures RNA integrity at the transcript level. Immune repertoires were estimated by TRUST4, which was used to infer CDR3 clonal types for TCRA, TCRB, TCRD, IGH, IGK, and IGL tumor immune repertoires. HLA typing was inferred using OptiType. Expression-based immune cell infiltration estimates were calculated by TIMER, xCell, MCPCounter, CIBERSORT, EPIC, and quanTIseq.
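As a rough illustration of how the medTIN score relates to the QC thresholds reported in this study, the following Python sketch takes medTIN to be the median of per-transcript TIN values and flags a sample against the DV200 > 24% and medTIN > 30 cut-offs. The function names, the input format, and the idea of reporting both flags together are assumptions made for this example; they are not part of the CIDC pipeline.

```python
from statistics import median
from typing import Dict, Iterable

# Thresholds as reported in the text above.
DV200_MIN_PERCENT = 24.0  # pre-sequencing RNA quality metric
MEDTIN_MIN = 30.0         # post-sequencing RNA integrity metric


def med_tin(tin_scores: Iterable[float]) -> float:
    """Median transcript integrity number over per-transcript TIN values."""
    scores = list(tin_scores)
    if not scores:
        raise ValueError("at least one per-transcript TIN value is required")
    return median(scores)


def rna_qc_flags(dv200_percent: float, tin_scores: Iterable[float]) -> Dict[str, bool]:
    """Report whether a sample exceeds each threshold (hypothetical helper)."""
    return {
        "dv200_pass": dv200_percent > DV200_MIN_PERCENT,
        "medtin_pass": med_tin(tin_scores) > MEDTIN_MIN,
    }


# Hypothetical sample: DV200 of 35% and five per-transcript TIN values.
print(rna_qc_flags(35.0, [10.0, 40.0, 55.0, 70.0, 20.0]))
```

Keeping the two flags separate reflects that DV200 is measured before sequencing while medTIN is only available afterwards, so a sample may be assessed on one without the other.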
The CIMAC collaborating laboratory platforms generated consistent WES and RNA-seq data, enabling robust cross-trial comparisons and meta-analyses of highly complex immuno-oncology biomarker data across the NCI CIMAC-CIDC Network.