The composition and density of immune cells in the tumor microenvironment (TME) profoundly influence tumor progression and the success of anti-cancer therapies. Because flow cytometry, immunohistochemistry staining, and single-cell sequencing are often unavailable, researchers rely on computational methods to estimate immune-cell composition from bulk RNA-sequencing (RNA-seq) data. Various methods have been proposed recently, yet their capabilities and limitations have not been evaluated systematically, and a general guideline leading the research community through cell type deconvolution is missing.
A team led by researchers at the Technical University of Munich developed a systematic approach for benchmarking such computational methods and assessed how accurately the tools estimate nine different immune and stromal cell types from bulk RNA-seq samples. The researchers used a single-cell RNA-seq dataset of ∼11 000 cells from the TME to simulate bulk samples of known cell type proportions, and validated the results using independent, publicly available gold-standard estimates. This allowed them to analyze and condense the results of more than a hundred thousand predictions into an exhaustive evaluation across seven computational methods, nine cell types, and ∼1800 samples from five simulated and real-world datasets. The researchers demonstrate that computational deconvolution achieves high accuracy for well-defined cell-type signatures and propose how fuzzy cell-type signatures can be improved. They suggest that future efforts should be dedicated to refining cell population definitions and finding reliable signatures.
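The idea of simulating bulk samples with known composition can be illustrated with a minimal sketch. It assumes that a pseudo-bulk profile is formed by averaging the expression vectors of randomly drawn single cells; the function name `simulate_bulk`, the default of 500 cells per sample, and the toy expression values are hypothetical choices for illustration, not details taken from the study.

```python
import random

def simulate_bulk(cells_by_type, fractions, n_cells=500, seed=0):
    """Build one pseudo-bulk profile by averaging sampled single cells.

    cells_by_type: dict mapping cell type -> list of expression vectors
    fractions:     dict mapping cell type -> target fraction (should sum to 1)
    Returns (bulk_profile, realized_fractions).
    n_cells=500 is an illustrative assumption, not a value from the study.
    """
    rng = random.Random(seed)
    types = list(fractions)
    # Draw one cell type label per simulated cell, weighted by target fraction.
    labels = rng.choices(types, weights=[fractions[t] for t in types], k=n_cells)

    n_genes = len(next(iter(cells_by_type.values()))[0])
    total = [0.0] * n_genes
    counts = {t: 0 for t in types}
    for t in labels:
        cell = rng.choice(cells_by_type[t])  # sample a cell with replacement
        counts[t] += 1
        for g, value in enumerate(cell):
            total[g] += value

    bulk = [value / n_cells for value in total]          # mean expression per gene
    realized = {t: counts[t] / n_cells for t in types}   # known ground truth
    return bulk, realized

# Toy single-cell data: three genes, two cells per type (made-up numbers).
toy_cells = {
    "T cell": [[10.0, 0.0, 1.0], [12.0, 1.0, 0.0]],
    "B cell": [[0.0, 8.0, 1.0], [1.0, 9.0, 2.0]],
}
bulk, realized = simulate_bulk(toy_cells, {"T cell": 0.7, "B cell": 0.3})
print(bulk, realized)
```

Because the realized fractions are recorded at simulation time, any deconvolution method can be scored against them directly, which is what makes simulated samples useful as a benchmark.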
Benchmark on simulated and true bulk RNA-seq samples reveals differences in method performance between cell types
(a) Correlation of predicted versus known cell type fractions on 100 simulated bulk RNA-seq samples generated from single-cell RNA-seq. Pearson’s r is indicated in each panel. Due to the lack of a corresponding signature, we estimated macrophages/monocytes with EPIC using the ‘macrophage’ signature and with MCP-counter using the ‘monocytic lineage’ signature as a surrogate. (b) Performance of the methods on three independent datasets that provide immune cell quantification by FACS. Different cell types are indicated in different colors. Pearson’s r has been computed as a single correlation on all cell types simultaneously. Note that only methods that allow both inter- and intra-sample comparisons (i.e. EPIC, quanTIseq, CIBERSORT absolute mode) can be expected to perform well here. (c–d) Performance on the three validation datasets per cell type. Schelker’s and Racle’s datasets have too few samples to be considered individually. The values indicate the Pearson correlation of the predictions with the cell type fractions determined using FACS. Blank squares indicate that the method does not provide a signature for the respective cell type. ‘n/a’ values indicate that no correlation could be computed because all predictions were zero. The asterisk (*) indicates that the ‘monocytic lineage’ signature was used as a surrogate to predict monocyte content. P-values: **** < 0.0001; *** < 0.001; ** < 0.01; * < 0.05; ns ≥ 0.05. P-values are not adjusted for multiple testing.
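The difference between the pooled correlation in panel (b) and the per-cell-type correlations in panels (c–d) can be made concrete with a small sketch. The numbers below are invented for illustration: a hypothetical method returns scores that track the true fractions perfectly within each cell type but on a different arbitrary scale per cell type, so its scores are not comparable between cell types. The helper `pearson` is written out by hand and returns `None` when the correlation is undefined, mirroring the ‘n/a’ case where all predictions are zero.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length lists; None if undefined."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None  # constant input, e.g. all predictions are zero
    return sxy / math.sqrt(sxx * syy)

# Made-up ground truth (e.g. FACS fractions) for four samples.
true_fractions = {
    "T cell": [0.10, 0.20, 0.30, 0.40],
    "B cell": [0.05, 0.10, 0.15, 0.20],
}
# Hypothetical scores in arbitrary units that differ by cell type.
scores = {
    "T cell": [1.0, 2.0, 3.0, 4.0],
    "B cell": [50.0, 100.0, 150.0, 200.0],
}

# Per-cell-type correlation, as in panels (c-d).
per_type = {t: pearson(true_fractions[t], scores[t]) for t in true_fractions}

# Single correlation over all cell types at once, as in panel (b).
pooled_true = [v for t in true_fractions for v in true_fractions[t]]
pooled_pred = [v for t in true_fractions for v in scores[t]]
pooled = pearson(pooled_true, pooled_pred)

print(per_type, pooled)
```

In this toy case each per-cell-type correlation is essentially perfect while the pooled correlation is poor, which is why only methods whose estimates are comparable across cell types can be expected to do well in a pooled evaluation.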
Availability – The immunedeconv R package is available at: https://github.com/icbi-lab/immunedeconv.