Bulk RNA sequencing makes it possible to study biology at the whole-transcriptome level without the prohibitive cost of single-cell profiling. Advances in spatial transcriptomics make it possible to dissect tissue organization and function through genome-wide gene expression. However, the readout of both technologies is the aggregate gene expression across potentially many cell types, and neither directly reports the cell type composition. Although several in silico approaches have been proposed to deconvolute RNA-Seq data composed of multiple cell types, many perform considerably worse in complex tissues.
Researchers at Regeneron Pharmaceuticals have developed AdRoit, an accurate and robust method for inferring cell composition from transcriptome data of mixed cell types. AdRoit uses gene expression profiles obtained from single-cell RNA sequencing as a reference. It employs an adaptive learning approach to reduce the differences in sequencing technique between the single-cell and the bulk (or spatial) transcriptome data, which improves the comparability of readouts across platforms. Systematic benchmarking and applications, which include deconvoluting complex mixtures that encompass 30 cell types, demonstrate its utility and its higher sensitivity and specificity compared with many existing methods. In addition, AdRoit is computationally efficient and runs orders of magnitude faster than most methods.
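The reference-based idea can be illustrated with a small regression: if the compound expression of each gene is roughly a proportion-weighted sum of the per-cell-type reference profiles, the proportions can be recovered by a non-negative fit. The Python sketch below is not AdRoit's model. It uses a plain non-negative least-squares fit solved by projected gradient descent, with no gene weighting, regularization, or platform correction, and the reference values, the proportions, and the function name `deconvolve_nnls` are all made up for illustration.

```python
# Hypothetical reference: rows are genes, columns are three cell types
# (mean expression per cell type, as would be derived from single-cell data).
REFERENCE = [
    [10.0, 1.0, 0.0],
    [8.0, 0.0, 1.0],
    [1.0, 9.0, 0.0],
    [0.0, 7.0, 2.0],
    [0.0, 1.0, 12.0],
    [2.0, 0.0, 6.0],
]
TRUE_PROPORTIONS = [0.5, 0.3, 0.2]

# Simulated noiseless compound sample: each gene is a weighted sum of the reference.
MIXTURE = [sum(v * p for v, p in zip(row, TRUE_PROPORTIONS)) for row in REFERENCE]


def deconvolve_nnls(reference, mixture, n_iter=500):
    """Estimate cell type proportions with non-negative least squares.

    Minimizes 0.5 * ||reference @ coef - mixture||^2 subject to coef >= 0
    by projected gradient descent, then rescales coef to sum to 1.
    """
    n_genes = len(reference)
    n_types = len(reference[0])
    # The squared Frobenius norm bounds the largest eigenvalue of X^T X,
    # so its inverse is a safe (if conservative) step size.
    step = 1.0 / sum(v * v for row in reference for v in row)
    coef = [0.0] * n_types
    for _ in range(n_iter):
        residual = [
            sum(reference[g][k] * coef[k] for k in range(n_types)) - mixture[g]
            for g in range(n_genes)
        ]
        grad = [
            sum(reference[g][k] * residual[g] for g in range(n_genes))
            for k in range(n_types)
        ]
        # Gradient step followed by projection onto the non-negative orthant.
        coef = [max(0.0, coef[k] - step * grad[k]) for k in range(n_types)]
    total = sum(coef)
    return [c / total for c in coef] if total > 0 else coef


estimated = deconvolve_nnls(REFERENCE, MIXTURE)
```

On this noiseless toy input the estimate matches the simulated proportions closely. Real compound data would add noise and platform bias, which is the gap the adaptive learning step described above is meant to address.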
Schematic representation of the AdRoit computational framework
(a) AdRoit takes as input compound (bulk or spatial) RNA-Seq data, single-cell RNA-Seq data, and cell type annotations. It first selects informative genes and estimates their means and dispersions, then computes the cell type specificity of each gene. Depending on whether multiple samples are available, cross-sample gene variability is derived from either the compound RNA-Seq data or the single-cell data (see also “Methods”). Lastly, gene-wise correction factors are computed to reduce the platform bias between the compound and the single-cell RNA-Seq data. These quantities are used in a weighted regularized model to infer the cell type composition.
(b) A mock example illustrating the role of the gene-wise correction factor. Conceptually, an accurate estimate of the cell proportions is represented by the slope of the green line; however, fitting in the presence of outlier genes yields the red line. Outlier genes exist because the platform bias affects genes differently. AdRoit adopts an adaptive learning approach that first learns a coarse estimate of the slope (red line), from which the gene-wise corrections are derived and applied to the outlier genes, moving them toward the green line. The more a gene deviates, the larger its correction (i.e., the longer the arrow). After the adjustment, the newly estimated slope (blue line) is closer to the truth (green line) and is therefore a more accurate estimate.
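The mock example in panel b can be reproduced numerically. The Python sketch below makes several assumptions: a single slope through the origin stands in for the cell proportion, two genes are inflated by hand to play the role of platform-biased outliers, and the damping rule that turns a gene's deviation into a correction is a hypothetical choice made for illustration. It is not the correction factor AdRoit actually computes.

```python
import math

TRUE_SLOPE = 0.3  # the "green line" in the mock example

# Twenty genes: x is the reference expression, y the compound expression.
x = [float(i) for i in range(1, 21)]
y = [TRUE_SLOPE * xi for xi in x]
y[9] *= 5.0   # hypothetical platform-biased outlier gene
y[14] *= 4.0  # a second hypothetical outlier gene


def fit_slope(xs, ys):
    """Least-squares slope of a line through the origin."""
    return sum(a * b for a, b in zip(xs, ys)) / sum(a * a for a in xs)


def adaptive_slope(xs, ys, scale=1.0):
    """Coarse fit, gene-wise correction, then refit.

    Returns (coarse_slope, refined_slope). The damping rule below is an
    illustrative assumption, not AdRoit's formula.
    """
    coarse = fit_slope(xs, ys)  # the "red line"
    corrected = []
    for xi, yi in zip(xs, ys):
        # Log ratio of the observed value to the coarse prediction.
        deviation = math.log(yi / (coarse * xi))
        # Damping grows from 0 to 1 with the size of the deviation, so
        # strongly deviating genes are corrected the most.
        damping = deviation ** 2 / (deviation ** 2 + scale ** 2)
        corrected.append(yi * math.exp(-damping * deviation))
    return coarse, fit_slope(xs, corrected)  # refined is the "blue line"


coarse_slope, refined_slope = adaptive_slope(x, y)
```

With these inputs the coarse slope overshoots the true value because of the two inflated genes, and the refined slope lands between the coarse slope and the truth. In this simplified rule the well-behaved genes are also nudged slightly toward the coarse line, but the damping keeps that shift small compared with the correction applied to the outliers.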