Biomedical scientists are increasingly using deconvolution methods, computational techniques that estimate the composition of complex mixtures of cells. One challenge they face is selecting, from the nearly 50 methods available, one that is appropriate for their experimental conditions.
To help with method selection, researchers at Baylor College of Medicine and the Jan and Dan Duncan Neurological Research Institute at Texas Children’s Hospital have extensively evaluated 11 deconvolution methods that are based on RNA-sequencing (RNA-seq) data analysis, determining each method’s individual strengths and weaknesses in a variety of scenarios. From these analyses, the researchers derived guidelines that scientists can use to determine the deconvolution method that optimally fits their needs. The study appears in the journal Genome Biology.
“A great deal of work in biomedical research involves analyzing heterogeneous biological tissues to gain insight into the contribution of individual cells in, for instance, cancer growth or brain development,” said corresponding author Dr. Zhandong Liu, associate professor of pediatrics-neurology at Baylor and director of the Bioinformatics Core of the Jan and Dan Duncan Neurological Research Institute.
Analyzing complex cellular mixtures is a difficult task. Researchers can conduct such analyses with laboratory techniques that physically separate and/or identify cellular components, but these techniques are time-consuming and expensive.
Alternatively, researchers can use deconvolution methods, which computationally infer information about the individual cell types in a mixture by analyzing large datasets measured on the bulk tissue, such as RNA-seq data.
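To make the idea concrete, here is a minimal sketch of one common formulation, reference-based deconvolution: the bulk expression of each gene is modeled as a weighted sum of reference expression profiles for each cell type, and the weights are estimated by non-negative least squares. This is an illustrative assumption, not any specific method from the study; the function names `deconvolve_nnls` and `to_proportions`, the toy signature matrix and the coordinate-descent solver are hypothetical choices made for this example.

```python
def deconvolve_nnls(signature, bulk, sweeps=500):
    """Estimate non-negative cell-type weights w so that signature @ w ~= bulk.

    signature: list of rows, one per gene, each holding that gene's reference
               expression in every cell type (genes x cell types).
    bulk:      list of bulk expression values, one per gene.
    Hypothetical helper: cyclic coordinate descent on ||S w - b||^2, w >= 0.
    """
    n_genes = len(signature)
    n_types = len(signature[0])
    # Precompute the Gram matrix G = S^T S and the vector c = S^T b.
    gram = [[sum(signature[g][i] * signature[g][j] for g in range(n_genes))
             for j in range(n_types)] for i in range(n_types)]
    c = [sum(signature[g][i] * bulk[g] for g in range(n_genes))
         for i in range(n_types)]
    w = [0.0] * n_types
    for _ in range(sweeps):
        for j in range(n_types):
            if gram[j][j] == 0:
                continue  # all-zero reference profile: keep its weight at 0
            # Partial derivative of 0.5 * ||S w - b||^2 with respect to w[j].
            grad = sum(gram[j][k] * w[k] for k in range(n_types)) - c[j]
            # Exact minimization along coordinate j, projected onto w[j] >= 0.
            w[j] = max(0.0, w[j] - grad / gram[j][j])
    return w


def to_proportions(weights):
    """Rescale non-negative weights so they sum to 1."""
    total = sum(weights)
    return [x / total for x in weights] if total > 0 else weights


# Toy reference: 3 genes, 2 hypothetical cell types (A and B).
signature = [[10.0, 0.0],
             [0.0, 10.0],
             [5.0, 5.0]]
bulk = [3.0, 7.0, 5.0]  # noiseless mixture of 30% A and 70% B
print(to_proportions(deconvolve_nnls(signature, bulk)))  # approximately [0.3, 0.7]
```

Real methods differ in how they choose reference genes, normalize data and handle noise, which is exactly why their performance varies across scenarios.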
For example, researchers studying stem cells, which are rare, might want to know what percentage of the total blood cell population these cells make up. They could perform RNA-seq on the bulk cell population and then apply a deconvolution method to estimate the percentage of stem cells in the mixture. But which method should they use?
In another example, if a scientist were interested only in the relative proportions of the different cell types in a mixture, one deconvolution method might be the best choice. But if the scientist wanted to know the actual percentage of each cell type, a different method, better suited to providing that kind of answer, might work better. How can a scientist know which method works best in each situation?
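The difference between the two kinds of answer can be shown with simple arithmetic. The sketch below assumes the sample contains some content not covered by the reference (for instance, a cell type the researcher did not profile). Absolute proportions are each known cell type's share of the whole sample; relative proportions rescale the known types so they sum to 1. The helper `absolute_to_relative`, the cell-type names and the numbers are hypothetical.

```python
def absolute_to_relative(absolute):
    """Convert absolute fractions of known cell types into relative proportions.

    absolute: dict mapping cell type -> fraction of the whole sample.
    The fractions may sum to less than 1 when part of the sample is unknown.
    Returns (relative proportions among known types, unknown fraction).
    """
    known_total = sum(absolute.values())
    if known_total <= 0:
        raise ValueError("absolute fractions must sum to a positive value")
    relative = {cell: frac / known_total for cell, frac in absolute.items()}
    unknown = 1.0 - known_total
    return relative, unknown


# Hypothetical sample: 55% of it belongs to known types, 45% is unknown.
absolute = {"T cells": 0.30, "B cells": 0.20, "stem cells": 0.05}
relative, unknown = absolute_to_relative(absolute)
print(round(unknown, 2))                 # 0.45
print(round(relative["stem cells"], 3))  # 0.091
```

Here stem cells are 5% of the whole sample but about 9.1% of the known cells, so a method that reports only relative proportions would overstate the stem-cell percentage of the sample as a whole.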
“Our lab is one of many that developed deconvolution methods early on, contributing to the nearly 50 deconvolution methods currently out there to do this type of job,” said first author Haijing Jin, a graduate student in Baylor’s graduate program of quantitative and computational biosciences working in the Liu lab. “The methods are based on different mathematical models and/or different assumptions to try to solve deconvolution problems, which involve basically how to go from a bulk heterogeneous tissue to profiles of individual cells.”
Because of this growing interest in deconvolution and the abundance of methods available, Liu and Jin felt that it was time to establish a guideline or benchmark to understand the strengths and the weaknesses of each method.
Running thousands of scenarios
The team studied 11 methods, which they selected based on code quality, the number of citations in the scientific literature and popularity in the field.
“One of the challenges we faced was how to best test the strengths and weaknesses of each method in many possible scenarios,” Jin said.
Figure: Overview of three in silico testing frameworks. (a) Three benchmarking frameworks were constructed to investigate the impact of seven factors that affect deconvolution analysis: noise level, noise structure, other noise sources, quantification unit, unknown content, component number, and weight matrix. (b) Eleven deconvolution methods are tested and have been categorized based on the required reference input: marker-based, reference-based, and reference-free. (c) Performance of the methods is assessed through Pearson's correlation coefficient (R) and mean absolute deviance (mAD). Evaluation results are illustrated by heatmaps and scatter plots. When unknown content is involved, we derive evaluation metrics in both relative and absolute measurement scales.
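The two evaluation metrics named in the figure, Pearson's R and mAD, can each be written in a few lines. The sketch below assumes mAD is the average absolute difference between estimated and true proportions and computes R with the standard formula; the function names and example values are illustrative and not taken from the study's code.

```python
import math


def pearson_r(truth, estimate):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(truth)
    if n != len(estimate) or n < 2:
        raise ValueError("need two sequences of the same length (>= 2)")
    mean_t = sum(truth) / n
    mean_e = sum(estimate) / n
    cov = sum((t - mean_t) * (e - mean_e) for t, e in zip(truth, estimate))
    var_t = sum((t - mean_t) ** 2 for t in truth)
    var_e = sum((e - mean_e) ** 2 for e in estimate)
    if var_t == 0 or var_e == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_t * var_e)


def mean_absolute_deviance(truth, estimate):
    """Average absolute difference between estimated and true proportions."""
    if len(truth) != len(estimate) or not truth:
        raise ValueError("need two non-empty sequences of the same length")
    return sum(abs(t - e) for t, e in zip(truth, estimate)) / len(truth)


# Hypothetical true vs. estimated proportions for four cell types.
truth = [0.10, 0.20, 0.30, 0.40]
estimate = [0.12, 0.18, 0.33, 0.37]
print(round(pearson_r(truth, estimate), 3))
print(round(mean_absolute_deviance(truth, estimate), 3))  # 0.025
```

The metrics catch different failures: R rewards getting the ranking and spread of proportions right, while mAD penalizes systematic offsets. An estimate that adds a constant 0.1 to every true proportion still has R = 1 but an mAD of 0.1.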
The researchers chose a computational, or in silico, approach, which allowed them to simulate the thousands of scenarios needed to test all the methods.
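A single simulated scenario can be sketched as follows: draw random cell-type proportions, mix reference profiles in those proportions to obtain a synthetic bulk sample, and perturb it with noise whose size is set by a noise-level parameter. This is only an illustration under simple assumptions (flat Dirichlet proportions and multiplicative Gaussian noise clipped at zero); it does not reproduce the study's noise models, and the names `random_proportions` and `simulate_bulk` are hypothetical.

```python
import random


def random_proportions(n_types, rng):
    """Draw cell-type proportions from a flat Dirichlet distribution."""
    draws = [rng.gammavariate(1.0, 1.0) for _ in range(n_types)]
    total = sum(draws)
    return [d / total for d in draws]


def simulate_bulk(signature, proportions, noise_level, rng):
    """Mix reference profiles and add multiplicative Gaussian noise.

    signature:   genes x cell types reference expression (list of rows).
    proportions: one weight per cell type, summing to 1.
    noise_level: noise standard deviation relative to each gene's noiseless
                 value (0 means no noise).
    """
    bulk = []
    for row in signature:
        clean = sum(expr * p for expr, p in zip(row, proportions))
        noisy = clean * (1.0 + rng.gauss(0.0, noise_level))
        bulk.append(max(0.0, noisy))  # expression cannot be negative
    return bulk


rng = random.Random(42)  # fixed seed so the scenarios are reproducible
signature = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
for noise_level in (0.0, 0.1, 0.5):
    props = random_proportions(len(signature[0]), rng)
    bulk = simulate_bulk(signature, props, noise_level, rng)
    print(noise_level, [round(p, 2) for p in props], [round(b, 2) for b in bulk])
```

Repeating this across many seeds, noise levels and reference matrices, then deconvolving each synthetic sample and scoring it against the known proportions, is one way to build up a large grid of test scenarios.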
“All these scenarios represented real-life experimental situations in cell research, cancer research or developmental biology. We simulated each one of them so we could identify the best deconvolution method for each scenario for people who are interested in applying these methods to their experiments,” Jin said.
“That’s the value of this work,” Liu said. “We are providing a benchmark study on various deconvolution methods and guidance for people working on different topics in biology to facilitate the analysis of their experimental results.”
Source – Baylor College of Medicine