Single-cell RNA-seq (scRNA-seq) data simulation is critical for evaluating computational methods for analysing scRNA-seq data, especially when the ground truth cannot be obtained experimentally. The reliability of such evaluations depends on how well simulation methods capture the properties of experimental data. However, although many scRNA-seq data simulation methods have been proposed, a systematic evaluation of these methods is lacking.
University of Sydney researchers have developed a comprehensive evaluation framework, SimBench, which includes a novel kernel density estimation measure, and used it to benchmark 12 simulation methods on 36 experimental scRNA-seq datasets. They evaluated the simulation methods on a panel of data properties, on their ability to maintain biological signals, and on their computational scalability. The benchmark uncovered performance differences among the methods and highlighted how the difficulty of simulating data varies from one characteristic to another. Furthermore, the researchers identified several limitations of current methods, including maintaining the heterogeneity of distributions. These results, together with the framework and datasets made publicly available as R packages, will guide the selection of simulation methods and their future development.
Schematic of the benchmarking workflow
a A total of 36 datasets, covering a range of protocols, tissue types, organisms and sample sizes, were used in this benchmark study. b We evaluated 12 simulation methods available in the literature to date. c Multiple aspects of evaluation were examined in this study, with the three primary focuses illustrated in detail in panel d. e Finally, we summarised the results into a set of recommendations for users and identified potential areas of improvement for developers.
Availability – The benchmark framework is available as an R package at https://github.com/SydneyBioX/SimBench.