The importance of diversity and cellular specialization is clear for many reasons: population-level diversification, improved resilience to unforeseen stresses, and the unique functions that cells take on within metazoan organisms during development and differentiation. However, the true extent of cellular heterogeneity is only now becoming clear, through the integration of genome-wide analyses with increasingly cost-effective next-generation sequencing (NGS). With easy access to single-cell NGS (scNGS), new opportunities exist to examine heterogeneity in both gene expression levels and somatic mutations, but these assays can generate yottabyte-scale data. Here, researchers at Weill Cornell Medical College model the importance of heterogeneity for large-scale analysis of scNGS data, with a focus on applications in oncology and other diseases. They also provide a guide to sample-size estimation and experimental design.
Model of the number of cells required to detect variants
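Minimum number of cells that must be sampled to capture at least one (A) or at least three (B) cells from a subclone, for varying detection probabilities (lines) across a range of subclone frequencies, in a tissue of 1 billion cells. Hypergeometric probabilities were computed with R's phyper() function using lower.tail = FALSE and q = 0 (A) or q = 2 (B), so that each value is P(X > q), the probability of sampling more than q subclonal cells. Sample sizes and clonal frequencies were varied such that m + n = 1,000,000,000, where m is the number of subclonal cells and n the number of remaining cells.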
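The caption's calculation can be reproduced outside R. Below is a minimal Python sketch, assuming a sample that is tiny relative to the tissue. It computes the same upper tail as phyper(q, m, n, k, lower.tail = FALSE) using log-gamma functions, then searches for the smallest sample size that reaches a target detection probability. The helper names tail_prob and min_cells, and the doubling-plus-binary-search strategy, are hypothetical illustrations, not the authors' code (their R scripts are not shown here).

```python
import math


def tail_prob(r: int, k: int, m: int, total: int) -> float:
    """P(X >= r) for X ~ Hypergeometric: k cells drawn without replacement
    from `total` cells, m of which belong to the subclone.

    Same quantity as R's phyper(r - 1, m, total - m, k, lower.tail = FALSE).
    Hypothetical helper; assumes 0 <= k <= total - m.
    """
    n = total - m  # cells outside the subclone
    if not 0 <= k <= n:
        raise ValueError("this sketch assumes 0 <= k <= total - m")
    if r <= 0:
        return 1.0
    # log P(X = 0) = log[C(n, k) / C(total, k)], via log-gamma to avoid huge integers
    log_pmf = (math.lgamma(n + 1) - math.lgamma(n - k + 1)
               - math.lgamma(total + 1) + math.lgamma(total - k + 1))
    below = 0.0  # accumulates P(X = 0) + ... + P(X = r - 1)
    top = min(r - 1, k, m)  # the pmf is zero above min(k, m)
    for j in range(top + 1):
        below += math.exp(log_pmf)
        if j < top:
            # step to P(X = j + 1) using the ratio P(X = j + 1) / P(X = j)
            log_pmf += math.log((m - j) * (k - j)) - math.log((j + 1) * (n - k + j + 1))
    return max(0.0, 1.0 - below)


def min_cells(r: int, p: float, freq: float, total: int = 1_000_000_000) -> int:
    """Smallest sample size k with P(at least r subclonal cells) >= p, for a
    subclone at fraction `freq` of a tissue of `total` cells. Hypothetical helper."""
    if r < 1 or not 0.0 < p < 1.0:
        raise ValueError("need r >= 1 and 0 < p < 1")
    m = round(freq * total)
    n = total - m
    if m < r:
        raise ValueError("subclone has fewer than r cells")
    # Invariant: tail(lo) < p <= tail(hi); tail(0) = 0 < p.
    lo, hi = 0, 1
    while tail_prob(r, hi, m, total) < p:  # double until the target is met
        if hi == n:
            raise ValueError("target probability not reachable")
        lo, hi = hi, min(2 * hi, n)
    while hi - lo > 1:  # tail is nondecreasing in k, so binary search is valid
        mid = (lo + hi) // 2
        if tail_prob(r, mid, m, total) >= p:
            hi = mid
        else:
            lo = mid
    return hi
```

For example, min_cells(1, 0.95, 0.01) returns 299. That is, detecting a 1% subclone with 95% probability needs 299 cells from a billion-cell tissue. This agrees with the binomial rule of thumb k ≈ ln(1 − p) / ln(1 − f), which holds when the sample is a negligible fraction of the tissue. Where SciPy is available, scipy.stats.hypergeom(M, n, N).sf(r - 1) gives the same tail probability.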