Traditional differential expression tools are limited to detecting changes in overall expression, and so fail to exploit the rich information provided by single-cell data sets. Researchers at the Cambridge Institute of Public Health have developed a Bayesian hierarchical model that builds upon BASiCS to study changes that lie beyond comparisons of means. The model incorporates built-in normalization and quantifies technical artifacts by borrowing information from spike-in genes. Using a probabilistic approach, they highlight genes whose cell-to-cell heterogeneity changes while their overall expression remains unchanged.
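To make the "probabilistic approach" more concrete, here is a minimal sketch of one way such a decision rule could look. It assumes that posterior draws of a gene's mean and over-dispersion are already available for both populations, and it flags a change when the posterior probability of a large log2 fold change is high. The function names, the thresholds `tau` and `omega`, the fixed `evidence` cutoff, and the choice to test over-dispersion only when the mean is unchanged are all illustrative assumptions, not the BASiCS API.

```python
import math


def tail_posterior_probability(draws_p1, draws_p2, log2_threshold):
    """Fraction of paired posterior draws with |log2(draw1 / draw2)| above the threshold.

    Hypothetical helper: all draws must be positive and the two lists equally long.
    """
    if not draws_p1 or len(draws_p1) != len(draws_p2):
        raise ValueError("need two non-empty lists of equal length")
    exceed = sum(
        1
        for a, b in zip(draws_p1, draws_p2)
        if abs(math.log2(a / b)) > log2_threshold
    )
    return exceed / len(draws_p1)


def classify_gene(mu1, mu2, delta1, delta2, tau=0.4, omega=0.4, evidence=0.9):
    """Label one gene from posterior draws of its mean (mu) and over-dispersion (delta).

    tau, omega and evidence are illustrative values, not recommended defaults.
    """
    prob_mean = tail_posterior_probability(mu1, mu2, tau)
    prob_disp = tail_posterior_probability(delta1, delta2, omega)
    if prob_mean > evidence:
        return "mean change"
    # Assumption of this sketch: over-dispersion is only compared when the
    # mean shows no evidence of change.
    if prob_disp > evidence:
        return "over-dispersion change only"
    return "no change detected"


if __name__ == "__main__":
    # Toy draws: similar means, clearly different over-dispersion.
    label = classify_gene(
        mu1=[10.0, 10.5, 9.8, 10.2],
        mu2=[10.1, 9.9, 10.3, 10.0],
        delta1=[0.20, 0.25, 0.22, 0.21],
        delta2=[1.0, 1.1, 0.9, 1.2],
    )
    print(label)
```

In practice the evidence cutoff would be calibrated rather than fixed by hand; the constant here only keeps the sketch short.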
Graphical representation of our model for detecting changes in expression patterns (mean and over-dispersion) based on comparing two predefined populations of cells.
The diagram considers expression counts of two genes ($i$ is biological and $i'$ is technical) and two cells ($j_p$ and $j'_p$) from each population $p = 1, 2$. Observed expression counts are represented by square nodes. The central rhomboid node denotes the known input number of mRNA molecules for a technical gene $i'$, which is assumed to be constant across all cells. The remaining circular nodes represent unknown elements, using black to denote random effects and red to denote model parameters (fixed effects) that lie at the top of the model's hierarchy. Here, the $\phi_j^{(p)}$'s and $s_j^{(p)}$'s act as cell-specific normalizing constants, and the $\theta_p$'s are global over-dispersion parameters capturing technical variability, which affect the expression counts of all genes and cells within each population. In this diagram, the $\nu_j^{(p)}$'s and $\rho_{ij}^{(p)}$'s represent random effects related to technical and biological variability components, whose variability is controlled by the $\theta_p$'s and $\delta_i^{(p)}$'s, respectively (see Additional file 1: Note 6.1). Finally, the $\mu_i^{(p)}$'s and $\delta_i^{(p)}$'s measure, respectively, the overall expression of a gene $i$ and its residual biological cell-to-cell over-dispersion (after normalization, technical noise removal and adjustment for overall expression) within each population. Colored areas highlight elements that are shared within a gene and/or cell. This sharing emphasizes how our model borrows information across all cells to estimate gene-specific parameters and across all genes to estimate cell-specific parameters.
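The hierarchy in the diagram can be read as a generative recipe for the counts in one population. The sketch below simulates it using only the Python standard library. It assumes the Poisson-Gamma form of the published BASiCS formulation, which is not spelled out in this text: the technical effect of cell j is Gamma-distributed with mean s_j and spread controlled by theta, the biological effect of gene i in cell j is Gamma-distributed with mean 1 and variance delta_i, and counts are Poisson given these effects. The function names and all parameter values are hypothetical.

```python
import random


def sample_poisson(rate, rng):
    """Draw a Poisson(rate) count by counting unit-rate exponential arrivals in [0, rate)."""
    count = 0
    elapsed = rng.expovariate(1.0)
    while elapsed < rate:
        count += 1
        elapsed += rng.expovariate(1.0)
    return count


def simulate_population(mu, delta, phi, s, theta, is_technical, seed=0):
    """Simulate a genes-by-cells count matrix for a single population.

    mu, delta, is_technical: one entry per gene (delta is ignored for technical genes).
    phi, s: one entry per cell (cell-specific normalizing constants).
    theta: global technical over-dispersion; theta, delta and s must be positive.
    """
    rng = random.Random(seed)
    n_cells = len(phi)
    # Technical random effect per cell: shape 1/theta, scale s_j * theta, so mean s_j.
    nu = [rng.gammavariate(1.0 / theta, s_j * theta) for s_j in s]
    counts = []
    for mu_i, delta_i, technical in zip(mu, delta, is_technical):
        row = []
        for j in range(n_cells):
            if technical:
                # Spike-in genes carry technical noise only.
                rate = nu[j] * mu_i
            else:
                # Biological random effect: mean 1, variance delta_i.
                rho = rng.gammavariate(1.0 / delta_i, delta_i)
                rate = phi[j] * nu[j] * mu_i * rho
            row.append(sample_poisson(rate, rng))
        counts.append(row)
    return counts


if __name__ == "__main__":
    n_cells = 5
    counts = simulate_population(
        mu=[20.0, 50.0],             # gene 0 is biological, gene 1 is a spike-in
        delta=[0.5, 0.5],
        phi=[1.0] * n_cells,
        s=[1.0] * n_cells,
        theta=0.05,
        is_technical=[False, True],
    )
    for row in counts:
        print(row)
```

Calling `simulate_population` twice with the same `mu` but different `delta` for a biological gene gives two populations whose average counts agree while their cell-to-cell spread differs, which is the kind of change the model is designed to detect.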
Control experiments validate this method’s performance and a case study suggests that novel biological insights can be revealed.
Availability – This method is implemented in R and available at https://github.com/catavallejos/BASiCS