Single-cell RNA sequencing (scRNA-seq) can be used to characterise differences in gene expression patterns between pre-specified populations of cells. Traditionally, differential expression tools are restricted to the study of changes in overall expression between cell populations. However, such analyses do not take full advantage of the rich information provided by scRNA-seq.
Researchers from the MRC Biostatistics Unit have developed a Bayesian hierarchical model which can be used to study changes in expression that lie beyond comparisons of means. In particular, their method can highlight genes that undergo changes in cell-to-cell heterogeneity between the populations but whose overall expression is preserved. Evidence supporting these changes is quantified using a probabilistic approach based on tail posterior probabilities, where a probability cut-off is calibrated through the expected false discovery rate. This method incorporates a built-in normalisation strategy and quantifies technical artefacts by borrowing information from technical spike-in genes. Control experiments validated the performance of the approach.
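The decision rule described above can be illustrated with a minimal sketch. It assumes that posterior draws of a log-fold change in over-dispersion between the two populations are already available for each gene. The function names, the tolerance of 0.4, the 5% target and the grid search are all illustrative assumptions, not the package's own interface or defaults. The expected false discovery rate is taken here as the average of one minus the tail posterior probability over the flagged genes.

```python
import random


def tail_posterior_probability(log_fold_changes, tolerance):
    """Fraction of posterior draws whose absolute log-fold change exceeds the tolerance."""
    exceed = sum(1 for x in log_fold_changes if abs(x) > tolerance)
    return exceed / len(log_fold_changes)


def expected_fdr(probabilities, cutoff):
    """Average of (1 - probability) over the genes flagged at this cutoff."""
    flagged = [p for p in probabilities if p > cutoff]
    if not flagged:
        return 0.0  # nothing flagged, so no false discoveries are expected
    return sum(1.0 - p for p in flagged) / len(flagged)


def calibrate_cutoff(probabilities, target_efdr=0.05, grid_size=101):
    """Smallest cutoff on an even grid over [0.5, 1.0] whose expected FDR meets the target."""
    for k in range(grid_size):
        cutoff = 0.5 + 0.5 * k / (grid_size - 1)
        if expected_fdr(probabilities, cutoff) <= target_efdr:
            return cutoff
    return 1.0


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical posterior draws for four genes: two with a clear change
    # in over-dispersion, two centred on no change.
    centres = [1.0, -0.9, 0.0, 0.1]
    draws = [[rng.gauss(c, 0.2) for _ in range(1000)] for c in centres]
    probabilities = [tail_posterior_probability(d, tolerance=0.4) for d in draws]
    cutoff = calibrate_cutoff(probabilities)
    for gene, p in enumerate(probabilities):
        print(gene, round(p, 3), "flagged" if p > cutoff else "not flagged")
```

Raising the cutoff keeps only the genes with the strongest evidence, so the expected false discovery rate can only fall as the cutoff grows. This is why a simple upward scan over the grid is enough in this sketch.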
Graphical representation of the model for detecting changes in expression patterns (mean and over-dispersion) based on the comparison of two pre-defined populations of cells.
The diagram considers expression counts of 2 genes (i: biological and i′: technical) and 2 cells (j_p and j′_p) from each population p = 1, 2. Observed expression counts are represented by square nodes. The central rhomboid node denotes the known input number of mRNA molecules for a technical gene i′, which is assumed to be constant across all cells. The remaining circular nodes represent unknown elements, using black to denote random effects and red to denote model parameters (fixed effects) that lie on the top of the model’s hierarchy. Here, φ_jp’s and s_jp’s act as normalising constants that are cell-specific and θ_p’s are global over-dispersion parameters capturing technical variability, which affects the expression counts of all genes and cells within each population. Finally, μ_ip’s and δ_ip’s respectively measure overall expression of a gene i and its residual biological cell-to-cell over-dispersion (after normalisation, technical noise removal and adjustment for overall expression) within each population. Coloured areas highlight elements that are shared within a gene and/or cell. The latter emphasises how our model borrows information across all cells to estimate parameters that are gene-specific and all genes to estimate parameters that are cell-specific.
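The hierarchy in the diagram can be made concrete by simulating counts for one population. The caption does not state the distributions, so this sketch assumes a Poisson-gamma form: a gamma-distributed technical random effect per cell, a gamma-distributed biological random effect per gene and cell, and Poisson counts given both. The function names and parameter values are hypothetical. In the code, `phi` and `s` are the cell-specific normalising constants, `theta` is the global technical over-dispersion, `mu` is the overall expression of each biological gene and `delta` is its residual biological over-dispersion.

```python
import math
import random


def sample_poisson(rng, mean):
    """Knuth's multiplication method; adequate for the modest means used here."""
    limit = math.exp(-mean)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def simulate_population(rng, mu, delta, spike_input, phi, s, theta):
    """Simulate counts for biological and spike-in genes (assumed Poisson-gamma hierarchy)."""
    n_cells = len(phi)
    bio_counts = [[0] * n_cells for _ in mu]
    spike_counts = [[0] * n_cells for _ in spike_input]
    for j in range(n_cells):
        # Technical random effect shared by every gene in cell j (mean s[j]).
        nu = rng.gammavariate(1.0 / theta, s[j] * theta)
        for i in range(len(mu)):
            # Biological random effect for gene i in cell j (mean 1, variance delta[i]).
            rho = rng.gammavariate(1.0 / delta[i], delta[i])
            bio_counts[i][j] = sample_poisson(rng, phi[j] * nu * mu[i] * rho)
        for i in range(len(spike_input)):
            # Spike-in genes carry technical variability only, around a known input.
            spike_counts[i][j] = sample_poisson(rng, nu * spike_input[i])
    return bio_counts, spike_counts


def sample_variance(values):
    centre = sum(values) / len(values)
    return sum((v - centre) ** 2 for v in values) / (len(values) - 1)


if __name__ == "__main__":
    rng = random.Random(1)
    n_cells = 200
    # Two biological genes with equal overall expression but different over-dispersion.
    bio, spike = simulate_population(
        rng, mu=[10.0, 10.0], delta=[0.1, 2.0], spike_input=[20.0],
        phi=[1.0] * n_cells, s=[1.0] * n_cells, theta=0.05,
    )
    for gene, counts in enumerate(bio):
        print(gene, sum(counts) / n_cells, sample_variance(counts))
```

The two simulated genes share the same overall expression and differ only in over-dispersion, which is the kind of change a comparison of means would miss.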
Availability – The implementation is freely available as an R package that combines R and C++ functions through the Rcpp library. It can be found at https://github.com/catavallejos/BASiCS