Cellular lineage trees can be derived from single-cell RNA sequencing (scRNA-seq) snapshots of differentiating cells. Currently, only datasets with simple topologies are available. To test and further develop tools for lineage tree reconstruction, we need test datasets with known complex topologies. Researchers at the Max Planck Institute for Biophysical Chemistry have developed PROSSTT, which can simulate scRNA-seq datasets for differentiation processes with lineage trees of any desired complexity, noise level, noise model, and size. PROSSTT also provides scripts to quantify the quality of predicted lineage trees.
PROSSTT models the single-cell RNA-seq transcriptomes of cells differentiating along a (user-supplied or sampled) lineage tree.
(A) A small number of gene expression programs are simulated by a random walk along each of the tree branches (number of steps = integer branch length). Here, a double bifurcation is regulated by three expression programs. (B) The relative expected gene expression μ_g(t, b) is computed as a weighted sum of the expression programs with randomly sampled weights (here: gene g in branch 3). Expected expression values are obtained by multiplying by a gene-dependent sampled scaling factor. (C) Cells are sampled from the tree as pairs of pseudotime t and branch b. For each pair, the corresponding average gene expression is retrieved, and UMI counts are sampled using a negative binomial distribution. Low-dimensional representations of the resulting gene expression matrix are similar to those of real data (section 1, Supplementary Material) and capture the lineage tree topology (diffusion map created with destiny; Angerer et al., 2016).
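Steps (A) to (C) can be illustrated with a minimal NumPy sketch. It does not use PROSSTT's own API and is not its implementation. The tree layout, branch lengths, step size, the exponential used to keep expected values positive, the log-normal scaling factors, and the dispersion value are all assumptions chosen for illustration, and every function name is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical double bifurcation: branch 0 splits into 1 and 2, branch 1 into 3 and 4.
branch_lengths = {0: 50, 1: 40, 2: 40, 3: 30, 4: 30}  # integer lengths = number of steps
parent = {0: None, 1: 0, 2: 0, 3: 1, 4: 1}
n_programs, n_genes, n_cells = 3, 100, 300


def simulate_programs(branch_lengths, parent, n_programs, rng, step_sd=0.1):
    """(A) One random walk per program and branch, continuing from the parent's end."""
    programs = {}
    for b in sorted(branch_lengths):  # parents have smaller ids than their children
        start = np.zeros(n_programs) if parent[b] is None else programs[parent[b]][-1]
        steps = rng.normal(0.0, step_sd, size=(branch_lengths[b], n_programs))
        programs[b] = start + np.cumsum(steps, axis=0)  # shape: (length, n_programs)
    return programs


def expected_expression(programs, weights, scale):
    """(B) Weighted sum of programs per gene, then a gene-dependent scaling factor.

    The exponential is an assumption of this sketch to keep the means positive.
    """
    return {b: scale * np.exp(prog @ weights) for b, prog in programs.items()}


def sample_cells(mu, branch_lengths, n_cells, dispersion, rng):
    """(C) Sample (pseudotime, branch) pairs and draw negative binomial counts."""
    branches = np.array(sorted(branch_lengths))
    lengths = np.array([branch_lengths[int(b)] for b in branches], dtype=float)
    b_idx = rng.choice(branches, size=n_cells, p=lengths / lengths.sum())
    t_idx = np.array([rng.integers(branch_lengths[int(b)]) for b in b_idx])
    means = np.stack([mu[int(b)][t] for b, t in zip(b_idx, t_idx)])
    # NumPy parameterizes by (n, p); mean = n * (1 - p) / p, so p = n / (n + mean).
    p = dispersion / (dispersion + means)
    counts = rng.negative_binomial(dispersion, p)
    return counts, t_idx, b_idx


programs = simulate_programs(branch_lengths, parent, n_programs, rng)
weights = rng.normal(0.0, 1.0, size=(n_programs, n_genes))   # randomly sampled weights
scale = rng.lognormal(mean=1.0, sigma=1.0, size=n_genes)     # gene-dependent scaling
mu = expected_expression(programs, weights, scale)
counts, t_idx, b_idx = sample_cells(mu, branch_lengths, n_cells, dispersion=2.0, rng=rng)
```

Here counts is a cells-by-genes matrix of simulated UMI counts, and t_idx and b_idx record the pseudotime step and branch of each cell, which is the ground truth a reconstruction tool would be scored against.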