Repeated cross-sectional time series of single-cell data confound several sources of variation: measurement noise, stochastic cell-to-cell variation, and cells progressing at different rates. Time series from single-cell assays are particularly susceptible to confounding because the measurements are not averaged over populations of cells. When several genes are assayed in parallel, these effects can be estimated and corrected for under certain smoothness assumptions on cell progression.
Researchers from the MRC, Cambridge have developed a principled probabilistic model with a Bayesian inference scheme to analyse such data. They demonstrate the method’s utility on public microarray, nCounter and RNA-seq data sets from three organisms. The method almost perfectly recovers withheld capture times in an Arabidopsis data set, accurately estimates cell cycle peak times in a human prostate cancer cell line, and correctly identifies two precocious cells in a study of paracrine signalling in mouse dendritic cells. It also compares favourably with Monocle, a state-of-the-art technique. Using held-out data, the researchers further show that uncertainty in the temporal dimension is a common confounder and should be accounted for in analyses of repeated cross-sectional time series.
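The core idea, that several genes measured in the same cell can jointly pin down where that cell sits in time, can be illustrated with a toy example. The Python sketch below is not the authors' model and does not use the DeLorean package: it assumes the smooth expression profiles are already known (sinusoids with hypothetical phases), places a Gaussian prior on each cell's pseudotime centred on its capture time, and picks the most probable pseudotime on a grid. All names and parameter values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy set-up (all values are illustrative assumptions, not from the paper).
n_genes, period = 20, 12.0
phases = rng.uniform(0, 2 * np.pi, n_genes)  # one hypothetical phase per gene
noise_sd, offset_sd = 0.3, 1.0               # measurement noise, timing spread


def expected_expression(t):
    """Smooth toy profiles: rows are genes, columns are time points."""
    t = np.atleast_1d(t)
    return np.sin(2 * np.pi * t[None, :] / period + phases[:, None])


# Simulate cells captured at three time points whose true progression
# deviates from the capture time.
capture = np.repeat([2.0, 6.0, 10.0], 30)
true_time = capture + rng.normal(0, offset_sd, capture.size)
expr = expected_expression(true_time) + rng.normal(
    0, noise_sd, (n_genes, capture.size)
)

# Candidate pseudotimes and the profiles evaluated on them.
grid = np.linspace(-3, 15, 721)
profile_on_grid = expected_expression(grid)  # shape (n_genes, n_grid)


def map_pseudotime(cell_expr, cell_capture):
    """Most probable pseudotime on the grid for a single cell."""
    # Gaussian prior: pseudotime is expected to lie near the capture time.
    log_prior = -0.5 * ((grid - cell_capture) / offset_sd) ** 2
    # Gaussian likelihood: all genes contribute evidence about the same time.
    resid = cell_expr[:, None] - profile_on_grid
    log_lik = -0.5 * np.sum((resid / noise_sd) ** 2, axis=0)
    return grid[np.argmax(log_prior + log_lik)]


pseudotime = np.array(
    [map_pseudotime(expr[:, c], capture[c]) for c in range(capture.size)]
)

mae_capture = np.mean(np.abs(capture - true_time))
mae_pseudo = np.mean(np.abs(pseudotime - true_time))
print(f"mean abs. error using capture time: {mae_capture:.2f}")
print(f"mean abs. error using pseudotime:   {mae_pseudo:.2f}")
```

In this simplified setting the estimated pseudotimes track the simulated progression more closely than the raw capture times do. A full treatment would also have to learn the expression profiles and quantify uncertainty rather than report a single point estimate, which is what a Bayesian inference scheme is for.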
The module score (as defined by Shalek et al.) of core antiviral genes over pseudotime. The two precocious cells captured at one hour are plotted as triangles. These two cells have been placed at a later pseudotime than the other cells captured at one hour. A Loess curve has also been plotted through the data.
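For readers unfamiliar with Loess, the following is a minimal sketch of a local linear smoother with tricube weights, written with NumPy only. It is an illustration of the general technique rather than the code used to draw the figure; the synthetic `pseudotime` and `score` arrays are placeholders for per-cell values and do not reproduce the module score definition of Shalek et al.

```python
import numpy as np


def loess(x, y, frac=0.5):
    """Local linear regression with tricube weights, evaluated at each x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    k = max(2, int(np.ceil(frac * n)))  # number of neighbours per local fit
    fitted = np.empty(n)
    for i in range(n):
        dist = np.abs(x - x[i])
        h = np.sort(dist)[k - 1]  # bandwidth: distance to the k-th nearest point
        if h > 0:
            w = np.clip(1 - (dist / h) ** 3, 0, 1) ** 3  # tricube kernel
        else:
            w = (dist == 0).astype(float)  # degenerate case: tied x values
        # Weighted least squares for an intercept and slope centred on x[i].
        X = np.column_stack([np.ones(n), x - x[i]])
        WX = X * w[:, None]
        beta = np.linalg.lstsq(WX.T @ X, WX.T @ y, rcond=None)[0]
        fitted[i] = beta[0]  # the intercept is the fitted value at x[i]
    return fitted


# Synthetic stand-ins for per-cell pseudotime and a per-cell score.
rng = np.random.default_rng(1)
pseudotime = np.sort(rng.uniform(0, 6, 40))
score = np.tanh(pseudotime - 3) + rng.normal(0, 0.2, pseudotime.size)
smoothed = loess(pseudotime, score, frac=0.5)
```

The `frac` parameter controls how much of the data each local fit sees: larger values give a smoother curve, smaller values follow the points more closely.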
Availability – The method is available on CRAN in the DeLorean package.
Contact – [email protected]