A new statistical method allows researchers to infer different developmental processes from RNA-Seq data

Through RNA sequencing, researchers can measure which genes are expressed in each individual cell of a sample. A new statistical method allows researchers to infer different developmental processes from a mixture of cells that are at asynchronous stages of development. The method has been published in the journal ‘Nature Methods’ by researchers at Helmholtz Zentrum München in collaboration with colleagues from the Technical University of Munich.

Today, cell biology no longer focuses only on static states, but rather seeks to understand the dynamic development of cells. One example of this is the formation of various cell types, such as red blood cells or endothelial cells, from their precursors, the blood stem cells. To understand how this process is genetically controlled, scientists use transcriptome analysis to determine which genes are expressed.

“To me, it’s still amazing that we are now even able to determine the transcriptome of single cells,” said lead author Laleh Haghverdi, “especially when one realizes that a typical cell contains only a few picograms of RNA.” The availability of these data is beginning to revolutionize many fields of research, but new statistical methods are required to interpret them correctly. “For example, all cells of a sample never start their development synchronously, and their development takes different lengths of time. Therefore, we are always dealing with a dynamic mixture,” added Haghverdi, a doctoral student at the Institute of Computational Biology (ICB) at Helmholtz Zentrum München. “It is immensely difficult to construct multiple steps of a process from this, especially since the cells are only available for one measurement.”

Welcome to the era of pseudotime

To decipher developmental processes from a measurement at a single time point, in effect a snapshot, the researchers led by ICB Director Prof. Dr. Dr. Fabian Theis developed an algorithm called diffusion pseudotime to interpret single-cell sequencing data. The algorithm orders cells on a virtual timeline, the pseudotime, along which they show continuous changes in the transcriptome. This makes it possible to reconstruct the sequence in which genes are expressed. With this method, researchers can also display graphically how the developmental paths of different cell types branch into separate lineages.
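The idea of ordering cells by a diffusion-based distance can be illustrated with a small Python sketch. It is a simplified toy version written for this article, not the authors' published implementation: the function name `diffusion_pseudotime`, the fixed Gaussian kernel width `sigma` and the need to pick a `root` cell by hand are all assumptions made for illustration.

```python
import numpy as np


def diffusion_pseudotime(expr, root, sigma):
    """Toy pseudotime: diffusion-based distance of every cell from `root`.

    expr  : (n_cells, n_genes) array of expression values
    root  : index of the cell assumed to mark the start of the process
    sigma : Gaussian kernel width (hypothetical tuning parameter)
    """
    expr = np.asarray(expr, dtype=float)

    # Gaussian kernel on pairwise squared distances: cells with similar
    # expression profiles get a weight close to 1, distant cells close to 0.
    diff = expr[:, None, :] - expr[None, :, :]
    kernel = np.exp(-np.sum(diff ** 2, axis=-1) / (2.0 * sigma ** 2))

    # Density normalisation, so that densely sampled stages do not dominate.
    density = kernel.sum(axis=1)
    weights = kernel / np.outer(density, density)

    # Symmetrically normalised transition matrix between cells.
    degree = weights.sum(axis=1)
    trans = weights / np.sqrt(np.outer(degree, degree))

    # Eigen-decomposition (eigenvalues in ascending order). The largest
    # eigenvalue is 1 and describes the steady state, so it is dropped.
    eigvals, eigvecs = np.linalg.eigh(trans)
    eigvals, eigvecs = eigvals[:-1], eigvecs[:, :-1]

    # Sum over random walks of all lengths: each remaining eigen-direction
    # is weighted by lambda / (1 - lambda).
    scale = eigvals / (1.0 - eigvals)
    accumulated = (eigvecs * scale) @ eigvecs.T

    # Pseudotime of a cell = distance between its row and the root's row.
    return np.linalg.norm(accumulated - accumulated[root], axis=1)


# Synthetic example: 40 "cells" along one developmental path, two "genes".
t = np.linspace(0.0, 1.0, 40)
cells = np.column_stack([t, 2.0 * t])
pseudotime = diffusion_pseudotime(cells, root=0, sigma=0.3)
order = np.argsort(pseudotime)  # cells sorted from early to late
```

In this sketch the root cell has pseudotime zero, and cells further along the synthetic path receive larger values. Real data would additionally require preprocessing, a data-driven choice of kernel width, and a step to detect branching, none of which is shown here.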

“For example, we can show how a relatively uniform cluster of blood stem cells develops into different cell types,” said study leader Theis. “While some become red blood cells, others differentiate into endothelial cells. We can trace these fates based on the transcriptome data of the single cells.” In addition, the scientists obtain information about which gene switches underlie these developments. The relatively diffuse mixture of cells at different stages of development can be disentangled on the computer and, after the analysis, yields a clear picture of the individual steps as they occur.

For the researchers, however, this is only the beginning: the processes of blood formation are relatively well understood and served only as a test case to determine how well the method works. “In the future we want to focus on processes that have remained elusive until now or which may not have been discovered at all,” said Theis.

Diffusion pseudotime reveals temporal ordering and cellular decisions on the single cell level


(a) The diffusion transition matrix Txy is constructed by computing the overlap of local kernels at the expression levels of cells x and y (1). Diffusion pseudotime dpt(x,y) approximates the geodesic distance between x and y on the mapped manifold (2). Branching points are identified as points where anticorrelated distances from branch ends become correlated (3). (b) Application of DPT to single-cell qPCR of 42 genes in 3,934 single cells during early hematopoiesis (ref. 13), sorted from primitive streak (PS), neural plate (NP), head fold (HF), four somite GFP negative (4SG−) and four somite GFP positive (4SG+). DPT identifies the endothelial branch 1 (4SG−) and the erythroid branch 2 (4SG+) (blue cells in bottom graphs). (c) Dynamics of genes Erg and Ikaros in both branches. Black lines show the moving average over 50 adjacent cells. The red vertical line depicts the branching point. (d) Heatmap of gene expression (smoothed over 50 adjacent cells), with cells ordered by DPT and branching and genes ordered according to first major change, which is indicated by black triangles (upward: activation, downward: deactivation). Pie charts (bottom) show the fraction of cells in the four metastable states (metastable state populations are high-density DPT regions indicated by the black horizontal line above the pie charts).
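The caption names the transition matrix and the distance dpt(x,y) without giving a formula. One common way of writing the definition is shown below; the symbols \tilde{T}, \psi_0 and M do not appear in the caption and are introduced here as assumptions, with \psi_0 denoting the steady-state eigenvector of T (eigenvalue 1).

```latex
\tilde{T} = T - \psi_0 \psi_0^{\top}, \qquad
M = \sum_{t=1}^{\infty} \tilde{T}^{\,t} = (I - \tilde{T})^{-1} - I, \qquad
\operatorname{dpt}(x, y) = \bigl\lVert M(x, \cdot) - M(y, \cdot) \bigr\rVert
```

Read this way, M accumulates random walks of all lengths between cells, and two cells are close in pseudotime when walks starting from them reach the rest of the data in a similar way.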

In collaboration with experimental institutes at Helmholtz Zentrum München, the scientists are focusing, among other research projects, on the development of brain cells and of the insulin-producing beta cells in the pancreas. They hope that elucidating how individual cell groups form will lead to approaches for intervening in these processes, for example when they are disrupted by disease.

Haghverdi, L. et al. (2016) Diffusion pseudotime robustly reconstructs lineage branching. Nature Methods [Epub ahead of print].

Source – Helmholtz Zentrum München
