Gene trajectory inference for single-cell data by optimal transport metrics

Understanding the dynamics of gene expression is crucial for unraveling the mysteries of biological processes. Single-cell RNA sequencing (scRNA-seq) has emerged as a powerful tool for exploring these dynamics, offering insights into cell state transitions and gene behaviors.

However, interpreting scRNA-seq data presents its own set of challenges. Current methods often rely on inferring cell trajectories to track changes in gene expression over time. Yet, this approach can be limited by the presence of multiple concurrent gene processes within the same group of cells, as well as technical noise that may obscure the true progression of biological processes.

To overcome these obstacles, researchers at Yale University have developed an approach called GeneTrajectory. Unlike traditional methods that infer cell trajectories, GeneTrajectory identifies trajectories of the genes themselves. It calculates optimal transport distances between gene distributions over a cell-cell graph, then uses those distances to extract gene programs and define the pseudotemporal order of the genes within them.
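To make the idea of an optimal transport distance between gene distributions concrete, here is a minimal sketch on a toy cell graph. It is not the GeneTrajectory implementation: the chain-shaped graph, the made-up expression counts, and the helper name `graph_wasserstein` are assumptions for illustration, and the transport problem is solved as a small linear program with SciPy rather than with whatever solver the authors use.

```python
import numpy as np
from scipy.optimize import linprog
from scipy.sparse.csgraph import shortest_path


def graph_wasserstein(p, q, cost):
    """Wasserstein-1 distance between distributions p and q over the same
    set of cells, with cost[i, j] as the ground distance between cells."""
    n = len(p)
    # The transport plan T is flattened row-major: T[i, j] -> index i * n + j.
    a_eq = np.zeros((2 * n, n * n))
    for k in range(n):
        a_eq[k, k * n:(k + 1) * n] = 1.0  # mass leaving cell k equals p[k]
        a_eq[n + k, k::n] = 1.0           # mass arriving at cell k equals q[k]
    b_eq = np.concatenate([p, q])
    res = linprog(cost.ravel(), A_eq=a_eq, b_eq=b_eq,
                  bounds=(0, None), method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return float(res.fun)


# Hypothetical cell graph: five cells connected in a chain (0-1-2-3-4).
adjacency = np.diag(np.ones(4), k=1)
adjacency = adjacency + adjacency.T
# Ground cost = number of hops between cells on the graph.
cost = shortest_path(adjacency, directed=False, unweighted=True)

# Made-up expression counts for three genes over the five cells.
counts = np.array([
    [4, 1, 0, 0, 0],   # gene expressed at one end of the chain
    [0, 0, 0, 1, 4],   # gene expressed at the other end
    [0, 4, 1, 0, 0],   # gene expressed just after the first one
], dtype=float)
# A gene distribution is its expression normalized to sum to 1 over cells.
genes = counts / counts.sum(axis=1, keepdims=True)

d_first_last = graph_wasserstein(genes[0], genes[1], cost)
d_first_next = graph_wasserstein(genes[0], genes[2], cost)
print(d_first_last, d_first_next)
```

In this toy setting, genes whose expression sits in neighbouring cells end up closer to each other than genes expressed at opposite ends of the chain, which is the property that lets the distances encode an ordering of genes.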

Overview of GeneTrajectory

Fig. 1

a, Illustration of two scenarios when a linear process and a cyclic process are dependent or independent of each other, resulting in cell manifolds with different intrinsic dimensions and requiring distinct pseudotime parametrizations. b, Schematic representation of the major workflow of GeneTrajectory. c, Construction of cell kNN graph. d, Computation of graph-based OT (Wasserstein) distances between paired gene distributions (four representative genes are shown) over the cell graph. Gene distributions are defined by their normalized expression levels over cells. e, Heatmap of OT (Wasserstein) distances for genes g1–g4 in d. f, Construction of gene graph based on gene–gene affinities (transformed from gene–gene Wasserstein distances). g, Sequential identification of gene trajectories using a diffusion-based strategy. The initial node (terminus 1) is defined by the gene with the largest distance from the origin in the diffusion map embedding. A random-walk procedure is then used on the gene graph to select the other genes that belong to the trajectory terminated at terminus 1. After retrieving genes for the first trajectory, we identify the terminus of the subsequent gene trajectory among the remaining genes and repeat the steps above. This is done iteratively until all detectable trajectories are extracted. h, Diffusion map visualization of gene trajectories.
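Panels f and g can be illustrated with a short sketch that turns a gene-gene distance matrix into affinities, builds a diffusion map, and picks the first terminus as the gene farthest from the origin of the embedding. This is only a rough illustration under stated assumptions: the Gaussian kernel, the bandwidth `sigma`, the number of components, and the toy distance matrix are choices made here, not details taken from the paper, and the random-walk step that assigns the remaining genes to a trajectory is omitted.

```python
import numpy as np


def diffusion_map(dist, sigma, n_components=2, t=1):
    """Diffusion map embedding from a symmetric distance matrix.
    The Gaussian kernel and its bandwidth sigma are assumptions."""
    affinity = np.exp(-(dist / sigma) ** 2)
    degree = affinity.sum(axis=1)
    # Symmetric normalization lets us use eigh; it has the same eigenvalues
    # as the random-walk matrix P = D^-1 K.
    sym = affinity / np.sqrt(np.outer(degree, degree))
    vals, vecs = np.linalg.eigh(sym)
    order = np.argsort(vals)[::-1]          # sort eigenpairs, largest first
    vals, vecs = vals[order], vecs[:, order]
    # Rescale to obtain right eigenvectors of P.
    psi = vecs / np.sqrt(degree)[:, None]
    # Drop the trivial constant eigenvector, scale the rest by eigenvalue^t.
    return psi[:, 1:n_components + 1] * vals[1:n_components + 1] ** t


# Toy gene-gene distance matrix: six hypothetical genes ordered along a line,
# standing in for Wasserstein distances computed over a cell graph.
positions = np.arange(6, dtype=float)
gene_dist = np.abs(positions[:, None] - positions[None, :])

embedding = diffusion_map(gene_dist, sigma=1.5)

# Terminus 1: the gene with the largest distance from the origin.
norms = np.linalg.norm(embedding, axis=1)
terminus = int(np.argmax(norms))
print(terminus)
```

For genes arranged along a single line, the selected terminus is one of the two end genes, which matches the intuition that a trajectory is traced back from its extremity.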

In the published study, GeneTrajectory accurately extracted progressive gene dynamics during myeloid lineage maturation, a process central to the development of immune cells. The researchers also used GeneTrajectory to identify key gene programs underlying dermal condensate differentiation in mouse skin hair follicles, which traditional cell trajectory approaches had not been able to resolve.

By focusing on gene dynamics rather than on the ordering of cells, GeneTrajectory facilitates the discovery of gene programs that drive biological processes. The approach could deepen our understanding of cellular biology in fields ranging from developmental biology to regenerative medicine.


Qu R, Cheng X, Sefik E, Stanley JS III, Landa B, Strino F, Platt S, Garritano J, Odell ID, Coifman R, Flavell RA, Myung P, Kluger Y. (2024) Gene trajectory inference for single-cell data by optimal transport metrics. Nat Biotechnol [Epub ahead of print].
