Single-cell RNA sequencing (scRNA-seq) data allow us to quantify biological heterogeneity in developmental processes at a fine-grained level. Despite recent advances in inferring cellular dynamics from the underlying developmental process, existing computational trajectory inference (TI) methods face several critical challenges. First, most existing TI methods overlook the importance of dimensionality reduction. They rely on simple techniques that may underfit the complexity of the underlying biological process, which can obscure important intermediate cell states or small cell populations. Second, most existing methods impose strong assumptions on the topology of the trajectory and cannot generalize to disconnected or hybrid topologies without further restrictions. Third, accurate detection of terminal cell states remains difficult, as only a few methods can identify cell fates automatically.
To overcome these challenges, researchers from IIT Kanpur developed MARGARET (Metric leARned Graph pARtitionEd Trajectory). MARGARET is an end-to-end computational framework that uses scRNA-seq data to infer the cell-state trajectory and the dynamics of cell-fate plasticity, and thereby characterizes the differentiation landscape. Given initial cell embeddings and cluster assignments, MARGARET trains a neural-network-based encoder to produce lower-dimensional cell representations. The encoder learns a metric in which embeddings of cells from the same cluster are drawn close together, while embeddings of cells from different clusters are pushed apart. To capture complex trajectory topologies, MARGARET uses the learned embeddings and the cell clusters to build a cluster connectivity graph, based on a novel measure of connectivity between cell clusters. It then combines this graph with the cell nearest-neighbour graph to compute a pseudotime ordering of cells. To identify terminal states in the trajectory, MARGARET introduces a measure based on shortest-path betweenness. Finally, MARGARET refines an existing absorbing Markov chain model of differentiation and introduces a novel algorithm based on local random walks for computing cell-fate probabilities. This generalizes the quantification of cell-fate plasticity to complex trajectory topologies.
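MARGARET's own local random-walk algorithm is not reproduced here. For reference, the standard absorbing Markov chain calculation that it refines can be sketched in NumPy. The sketch assumes a row-stochastic cell-to-cell transition matrix `P` and a known set of terminal (absorbing) cells. The function names `absorption_probabilities` and `fate_entropy`, and the four-cell toy chain, are hypothetical illustrations. Using Shannon entropy of the fate probabilities as a plasticity score is one common choice, not necessarily MARGARET's measure.

```python
import numpy as np


def absorption_probabilities(P, terminal):
    """Probability that a random walk from each cell ends in each terminal cell.

    P: row-stochastic (n x n) transition matrix over cells (assumed input).
    terminal: indices of cells treated as absorbing terminal states.
    """
    P = np.asarray(P, dtype=float)
    n = P.shape[0]
    terminal = list(terminal)
    terminal_set = set(terminal)
    transient = [i for i in range(n) if i not in terminal_set]

    # Canonical decomposition: Q = transient -> transient, R = transient -> terminal.
    Q = P[np.ix_(transient, transient)]
    R = P[np.ix_(transient, terminal)]

    # Absorption probabilities B = (I - Q)^{-1} R, computed by solving a linear
    # system instead of forming the inverse explicitly.
    B = np.linalg.solve(np.eye(len(transient)) - Q, R)

    probs = np.zeros((n, len(terminal)))
    probs[transient] = B
    # A terminal cell is absorbed in itself with probability 1.
    probs[terminal, np.arange(len(terminal))] = 1.0
    return probs


def fate_entropy(probs):
    """Shannon entropy of each cell's fate distribution (higher = more plastic)."""
    safe = np.clip(probs, 1e-12, 1.0)  # avoid log(0); 0 * log(eps) contributes 0
    return -(probs * np.log(safe)).sum(axis=1)


# Toy chain: progenitor 0 -> hub 1 -> fate A (cell 2) or fate B (cell 3).
P = np.array([
    [0.0, 1.0, 0.0, 0.0],
    [0.2, 0.0, 0.5, 0.3],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
])
probs = absorption_probabilities(P, terminal=[2, 3])
plasticity = fate_entropy(probs)
print(probs)
print(plasticity)
```

In the toy chain, both the progenitor and the hub commit to fate A with probability 0.625 and to fate B with probability 0.375. The two terminal cells have zero entropy. On real data, `P` would be a large sparse matrix, and a sparse solver such as `scipy.sparse.linalg.spsolve` would be the more practical choice.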
On a variety of synthetic and real datasets with multifurcating and disconnected trajectories (including disconnected trajectories with complex multifurcating components), MARGARET outperformed other state-of-the-art TI methods on two main aspects of TI: accuracy in inferring the global topology and accuracy of the pseudotime ordering. The authors also demonstrate the biological relevance of the method by inferring consistent trajectories for diverse developmental processes, including human hematopoiesis, embryogenesis, and colon differentiation. In these systems, MARGARET accurately identified all major lineages and their associated gene expression trends, and it helped identify transitional progenitors associated with key branching events. Finally, MARGARET scales to large scRNA-seq datasets with millions of cells.
Availability: https://github.com/Zafar-Lab/Margaret