Dimensionality reduction and visualization play an important role in biological data analysis, for example in interpreting single-cell RNA sequencing (scRNA-seq) data. An ideal visualization method should be applicable to a range of scenarios, including cell clustering and trajectory inference, and should also satisfy several technical requirements, in particular preserving the inherent structure of the data and handling batch effects. However, no existing method accommodates these requirements in a unified framework.
Zhejiang University researchers have developed a general visualization method, deep visualization (DV), that preserves the inherent structure of data, handles batch effects, and is applicable to datasets from different application domains and of different scales. The method embeds a given dataset into a 2- or 3-dimensional visualization space with either a Euclidean or a hyperbolic metric, depending on the specified task type: Euclidean for static scRNA-seq data (measured at a single time point) and hyperbolic for dynamic scRNA-seq data (measured at a sequence of time points). Specifically, DV learns a structure graph that describes the relationships between data samples, then transforms the data into the visualization space while preserving the geometric structure of the data and correcting batch effects in an end-to-end manner. Experimental results on nine datasets from complex tissue of human patients or from animal development demonstrate the competitiveness of DV in discovering complex cellular relations, uncovering temporal trajectories, and addressing complex batch factors. The authors also provide a preliminary attempt to pre-train a DV model for visualizing new incoming data.
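To make the difference between the two metrics concrete, the following minimal sketch compares the Euclidean distance with a hyperbolic distance between two embedded points. It assumes the Poincaré ball model of hyperbolic space, which is one common choice and is not confirmed by the text above as the model DV uses; the function names are hypothetical and are not part of the DV package.

```python
import math


def euclidean_distance(u, v):
    """Straight-line distance between two points given as coordinate sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def poincare_distance(u, v):
    """Hyperbolic distance in the Poincare ball model (an assumed model, not
    necessarily the one DV uses). Both points must lie strictly inside the
    unit ball."""
    sq_u = sum(a * a for a in u)
    sq_v = sum(b * b for b in v)
    if sq_u >= 1.0 or sq_v >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    # d(u, v) = arcosh(1 + 2 * |u - v|^2 / ((1 - |u|^2) * (1 - |v|^2)))
    return math.acosh(1.0 + 2.0 * sq_diff / ((1.0 - sq_u) * (1.0 - sq_v)))


if __name__ == "__main__":
    near_center = ((0.0, 0.0), (0.5, 0.0))
    near_boundary = ((0.9, 0.0), (-0.9, 0.0))
    for u, v in (near_center, near_boundary):
        print(u, v, euclidean_distance(u, v), poincare_distance(u, v))
```

In this model, distances grow rapidly toward the boundary of the ball, which leaves room for tree-like, branching structure. That property is the usual motivation for embedding hierarchical or developmental data in hyperbolic space.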
The Deep Visualization (DV) model
a The DV framework. DV takes as input scRNA-seq measurements of multilevel technical or biological factors (e.g., replicate patient, disease) and learns the latent structure of cells while accounting for batch effects. b DV learns a structure graph from the input based on local scale contraction; then, while preserving the geometric structure of the scRNA-seq data, it disentangles the semantic visualization graph with batch effects into a batch-effect-removed semantic visualization graph and an a priori batch-effect graph. c The preprocessing modules for heterogeneous new datasets and the learned DV model are used to map new datasets to the reference.
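As a rough illustration of what a structure graph over data samples can look like, the sketch below builds a plain k-nearest-neighbor graph with Euclidean distances. This is a generic stand-in and only an assumption: it does not reproduce DV's local scale contraction or its batch-effect disentanglement, and the function name `knn_graph` is hypothetical.

```python
import math


def knn_graph(points, k):
    """Return a dict mapping each sample index to the indices of its k
    nearest other samples (Euclidean distance, ties broken by index).

    A generic k-NN graph used only for illustration; it is not DV's
    actual graph construction.
    """
    if not 0 < k < len(points):
        raise ValueError("k must satisfy 0 < k < number of points")
    graph = {}
    for i, p in enumerate(points):
        # Distance from sample i to every other sample, smallest first.
        dists = sorted(
            (math.dist(p, q), j) for j, q in enumerate(points) if j != i
        )
        graph[i] = [j for _, j in dists[:k]]
    return graph


if __name__ == "__main__":
    # Four toy "cells" forming two well-separated pairs.
    cells = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
    print(knn_graph(cells, k=1))
```

Each sample is linked only to its closest neighbors, so the graph captures local relationships that an embedding can then try to preserve.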
Availability – The DV software package, implemented in PyTorch, is freely available at https://github.com/Westlake-AI/DV