Several studies profile similar single-cell RNA-Seq (scRNA-Seq) data using different technologies and platforms. A number of alignment methods have been developed to enable the integration and comparison of scRNA-Seq data from such studies. While each performs well on some datasets, to date no method has been able to both perform the alignment in the original expression space and generalize to new data.
To enable such analysis, researchers at Carnegie Mellon University have developed Single Cell Iterative Point set Registration (SCIPR), which extends methods that were successfully applied to align image data so that they work on scRNA-Seq data. The researchers discuss the changes this requires, the resulting optimization function, and algorithms for learning a transformation function that aligns the data. They tested SCIPR on several scRNA-Seq datasets and show that it successfully aligns data from several different cell types, improving upon prior methods proposed for this task. In addition, the researchers show that the parameters learned by SCIPR can be used to align data not used in training and to identify key cell type-specific genes.
Summary of steps in iterative point set registration for scRNA-seq data
Each cell in an scRNA-seq dataset can be viewed as a point in high-dimensional space. 1) We start with two unaligned batches, A and B (sources, blue and targets, orange). 2) A matching algorithm (e.g. picking the closest corresponding point, or using mutual nearest neighbors) is used to pair source cells in A with corresponding target cells in B. The number of source and/or target cells matched can vary for different matching strategies. 3) Based on the selected pairs, a global transformation function is learned so that source cells in A move closer to their paired cells in B. 4) The learned transformation is then applied to all points in A. 5) This process (steps 2–4) is repeated, iteratively aligning set A onto B until the mean distance between the assigned pairs of cells no longer improves. 6) The final global transformation function is the composition of the functions learned in each iteration at step 3.
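The steps above can be illustrated with a minimal NumPy sketch. It is not the SCIPR implementation. It assumes closest-point matching in step 2 and an affine map fitted by ordinary least squares in step 3, which is just one possible choice of global transformation. The function names (closest_match, fit_affine, apply_affine, register) and the parameters max_iter and tol are hypothetical, introduced only for this example.

```python
import numpy as np

def closest_match(A, B):
    """Step 2 (assumed strategy): pair each source cell with its closest target cell."""
    # Brute-force pairwise squared distances; fine for small examples only.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def fit_affine(X, Y):
    """Step 3 (assumed form): least-squares affine map M such that [X, 1] @ M ~ Y."""
    X1 = np.hstack([X, np.ones((len(X), 1))])
    M, *_ = np.linalg.lstsq(X1, Y, rcond=None)
    return M  # shape (d + 1, d): weight rows, then one bias row

def apply_affine(M, X):
    """Step 4: apply the affine map to every point in X."""
    return X @ M[:-1] + M[-1]

def register(A, B, max_iter=50, tol=1e-6):
    """Steps 2-6: iteratively align A onto B and return (composed map, aligned A)."""
    d = A.shape[1]
    total = np.vstack([np.eye(d), np.zeros((1, d))])  # identity transformation
    cur = A.copy()
    prev = np.inf
    for _ in range(max_iter):
        idx = closest_match(cur, B)
        dist = np.linalg.norm(cur - B[idx], axis=1).mean()
        if prev - dist < tol:  # step 5: stop once the mean pair distance stalls
            break
        prev = dist
        M = fit_affine(cur, B[idx])
        cur = apply_affine(M, cur)
        # Step 6: compose maps, (x W_t + b_t) W + b = x (W_t W) + (b_t W + b)
        total = np.vstack([total[:-1] @ M[:-1], total[-1] @ M[:-1] + M[-1]])
    return total, cur
```

Calling register(A, B) on two arrays of shape (cells, genes) returns the composed map and the aligned source batch. Because the composed map is a single set of parameters, apply_affine can reuse it on cells that were not part of the fit, which mirrors the generalization property described above. A mutual-nearest-neighbors matcher could be substituted for closest_match without changing the rest of the loop.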