Spatially resolved transcriptomics has attracted increasing attention in recent years. New protocols based on in situ sequencing or multiplexed single-molecule fluorescent in situ hybridization (smFISH) aim to capture the physical positions of individual molecules. This enables the study of expression patterns in space, including a new level of analysis of tissue composition, inference of cell-cell interactions, and investigation of sub-cellular structure. However, because these protocols capture only molecule positions, they do not record cell boundaries, so it is not directly known which molecules came from the same cell.
To overcome this issue, researchers from Harvard Medical School developed Baysor, a tool for pre-processing and analysis of spatial data. It performs cell segmentation by assigning molecules to cells, and can infer these assignments from transcript information alone. The authors also introduced the concept of Neighborhood Composition Vectors (NCVs), which make it possible to apply scRNA-seq pipelines to spatial data without performing cell segmentation. Finally, they described a Markov Random Field framework for labeling spatial data, which they applied to filtering background molecules and to transferring annotations from scRNA-seq data. The paper additionally discusses open challenges in the segmentation problem and how they could be addressed in the future.
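The NCV idea can be illustrated with a short Python sketch. It assumes that the NCV of a molecule is the normalized gene composition of its k nearest molecules, counting the molecule itself as one of the k; the function name `neighborhood_composition_vectors` and its parameters are hypothetical illustrations, not Baysor's API. A brute-force distance matrix keeps the example dependency-light but is only practical for small inputs.

```python
import numpy as np


def neighborhood_composition_vectors(positions, gene_ids, n_genes, k):
    """Hypothetical sketch of NCVs: for each molecule, the fraction of each gene
    among its k nearest molecules (the molecule itself is included, an assumption)."""
    positions = np.asarray(positions, dtype=float)
    gene_ids = np.asarray(gene_ids, dtype=int)
    n = positions.shape[0]
    k = min(k, n)

    # Brute-force pairwise squared Euclidean distances (O(n^2) memory; small inputs only).
    diff = positions[:, None, :] - positions[None, :, :]
    dist2 = np.einsum("ijk,ijk->ij", diff, diff)

    # Indices of the k closest molecules to each molecule; self has distance 0.
    nn = np.argsort(dist2, axis=1, kind="stable")[:, :k]

    # Count gene labels among the neighbors, then normalize counts to fractions.
    ncv = np.zeros((n, n_genes))
    for i in range(n):
        ncv[i] = np.bincount(gene_ids[nn[i]], minlength=n_genes)
    return ncv / k


if __name__ == "__main__":
    # Two well-separated groups of molecules with different gene mixtures.
    positions = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
    gene_ids = [0, 0, 1, 2, 2, 2]
    print(neighborhood_composition_vectors(positions, gene_ids, n_genes=3, k=3))
```

Because every row is a fixed-length composition vector, the rows can be clustered or embedded with standard scRNA-seq tooling, which is what lets NCVs stand in for per-cell expression profiles without segmentation. On a real dataset, a spatial index such as a k-d tree would replace the brute-force distance matrix.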
The method performs well on most existing protocols, including MERFISH, osmFISH, ISS and STARmap. It recovers up to twice as many cells as staining-based segmentation methods, while also reducing expression contamination.