Recent technical improvements in single-cell RNA sequencing (scRNA-seq) have enabled massively parallel profiling of transcriptomes, thereby promoting large-scale studies encompassing a wide range of cell types of multicellular organisms.
Researchers from the RIKEN Center for Biosystems Dynamics Research have developed CellFishing.jl, a new method for searching atlas-scale datasets for similar cells and detecting noteworthy genes of query cells with high accuracy and throughput. Using multiple scRNA-seq datasets, the researchers validate that their method demonstrates comparable accuracy to and is markedly faster than the state-of-the-art software. Moreover, CellFishing.jl is scalable to more than one million cells, and the throughput of the search is approximately 1600 cells per second.
Schematic workflow of CellFishing.jl
CellFishing.jl first builds a database (DB) object that stores data preprocessors, indexed bit vectors, and cell metadata, if provided. The metadata can store any information including cell names, cell types, and transcript expressions of marker genes. When building a database, the DGE matrix of reference cells is preprocessed to extract important signals and then hashed into bit vectors by LSH. The preprocessors and the indexed bit vectors are stored in the database object. M, D, and T on the left side of the figure refer to the number of genes, number of reduced dimensions, and length of the bit vectors, respectively. N and N′ above the two DGE matrices represent the number of cells within the reference and query data, respectively. While searching the database for similar cells, the prebuilt preprocessors stored in the database are reused in a similar workflow that is involved in database building up to the hashing phase. The database object can be saved onto a disk and can be loaded from there
Availability – CellFishing.jl is implemented in the Julia programming language, and the source code is freely available under the MIT license at https://github.com/bicycle1885/CellFishing.jl