The advent of single-cell transcriptomics has made rare cell discovery a mainstream component of the downstream analysis pipeline. When the number of profiled cells is in the hundreds, even a single outlier cell (a singleton) deserves attention. When profiled cells number in the tens or hundreds of thousands, however, the focus shifts from mere singletons to the discovery of minor cell types. Finding rare cells in such voluminous datasets is often difficult because major cell types dominate the expression variance in the data. Moreover, as the number of profiled cells increases, state-of-the-art methods become excruciatingly slow. To overcome this, researchers from Indraprastha Institute of Information Technology, Delhi (IIITD) and Indian Institute of Technology Delhi (IITD) have developed a fast and efficient algorithm for finding rare cells, called Finder of Rare Entities (FiRE). The design of FiRE is inspired by the observation that estimating the rareness of a data point is the flip side of measuring the density around it. In principle, FiRE uses the Sketching technique, a variant of locality-sensitive hashing, to assign a rareness score to every cell.
Overview of FiRE
The first step is to assign each cell a hash code. A hash code can be thought of as an imaginary bucket, since multiple similar cells can share the same hash code. To make the rarity estimates robust, the hash code creation step is repeated L times. For each cell i and estimator l, p_il is computed as the probability that any point lands in the bucket of cell i. The second step of the algorithm combines these probabilities to obtain a rareness estimate for each cell.
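The two steps above can be illustrated with a minimal Python sketch. It is not the authors' implementation, and it rests on several assumptions that the text does not spell out: each hash bit compares one randomly chosen feature against a random threshold drawn between that feature's minimum and maximum, the per-estimator probability (written p_il here) is taken to be the fraction of all cells sharing the cell's bucket, and the probabilities are combined by summing their negative logarithms. The function name rareness_scores and the parameters M (bits per hash code) and seed are hypothetical.

```python
import numpy as np


def rareness_scores(X, L=100, M=10, seed=0):
    """Illustrative sketch-based rareness estimate (assumptions, not the reference code).

    X is an (n_cells, n_features) expression matrix.
    Returns one score per cell; a higher score means a sparser neighbourhood.
    """
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    n_cells, n_features = X.shape
    lo, hi = X.min(axis=0), X.max(axis=0)
    bit_weights = 2 ** np.arange(M, dtype=np.int64)  # packs M bits into one integer
    scores = np.zeros(n_cells)

    for _ in range(L):  # one pass per estimator l
        # Assumption: each bit thresholds one randomly sampled feature.
        dims = rng.integers(0, n_features, size=M)
        thresholds = rng.uniform(lo[dims], hi[dims])
        bits = (X[:, dims] > thresholds).astype(np.int64)
        codes = bits @ bit_weights  # hash code (bucket id) of every cell

        # Bucket occupancy: how many cells share each hash code.
        _, inverse, counts = np.unique(
            codes, return_inverse=True, return_counts=True
        )
        p_il = counts[inverse] / n_cells  # assumed estimate of p_il

        # Assumption: combine estimators by summing negative log-probabilities.
        scores += -np.log(p_il)

    return scores
```

Calling rareness_scores on a matrix in which a handful of cells lie far from one large cluster should give those cells higher scores on average, because they repeatedly fall into sparsely populated buckets. Repeating the hashing L times smooths out the effect of any single unlucky choice of thresholds.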