Most existing dimensionality reduction and clustering packages for single-cell RNA-Seq (scRNA-Seq) data deal with dropouts through heavy modelling and computational machinery. Here researchers from the Victor Chang Cardiac Research Institute introduce CIDR (Clustering through Imputation and Dimensionality Reduction), an ultrafast algorithm that uses a novel yet very simple “implicit imputation” approach to alleviate the impact of dropouts in scRNA-Seq data in a principled manner. Using a range of simulated and real data, the researchers show that CIDR outperforms the state-of-the-art methods t-SNE, ZIFA and RaceID by at least 50% in terms of clustering accuracy, and that it typically processes a dataset of hundreds of cells within seconds.
Performance evaluation with the human brain scRNA-Seq dataset
This dataset contains 420 cells in 8 cell types after the exclusion of hybrid cells. The colors denote the cell types annotated by the original study [11], while the plotting symbols denote the clusters output by each algorithm. (a) to (e) Clustering output by each of the five compared algorithms; (f) the Adjusted Rand Index, which measures the accuracy of the clustering output by each of the compared algorithms.
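For readers unfamiliar with the Adjusted Rand Index used in panel (f), the following is a minimal Python sketch of the standard pair-counting formula. It is not taken from the CIDR package. The function name `adjusted_rand_index` and the toy labels are illustrative assumptions, not data from the human brain dataset.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(true_labels, cluster_labels):
    """Adjusted Rand Index between two labelings of the same cells.

    Illustrative helper (hypothetical name), using the standard
    contingency-table formula with pair counts C(n, 2).
    """
    if len(true_labels) != len(cluster_labels):
        raise ValueError("both labelings must cover the same cells")

    n = len(true_labels)
    total_pairs = comb(n, 2)
    if total_pairs == 0:
        return 1.0  # zero or one cell: the labelings trivially agree

    # Contingency table cells and its row/column margins.
    cell_counts = Counter(zip(true_labels, cluster_labels))
    row_counts = Counter(true_labels)
    col_counts = Counter(cluster_labels)

    # Number of cell pairs grouped together under each view.
    sum_cells = sum(comb(c, 2) for c in cell_counts.values())
    sum_rows = sum(comb(c, 2) for c in row_counts.values())
    sum_cols = sum(comb(c, 2) for c in col_counts.values())

    expected = sum_rows * sum_cols / total_pairs  # agreement expected by chance
    max_index = (sum_rows + sum_cols) / 2         # upper bound on agreement

    if max_index == expected:
        return 1.0  # degenerate case, e.g. a single cluster in both labelings
    return (sum_cells - expected) / (max_index - expected)


# Toy example (made-up labels): cluster names differ, grouping is identical.
annotated = ["neuron", "neuron", "astrocyte", "astrocyte"]
clusters = [1, 1, 0, 0]
print(adjusted_rand_index(annotated, clusters))
```

A value of 1 means the clusters reproduce the annotated cell types exactly (up to renaming), a value near 0 is what random assignment would give, and negative values indicate agreement worse than chance.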
Availability – The package CIDR can be downloaded at http://github.com/VCCRI/CIDR