Most existing dimensionality reduction and clustering packages for single-cell RNA-Seq (scRNA-Seq) data deal with dropouts through heavy modelling and computational machinery. Here researchers from the Victor Chang Cardiac Research Institute introduce CIDR (Clustering through Imputation and Dimensionality Reduction), an ultrafast algorithm that uses a novel yet very simple ‘implicit imputation’ approach to alleviate the impact of dropouts in scRNA-Seq data in a principled manner. Using a range of simulated and real data, they show that CIDR improves on standard principal component analysis and outperforms the state-of-the-art methods t-SNE, ZIFA and RaceID in terms of clustering accuracy. CIDR typically takes seconds to process a data set of hundreds of cells, and minutes to process a data set of thousands of cells.
A toy example to illustrate the effect of dropout in scRNA-Seq data on clustering and how CIDR can alleviate the effect of dropouts
(a) This toy example consists of eight single cells divided into two clusters (the red cluster and the blue cluster). Dropout causes the within-cluster distances among the single cells in the red cluster to increase dramatically, and also increases the between-cluster distances between single cells in the two clusters. (b) CIDR reduces the dropout-induced within-cluster distances while largely maintaining the between-cluster distances. (c) The hierarchical clustering results using the original data set (no dropout), the dropout-affected data set, and the dropout-affected data set analysed using CIDR.
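The effect described in the caption can be reproduced with a small numerical sketch. The Python snippet below is only an illustration under stated assumptions, not the CIDR implementation: the expression values, the dropout positions, the logistic `dropout_weight` function and its `threshold` are all hypothetical choices made for this example. It builds eight toy cells (four lowly expressed "red" cells and four highly expressed "blue" cells), zeroes out some red entries to mimic dropout, and compares plain Euclidean distances with a simplified dropout-aware distance. In that distance, a zero paired with a non-zero value is partially filled in from the other cell, more strongly when the other cell's expression is low.

```python
import math
from itertools import combinations

# Hypothetical toy data (not from the paper): 4 "red" cells with low
# expression and 4 "blue" cells with high expression, 6 genes each.
red_true = [
    [3.0, 3.2, 2.9, 3.1, 3.0, 2.8],
    [3.1, 3.0, 3.0, 2.9, 3.2, 3.0],
    [2.9, 3.1, 3.1, 3.0, 2.9, 3.1],
    [3.0, 2.9, 3.2, 3.2, 3.1, 2.9],
]
blue = [
    [9.0, 9.1, 8.9, 9.0, 9.2, 8.8],
    [9.1, 9.0, 9.0, 8.9, 9.0, 9.1],
    [8.9, 9.2, 9.1, 9.0, 8.9, 9.0],
    [9.0, 8.9, 9.0, 9.1, 9.1, 8.9],
]

# Simulated dropout: (cell index, gene index) entries of red cells set to zero.
dropout_mask = {(0, 0), (0, 3), (1, 1), (1, 4), (2, 2), (2, 5), (3, 0), (3, 4)}
red_drop = [
    [0.0 if (i, k) in dropout_mask else value for k, value in enumerate(cell)]
    for i, cell in enumerate(red_true)
]


def euclidean(a, b):
    """Plain Euclidean distance between two expression profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dropout_weight(x, threshold=6.0):
    """Hypothetical logistic weight: close to 1 when expression x is low
    (a dropout is plausible), close to 0 when x is high."""
    return 1.0 / (1.0 + math.exp(x - threshold))


def dropout_aware(a, b):
    """Simplified pairwise distance: a zero paired with a non-zero value is
    replaced by a weighted share of the other cell's value before differencing."""
    total = 0.0
    for x, y in zip(a, b):
        if x == 0.0 and y > 0.0:
            x = dropout_weight(y) * y
        elif y == 0.0 and x > 0.0:
            y = dropout_weight(x) * x
        total += (x - y) ** 2
    return math.sqrt(total)


def mean_within(cells, dist):
    """Mean pairwise distance inside one cluster."""
    pairs = list(combinations(cells, 2))
    return sum(dist(a, b) for a, b in pairs) / len(pairs)


def mean_between(group_a, group_b, dist):
    """Mean distance between cells of two different clusters."""
    return sum(dist(a, b) for a in group_a for b in group_b) / (
        len(group_a) * len(group_b)
    )


within_true = mean_within(red_true, euclidean)
within_drop = mean_within(red_drop, euclidean)
within_aware = mean_within(red_drop, dropout_aware)
between_true = mean_between(red_true, blue, euclidean)
between_drop = mean_between(red_drop, blue, euclidean)
between_aware = mean_between(red_drop, blue, dropout_aware)

print(f"within red:  true={within_true:.2f}  dropout={within_drop:.2f}  aware={within_aware:.2f}")
print(f"red vs blue: true={between_true:.2f}  dropout={between_drop:.2f}  aware={between_aware:.2f}")
```

In this sketch the within-cluster distances of the red cells grow sharply once dropout is introduced and shrink again under the dropout-aware distance, while the red-to-blue distances change only slightly. Passing either distance matrix to a hierarchical clustering routine would be the natural next step to mirror panel (c).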
Availability: CIDR can be downloaded at https://github.com/VCCRI/CIDR