Single-cell RNA sequencing (scRNA-seq) technologies allow researchers to uncover the biological states of a single cell at high resolution. For computational efficiency and easy visualization, dimensionality reduction is necessary to capture gene expression patterns in low-dimensional space. Researchers from the University of Arizona propose an ensemble method for simultaneous dimensionality reduction and feature gene extraction (EDGE) of scRNA-seq data. Unlike existing dimensionality reduction techniques, the proposed method implements an ensemble learning scheme that uses a large number of weak learners for an accurate similarity search. Based on the similarity matrix constructed by those weak learners, the low-dimensional embedding of the data is estimated and optimized through spectral embedding and stochastic gradient descent. Comprehensive simulation and empirical studies show that EDGE is well suited for finding meaningful organization of cells, detecting rare cell types, and identifying essential feature genes associated with certain cell types.
Overview of EDGE
The algorithm starts by generating a number of weak learners. Each weak learner consists of a few hash codes (imaginary boxes, i.e., piles of cells). Cells assigned to the same hash code receive a pairwise similarity score of 1; all other pairs receive a score of 0. Each weak learner acts as a voter: the final similarity probability between two cells is the average of their similarity scores across all voters. These probabilities are then used in embedding estimation and optimization. The importance scores of genes in each weak learner are obtained by averaging the hash codes’ entropy values. Details of the algorithm can be found in Methods and Supplementary Information.
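The voting step described above can be illustrated with a small Python sketch. It is not the EDGE implementation (the package itself is written in R), and the hashing rule is an assumption made for illustration: each hypothetical weak learner samples a few genes, thresholds each one at the expression value of a randomly chosen cell, and uses the resulting tuple of booleans as the cell's hash code. The function names, parameters, and toy expression values are likewise invented. Only the co-assignment scoring (1 for a shared code, 0 otherwise) and the averaging across voters follow the description.

```python
import random


def weak_learner_codes(cells, n_genes_per_learner, rng):
    """Return one hash code per cell for a single hypothetical weak learner.

    Assumed hashing rule (not taken from EDGE): sample a few genes and
    threshold each at the expression value of a randomly chosen cell.
    """
    n_genes = len(cells[0])
    genes = rng.sample(range(n_genes), n_genes_per_learner)
    thresholds = [rng.choice(cells)[g] for g in genes]
    return [
        tuple(cell[g] > t for g, t in zip(genes, thresholds))
        for cell in cells
    ]


def ensemble_similarity(cells, n_learners=500, n_genes_per_learner=2, seed=0):
    """Average the 0/1 co-assignment votes of many weak learners."""
    rng = random.Random(seed)
    n = len(cells)
    votes = [[0] * n for _ in range(n)]
    for _ in range(n_learners):
        codes = weak_learner_codes(cells, n_genes_per_learner, rng)
        for i in range(n):
            for j in range(n):
                # Score 1 if both cells share a hash code, 0 otherwise.
                if codes[i] == codes[j]:
                    votes[i][j] += 1
    # Each entry becomes the fraction of voters that grouped the two cells.
    return [[v / n_learners for v in row] for row in votes]


# Toy data: rows are cells, columns are two made-up marker genes.
# Cells 0 and 1 express the first gene; cells 2 and 3 express the second.
cells = [
    [5.0, 0.1],
    [4.8, 0.0],
    [0.2, 6.0],
    [0.1, 5.5],
]
sim = ensemble_similarity(cells)
for row in sim:
    print([round(p, 2) for p in row])
```

In this toy setting, cells with similar expression profiles share a hash code more often than dissimilar ones, so their averaged similarity is higher. A matrix of this kind is what would then be passed to the embedding step; the spectral embedding, the stochastic gradient descent optimization, and the entropy-based gene scores are not shown here.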
Availability – the EDGE R package is freely available on GitHub (https://github.com/shawnstat/EDGE).