Single-cell RNA-seq (scRNA-seq) data provide insight into normal cellular function and various disease states by characterizing gene expression at the single-cell level. Dimensionality reduction of such high-dimensional data sets is essential for visualization and analysis, but scRNA-seq data are challenging for classical dimensionality-reduction methods because dropout events (expressed transcripts that go undetected and are recorded as zeros) are common, which makes the data zero-inflated.
Researchers from the University of Oxford have developed a dimensionality-reduction method, Zero-Inflated Factor Analysis (ZIFA), which explicitly models dropout characteristics, and have shown that it improves modeling accuracy on both simulated and biological data sets.
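The dropout model can be written out compactly. The block below is a summary of a ZIFA-style generative model as we understand the method's description, not a quotation of the original paper: the notation (latent factors z_i, loading matrix A, gene means μ, diagonal noise covariance W, dropout indicators h_ij, decay parameter λ) is ours and should be checked against the source.

```latex
\begin{aligned}
z_i &\sim \mathcal{N}(0, I_K) && \text{latent low-dimensional coordinates of cell } i\\
x_i \mid z_i &\sim \mathcal{N}(A z_i + \mu,\; W) && \text{latent (log) expression, } W \text{ diagonal}\\
h_{ij} \mid x_{ij} &\sim \mathrm{Bernoulli}\!\left(\exp(-\lambda x_{ij}^2)\right) && \text{dropout indicator for gene } j\\
y_{ij} &= \begin{cases} x_{ij} & \text{if } h_{ij} = 0\\ 0 & \text{if } h_{ij} = 1 \end{cases} && \text{observed expression}
\end{aligned}
```

Under this form, a gene whose latent expression is close to zero is almost always recorded as zero, while strongly expressed genes rarely drop out; the single parameter λ controls how quickly the dropout probability decays as expression grows.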
Figure: Zero-inflation in single-cell expression data. (a) Illustrative distribution of expression levels for three randomly chosen genes, showing an abundance of single cells with null expression. (b) Heat maps showing the relationship between dropout rate and mean non-zero expression level for three published single-cell data sets, including an approximate double exponential model fit. (c) Flow diagram illustrating the data-generative process used by ZIFA. (d) Illustrative plot showing how different values of λ in the relationship between dropout rate and mean expression (blue lines) can modulate the latent gene expression distribution to give a range of observed zero-inflated data.
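To illustrate the generative process in panel c and the quantities plotted in panel b, the following NumPy sketch samples a zero-inflated matrix under a ZIFA-style model and then computes each gene's dropout rate and mean non-zero expression. It is not code from the ZIFA package: the function names, the parameter ranges, and the default λ = 0.1 are hypothetical, and the dropout probability exp(-λx²) reflects our reading of the model.

```python
import numpy as np


def simulate_zifa_data(n_cells=500, n_genes=50, k=2, lam=0.1, seed=0):
    """Sample a zero-inflated matrix from a ZIFA-style generative process.

    Hypothetical helper for illustration; all parameter ranges are assumptions.
    """
    rng = np.random.default_rng(seed)
    # Latent low-dimensional coordinates z_i ~ N(0, I_k)
    Z = rng.standard_normal((n_cells, k))
    # Assumed loadings A, gene means mu and per-gene noise variances (diagonal W)
    A = rng.normal(0.0, 0.5, size=(n_genes, k))
    mu = rng.uniform(1.0, 4.0, size=n_genes)
    noise_sd = np.sqrt(rng.uniform(0.1, 0.5, size=n_genes))
    # Latent (log) expression x_i = A z_i + mu + Gaussian noise
    X = Z @ A.T + mu + rng.standard_normal((n_cells, n_genes)) * noise_sd
    # Dropout indicator: h_ij = 1 with probability exp(-lam * x_ij^2)
    p_drop = np.exp(-lam * X**2)
    H = rng.random((n_cells, n_genes)) < p_drop
    # Observed data: zero where a dropout occurred, latent value otherwise
    Y = np.where(H, 0.0, X)
    return Y, X, Z


def dropout_vs_mean(Y):
    """Per-gene dropout rate and mean of the non-zero entries (cf. panel b)."""
    nonzero = Y != 0
    dropout_rate = 1.0 - nonzero.mean(axis=0)
    counts = nonzero.sum(axis=0)
    sums = np.where(nonzero, Y, 0.0).sum(axis=0)
    # Genes with no non-zero entries get NaN instead of a division error
    mean_nonzero = np.divide(
        sums, counts, out=np.full(Y.shape[1], np.nan), where=counts > 0
    )
    return dropout_rate, mean_nonzero


Y, X, Z = simulate_zifa_data()
rate, mean_nz = dropout_vs_mean(Y)
print(f"overall fraction of zeros: {rate.mean():.2f}")
print(f"corr(dropout rate, mean non-zero expression): {np.corrcoef(rate, mean_nz)[0, 1]:.2f}")
```

With these illustrative settings, genes with higher latent means drop out less often, so the dropout rate falls as mean non-zero expression rises, the qualitative trend shown in panel b. Increasing `lam` makes the dropout probability decay faster with expression, so fewer entries are zeroed, which is the kind of modulation panel d depicts.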
Availability – A Python implementation and its source code are freely available under the MIT License at https://github.com/epierson9/ZIFA.