The excess zeros in single-cell RNA-seq data include “real” zeros, which arise from the on-off nature of gene transcription in single cells, and “dropout” zeros, which arise for technical reasons. Existing differential expression (DE) analysis methods cannot distinguish these two types of zeros. Researchers at Tsinghua University developed DEsingle, an R package that employs a Zero-Inflated Negative Binomial (ZINB) model to estimate the proportions of real and dropout zeros and to define and detect three types of DE genes in single-cell RNA-seq data with higher accuracy.
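To make the idea of a zero-inflated model concrete, here is a minimal Python sketch of a ZINB probability mass function and of how its zero mass splits into two sources. It is not DEsingle's code or API (DEsingle is an R package). The function names and the parametrization (zero-inflation weight `theta`, size `r`, success probability `p`) are assumptions chosen for illustration. Which component is read as "real" zeros and which as "dropout" zeros is a modeling interpretation that the text above does not spell out, so the sketch only labels them by their mathematical origin.

```python
import math


def nb_pmf(k, r, p):
    """Negative binomial pmf with size r and success probability p
    (assumed parametrization: P(k) = C(k+r-1, k) * p^r * (1-p)^k)."""
    log_coef = math.lgamma(k + r) - math.lgamma(k + 1) - math.lgamma(r)
    return math.exp(log_coef + r * math.log(p) + k * math.log1p(-p))


def zinb_pmf(k, theta, r, p):
    """ZINB pmf: with probability theta the count is a constant zero,
    otherwise it is drawn from NB(r, p)."""
    base = (1.0 - theta) * nb_pmf(k, r, p)
    return theta + base if k == 0 else base


def zero_split(theta, r, p):
    """Fractions of all zeros that come from the zero-inflation component
    and from the NB component, respectively."""
    inflated = theta
    from_nb = (1.0 - theta) * nb_pmf(0, r, p)
    total = inflated + from_nb
    return inflated / total, from_nb / total


if __name__ == "__main__":
    # Hypothetical parameter values, for illustration only.
    theta, r, p = 0.3, 2.0, 0.5
    print("P(X = 0):", zinb_pmf(0, theta, r, p))
    print("zero split (inflated, NB):", zero_split(theta, r, p))
```

With these hypothetical values, the zero-inflation component contributes 0.3 and the NB component contributes 0.7 × 0.25 = 0.175 to P(X = 0), so observed zeros alone cannot tell the two sources apart without fitting the model.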
ZINB model for scRNA-seq data and workflow of DEsingle
(A) Histogram of zero percentages of all expressed genes in a real scRNA-seq dataset. (B) An example of ZINB model fitting for scRNA-seq data. The left panel shows the density fitting. The right panel shows the cumulative distribution function fitting. (C) The mRNA capture procedure transforms the original ZINB model to another ZINB model with only one parameter changed. (D) Theoretical ZINB distribution of a gene with different random capture efficiency b. (E) Workflow of DEsingle to detect and classify DE genes.
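The statement in panel (C) can be checked numerically under one assumption: that capture is modeled as binomial thinning, where each mRNA molecule is captured independently with probability b. This is my reading of the caption, not something the text confirms. Under that assumption, a ZINB count with zero-inflation weight `theta`, size `r`, and NB mean `mu` becomes a ZINB count with the same `theta` and `r` and NB mean `b * mu`. The sketch below compares the explicitly thinned distribution with that closed form. All names and parameter values are hypothetical.

```python
import math


def nb_pmf_mean(k, r, mu):
    """Negative binomial pmf parametrized by size r and mean mu."""
    p = r / (r + mu)
    log_coef = math.lgamma(k + r) - math.lgamma(k + 1) - math.lgamma(r)
    return math.exp(log_coef + r * math.log(p) + k * math.log1p(-p))


def zinb_pmf_mean(k, theta, r, mu):
    """ZINB pmf: constant zero with probability theta, else NB(r, mu)."""
    base = (1.0 - theta) * nb_pmf_mean(k, r, mu)
    return theta + base if k == 0 else base


def binom_pmf(y, n, b):
    """Probability of capturing y out of n molecules at efficiency b."""
    return math.comb(n, y) * b**y * (1.0 - b) ** (n - y)


def thinned_zinb_pmf(y, theta, r, mu, b, n_max=400):
    """Pmf of the captured count, summing over the true count n
    (truncated at n_max, which is ample for the values used here)."""
    return sum(
        zinb_pmf_mean(n, theta, r, mu) * binom_pmf(y, n, b)
        for n in range(y, n_max + 1)
    )


if __name__ == "__main__":
    theta, r, mu, b = 0.2, 2.0, 10.0, 0.3  # illustrative values only
    for y in range(5):
        explicit = thinned_zinb_pmf(y, theta, r, mu, b)
        closed_form = zinb_pmf_mean(y, theta, r, b * mu)
        print(y, round(explicit, 6), round(closed_form, 6))
```

The two columns agree to numerical precision, which is consistent with only one parameter changing under capture. Lowering b shifts more of the NB mass to zero, which matches the qualitative picture described for panel (D).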
Availability: The R package DEsingle is freely available at https://github.com/miaozhun/DEsingle and is currently under consideration for inclusion in Bioconductor.