To find marker genes in single-cell RNA-seq data, many popular methods follow a common approach: they test for differentially expressed genes between a small group of highly homogeneous cells and the rest of the data (the outside group), and they assume a specific distribution for gene expression (e.g., SeuratPoisson, Seuratnegbinom, SeuratT, CellRanger, EdgeR, and limmatrend).
However, single-cell data are highly heterogeneous. The outside group can contain multiple well-separated clusters, which invalidates the assumption that its expression values follow a single specific distribution. In addition, current methods conventionally define differentially expressed genes as genes whose mean expression differs between groups. This definition fails for a cluster of transitional-state cells: a transitional marker gene has intermediate expression in the cluster of interest, but is up-regulated in some of the remaining clusters and down-regulated in others, so the mean expression of the cluster of interest and that of the outside group become indistinguishable.
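The transitional-marker problem described above can be illustrated with a small simulation. This is only a sketch on synthetic numbers, not Venice's algorithm: the expression values, cluster sizes, and the helper names `histogram` and `total_variation` are assumptions made up for this example. It shows that two groups can have nearly identical means while their distributions barely overlap.

```python
import random
import statistics

random.seed(0)

# Hypothetical expression values for one gene (arbitrary log-scale units).
# Cluster of interest: intermediate expression.
inside = [random.gauss(5.0, 0.5) for _ in range(500)]
# Outside group: two separated clusters, one low and one high.
outside = ([random.gauss(2.0, 0.5) for _ in range(500)]
           + [random.gauss(8.0, 0.5) for _ in range(500)])


def histogram(values, lo, hi, n_bins):
    """Return the fraction of values in each of n_bins equal-width bins.

    Values outside [lo, hi) are clamped into the first or last bin.
    """
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for v in values:
        idx = min(max(int((v - lo) / width), 0), n_bins - 1)
        counts[idx] += 1
    return [c / len(values) for c in counts]


def total_variation(p, q):
    """Total variation distance between two binned distributions (0 to 1)."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


# A mean-based comparison sees almost no difference between the groups.
mean_gap = abs(statistics.mean(inside) - statistics.mean(outside))

# A distribution-level comparison sees that the groups barely overlap.
tv = total_variation(histogram(inside, 0.0, 10.0, 10),
                     histogram(outside, 0.0, 10.0, 10))

print(f"difference in means: {mean_gap:.3f}")
print(f"total variation distance: {tv:.3f}")
```

In this synthetic setting the difference in means is close to zero while the total variation distance is close to one, which is the situation in which a test based only on mean expression would miss the gene.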
Here we introduce Venice, a non-parametric test for finding marker genes in single-cell RNA-seq data that addresses these problems. Venice can effectively detect different types of marker genes, including transitional markers, while keeping running time minimal.
Using a widely adopted benchmarking approach (Wang et al. 2019), Venice achieves higher accuracy than the 14 other tools tested and is also the fastest of all the benchmarked methods.
For preliminary benchmark results, please visit our blog.
Venice is open source and freely available for academic research purposes. The method is now incorporated in Signac, a single-cell analytics package developed by BioTuring, available at: