The cell types in a population change when its condition changes: some cell types die out, new ones may emerge, and surviving ones evolve to adapt to the new condition. Using single-cell RNA-sequencing (scRNA-seq) data that measure gene expression in cells before and after the change, researchers from the University of Notre Dame propose SparseDC, an algorithm that identifies cell types, traces how they change across conditions, and identifies the marker genes that characterize these changes. SparseDC performs all three tasks simultaneously by solving a single, unified optimization problem. It is highly computationally efficient, and its accuracy is demonstrated on both simulated and real data.
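The summary does not spell out SparseDC's objective. As a rough illustration of how one optimization can cover all three tasks at once, here is a toy alternating scheme in Python. It is a simplified stand-in, not SparseDC's actual algorithm. Cluster indices are matched across the two conditions (cell types), each cluster keeps a separate center per condition (so center differences trace changes), and a lasso-style penalty shrinks centers to zero so that only marker genes keep nonzero values. The function names, the penalty weight `lam`, the fusion threshold `tau`, and the farthest-first seeding are all illustrative assumptions.

```python
import numpy as np


def soft_threshold(x, lam):
    """Lasso shrinkage, applied elementwise: sign(x) * max(|x| - lam, 0)."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)


def farthest_first_init(data, k):
    """Deterministic seeding: start at the first cell, then repeatedly add the
    cell farthest (squared Euclidean distance) from all centers chosen so far."""
    centers = [data[0]]
    for _ in range(1, k):
        d = np.min([((data - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(data[np.argmax(d)])
    return np.array(centers, dtype=float)


def toy_two_condition_clustering(x1, x2, k, lam=1.0, tau=0.5, n_iter=20):
    """Hypothetical toy scheme (NOT the SparseDC algorithm).

    x1, x2: cells-by-genes arrays for the two conditions, ideally centered per
            gene so that a zero center means "not a marker gene".
    lam:    lasso weight that shrinks cluster centers toward zero.
    tau:    centers closer than tau across conditions are fused (no change).
    Returns labels for each condition and centers mu1, mu2 of shape (k, genes).
    """
    mu1 = farthest_first_init(np.vstack([x1, x2]), k)
    mu2 = mu1.copy()
    for _ in range(n_iter):
        # Assignment: each cell joins the nearest center of its own condition;
        # cluster j in condition 1 is matched to cluster j in condition 2.
        lab1 = np.argmin(((x1[:, None, :] - mu1[None]) ** 2).sum(-1), axis=1)
        lab2 = np.argmin(((x2[:, None, :] - mu2[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            for x, lab, mu in ((x1, lab1, mu1), (x2, lab2, mu2)):
                members = x[lab == j]
                if len(members):
                    # argmin_m sum_i (x_i - m)^2 + lam*|m| is the cluster mean
                    # soft-thresholded at lam / (2n), computed per gene.
                    n = len(members)
                    mu[j] = soft_threshold(members.mean(axis=0), lam / (2 * n))
        # Crude fusion heuristic: small cross-condition differences are set
        # to their average, i.e. treated as "unchanged" genes.
        same = np.abs(mu1 - mu2) <= tau
        avg = (mu1 + mu2) / 2
        mu1[same] = avg[same]
        mu2[same] = avg[same]
    return lab1, lab2, mu1, mu2
```

On well-separated synthetic data where one gene rises in only one cluster after the change, this sketch would be expected to leave that gene as the only cross-condition difference and to shrink pure-noise genes to zero. A faithful method would replace the fusion heuristic with an explicit penalty on `mu1 - mu2` inside the objective.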
Heatmaps of the expression of the top 10 upregulated housekeeping marker genes detected by SparseDC for the Llorens–Bobadilla data.
The top 10 housekeeping marker genes are the 10 genes with the largest positive center value, μ_ik, in both conditions: ischemia-injured (A) and naive (B). The color bars at the top indicate the cluster of each cell, and the color bars at the side indicate the cluster for which each gene is a marker. The numbers on the plot label the clusters found in the data: cluster 1 contains likely quiescent neural stem cells (qNSCs), cluster 2 likely oligodendrocytes, cluster 3 likely active neural stem cells (aNSCs), and cluster 4 likely neuroblasts. Every cluster shows a clear block corresponding to its marker genes.
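The caption's selection rule can be sketched in Python. This is a hypothetical helper, not a function from the SparseDC package. It assumes the fitted centers are available as two genes-by-clusters arrays, one per condition, so that `mu[i, k]` plays the role of μ_ik. It also reads "largest positive center value in both conditions" as: keep genes whose center is positive in both conditions, then rank them by the smaller of their two centers. Both choices are assumptions.

```python
import numpy as np


def top_housekeeping_markers(mu1, mu2, gene_names, n_top=10):
    """Hypothetical helper: per cluster, the genes with positive centers in both
    conditions, ranked by the smaller of the two center values.

    mu1, mu2: genes-by-clusters arrays; mu[i, k] is the center of gene i in
              cluster k for that condition.
    Returns a dict mapping cluster index -> list of up to n_top gene names.
    """
    mu1 = np.asarray(mu1, dtype=float)
    mu2 = np.asarray(mu2, dtype=float)
    result = {}
    for k in range(mu1.shape[1]):
        # A gene only scores well if it is high in BOTH conditions.
        both = np.minimum(mu1[:, k], mu2[:, k])
        candidates = np.flatnonzero(both > 0)
        # Sort by decreasing score; the stable sort keeps ties in gene order.
        order = candidates[np.argsort(-both[candidates], kind="stable")]
        result[k] = [gene_names[i] for i in order[:n_top]]
    return result
```

Ranking by the minimum of the two centers is one conservative way to require a gene to be strongly upregulated in both the injured and the naive condition. Ranking by their mean would instead let a very high value in one condition compensate for a small one in the other.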
Availability – SparseDC has been implemented in R and is available as an R package from CRAN (‘https://cran.r-project.org/web/packages/SparseDC/index.html’). A vignette is also available at ‘https://cran.r-project.org/web/packages/SparseDC/vignettes/SparseDC.html’.