Advances in single-cell RNA-sequencing technology have resulted in a wealth of studies aiming to identify transcriptomic cell types in various biological systems. There are multiple experimental approaches to isolate and profile single cells, which provide different levels of cellular and tissue coverage. In addition, multiple computational strategies have been proposed to identify putative cell types from single-cell data. From a data generation perspective, recent single-cell studies can be classified into two groups: those that distribute reads shallowly over large numbers of cells and those that distribute reads more deeply over a smaller cell population. Although there are advantages to both approaches in terms of cellular and tissue coverage, it is unclear whether different computational cell type identification methods are better suited to one or the other experimental paradigm. Researchers from the Howard Hughes Medical Institute review three cell type clustering algorithms, each representing one of three broad approaches, and find that PCA-based algorithms appear most suited to low read depth data sets, whereas gene clustering-based and biclustering algorithms perform better on high read depth data sets. In addition, highly related cell classes are better distinguished by higher-depth data, given the same total number of reads; however, simultaneous discovery of distinct and similar types is better served by lower-depth, higher cell number data. Overall, this study suggests that the depth of profiling should be determined by initial assumptions about the diversity of cells in the population, and that subsequently basing the selection of clustering algorithm(s) on the depth of profiling will allow for better identification of putative transcriptomic cell types.
Distributing reads over cells
(A) Given a population of cells and a total number of reads available, reads can either be used to sequence fewer cells more deeply (right) or to sequence more cells at a shallower depth (left). Here, cell type identity and transcript species are indicated by different colors. Using the strategy on the left, there may not be enough cells sampled of a given type (pink) to identify a cluster. Using the strategy on the right, cells of a given type may not share enough transcriptional similarity to be identified as belonging to the same cluster. (B) Cell numbers and read depths for single-cell RNA-seq studies with the goal of identifying transcriptomic cell types. For certain studies, mapped read counts were not clearly stated, so overall read counts are reported; the number of reads with useful information in these studies is therefore lower than shown on the graph.
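The trade-off in panel (A) can be made concrete with a little arithmetic. The sketch below is illustrative only: the read budget, cell numbers, rare-type frequency, and minimum cluster size are hypothetical values chosen for the example, not figures from the study. It covers only the sampling side of the trade-off (how many cells of a rare type end up in the sample), under the simplifying assumption that cells are drawn independently; it does not model how read depth affects transcript detection within a cell.

```python
from math import comb


def reads_per_cell(total_reads: int, n_cells: int) -> float:
    """Average depth per cell when a fixed read budget is spread evenly."""
    return total_reads / n_cells


def expected_cells_of_type(n_cells: int, type_fraction: float) -> float:
    """Expected number of sampled cells belonging to a type of given frequency."""
    return n_cells * type_fraction


def prob_at_least_k(n_cells: int, type_fraction: float, k: int) -> float:
    """Binomial probability of sampling at least k cells of the given type."""
    p_fewer = sum(
        comb(n_cells, i) * type_fraction**i * (1 - type_fraction) ** (n_cells - i)
        for i in range(k)
    )
    return 1.0 - p_fewer


# Hypothetical values, chosen only to illustrate the trade-off.
TOTAL_READS = 100_000_000   # assumed fixed sequencing budget
RARE_FRACTION = 0.02        # assumed frequency of a rare cell type
MIN_CLUSTER_SIZE = 5        # assumed number of cells needed to call a cluster

for label, n_cells in [("fewer cells, deeper", 100), ("more cells, shallower", 10_000)]:
    depth = reads_per_cell(TOTAL_READS, n_cells)
    expected = expected_cells_of_type(n_cells, RARE_FRACTION)
    p_enough = prob_at_least_k(n_cells, RARE_FRACTION, MIN_CLUSTER_SIZE)
    print(f"{label}: {depth:,.0f} reads/cell, "
          f"{expected:.0f} rare cells expected, "
          f"P(>= {MIN_CLUSTER_SIZE} rare cells) = {p_enough:.3f}")
```

Under these assumed numbers, the same read budget yields either a high per-cell depth with very few rare cells sampled, or many rare cells at a much lower per-cell depth, which is the tension the caption describes.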