Keynote: Probabilistic Gene Expression Signatures for Single Cell RNA-seq Data
Speaker: Rafael Irizarry (Dana-Farber Cancer Institute)
Single-cell RNA sequencing (scRNA-seq) quantifies the gene expression of individual cells in a sample, which allows distinct cell-type populations to be identified and characterized. An important step in many scRNA-seq analysis pipelines is the classification of cells into known cell types, which motivates the development of data-driven cell-type identification methods. Current approaches are limited by their reliance on known marker genes and their sensitivity to the quality of reference samples. Here we present a computationally light statistical approach, based on Naive Bayes, that leverages public datasets to combine information across thousands of genes and probabilistically assign cell-type identity. Using datasets spanning multiple species and tissue types, we demonstrate that our approach is robust to low-quality reference data and produces more accurate cell-type identification than current methods. We also demonstrate how probability-based approaches can help identify cell types and spatial patterns when applied to data from spatial transcriptomic technologies, which measure gene expression across locations in a tissue.
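
To make the idea of probabilistic assignment concrete, the following is a minimal sketch of a generic multinomial Naive Bayes classifier over gene counts. It is not the speaker's implementation. The function names (fit_profiles, posterior), the pseudocount, the uniform prior, and the three-gene toy reference are all assumptions made purely for illustration.

```python
import math

def fit_profiles(reference_counts, pseudocount=1.0):
    """Estimate per-gene log-probabilities for each cell type.

    reference_counts maps a cell-type label to a list of total counts per
    gene, summed over that type's reference cells (hypothetical input format).
    The pseudocount keeps unobserved genes from getting zero probability.
    """
    profiles = {}
    for cell_type, counts in reference_counts.items():
        total = sum(counts) + pseudocount * len(counts)
        profiles[cell_type] = [math.log((c + pseudocount) / total) for c in counts]
    return profiles

def posterior(cell_counts, profiles, priors=None):
    """Return P(cell type | counts) for one cell under a multinomial model."""
    types = list(profiles)
    if priors is None:
        # Assumption: uniform prior over cell types.
        priors = {t: 1.0 / len(types) for t in types}
    # Naive Bayes: genes contribute independently, so log-likelihoods add.
    log_scores = {
        t: math.log(priors[t])
        + sum(n * lp for n, lp in zip(cell_counts, profiles[t]))
        for t in types
    }
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(log_scores.values())
    weights = {t: math.exp(s - top) for t, s in log_scores.items()}
    norm = sum(weights.values())
    return {t: w / norm for t, w in weights.items()}

# Toy reference with three genes; the numbers are made up.
reference = {
    "T cell": [90, 5, 5],
    "B cell": [5, 90, 5],
}
profiles = fit_profiles(reference)
probs = posterior([8, 1, 0], profiles)
best = max(probs, key=probs.get)
```

Because every gene contributes a term to the log-likelihood, the evidence accumulates across the whole expression profile instead of depending on a handful of marker genes, and the output is a probability for each cell type instead of a hard label.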