Single-cell RNA sequencing (scRNA-seq) has broad applications across biomedical research. One of the key challenges is to ensure that only single, live cells are included in downstream analysis, as the inclusion of compromised cells inevitably affects data interpretation.
Here, researchers from the EMBL-EBI present a generic approach for processing scRNA-seq data and detecting low quality cells, using a curated set of over 20 biological and technical features. Their approach improves classification accuracy by over 30 % compared to traditional methods when tested on over 5,000 cells, including CD4+ T cells, bone marrow dendritic cells, and mouse embryonic stem cells.
Overview of pipeline and quality control.
a Schematic of RNA sequencing workflow. Green indicates high and red low quality cells. b Schematic of the computational pipeline developed to process large numbers of cells and RNA sequencing reads. c Overview of quality control method. Gene expression data for 960 mES cells were used to extract biological and technical features capable of identifying low quality cells. These features and microscopy annotations served as training data for a classification algorithm that is capable of predicting low quality cells in other datasets. Additional annotation of deceptive cells as low quality helps to improve classification accuracy