Single-cell RNA-sequencing (scRNA-seq) is a fast-emerging technology that allows global transcriptome profiling at the single-cell level. Cell type identification from scRNA-seq data is a critical task in many research areas, such as developmental biology, cell reprogramming, and cancer research. Typically, cell type identification relies on human inspection using a combination of prior biological knowledge (e.g. marker genes and morphology) and computational techniques (e.g. PCA and clustering). Because current knowledge is incomplete and the process is subjective, a small number of cells may be mislabelled.
Here, researchers from the University of Sydney propose a semi-supervised learning framework, named scReClassify, for ‘post hoc’ cell type identification from scRNA-seq datasets. Starting from an initial cell type annotation that may contain mislabelled cells, scReClassify first reduces the dimensionality of the data using PCA and then applies a semi-supervised learning method to identify cells that were likely mislabelled and reclassify them to their most probable cell types. Using both simulated and real-world experimental datasets that profile various tissues and biological systems, the researchers demonstrate that scReClassify can accurately identify misclassified cells and reassign them to their correct cell types.
An illustration of the scReClassify framework
(a) The initial cell type annotation. This is typically achieved by combining biological knowledge with computational approaches. (b) Correction of mislabelled cells using an AdaSampling procedure. The dimensionality of the gene expression matrix is first reduced by PCA; mislabelled cells are then identified and reclassified to their correct cell types using AdaSampling with a support vector machine (SVM) classifier, a random forest (RF) classifier, or an ensemble of SVMs or RFs.
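The reclassification step in panel (b) can be illustrated with a small sketch. It is not the scReClassify implementation, which is an R package: the function names (`fit_centroids`, `predict_proba`, `ada_sampling_sketch`) are hypothetical, a simple nearest-centroid classifier stands in for the SVM or RF base classifier, and the input is assumed to be cell coordinates that have already been reduced by PCA. The loop follows the general adaptive-sampling idea as we understand it: train on cells sampled in proportion to how much their current label is trusted, then update that trust from the classifier's predicted probabilities.

```python
import math
import random


def fit_centroids(points, labels):
    """Mean vector per class (assumed stand-in for the SVM/RF base classifier)."""
    sums, counts = {}, {}
    for p, y in zip(points, labels):
        acc = sums.setdefault(y, [0.0] * len(p))
        for j, v in enumerate(p):
            acc[j] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [s / counts[y] for s in acc] for y, acc in sums.items()}


def predict_proba(centroids, p):
    """Softmax over negative squared distances to each class centroid."""
    d = {y: sum((a - b) ** 2 for a, b in zip(p, c)) for y, c in centroids.items()}
    m = min(d.values())  # shift by the minimum for numerical stability
    w = {y: math.exp(-(v - m)) for y, v in d.items()}
    z = sum(w.values())
    return {y: v / z for y, v in w.items()}


def ada_sampling_sketch(points, labels, n_iter=10, seed=0):
    """Return a corrected label for every cell (illustrative only)."""
    rng = random.Random(seed)
    classes = set(labels)
    trust = [1.0] * len(points)  # start by fully trusting every initial label
    centroids = fit_centroids(points, labels)
    for _ in range(n_iter):
        # Keep each cell with probability equal to the trust in its label.
        keep = [i for i, t in enumerate(trust) if rng.random() < t]
        if {labels[i] for i in keep} != classes:
            continue  # skip this round if a class lost all its training cells
        centroids = fit_centroids([points[i] for i in keep],
                                  [labels[i] for i in keep])
        # Trust = predicted probability of the cell's *initial* label.
        trust = [predict_proba(centroids, p)[y] for p, y in zip(points, labels)]
    corrected = []
    for p in points:
        probs = predict_proba(centroids, p)
        corrected.append(max(probs, key=probs.get))
    return corrected


# Toy data: two well-separated groups in a 2-D "PCA" space (made-up values).
cells = [(0.0, 0.0), (0.2, 0.1), (-0.1, 0.2), (0.1, -0.2), (0.1, 0.1),
         (5.0, 5.0), (5.2, 4.9), (4.9, 5.1), (5.1, 5.2), (4.8, 4.9)]
noisy = ["A", "A", "A", "A", "B", "B", "B", "B", "B", "B"]  # cell 4 mislabelled
fixed = ada_sampling_sketch(cells, noisy)
print(fixed)
```

In this toy case the mislabelled cell sits far from the other cells that share its label, so its trust drops towards zero after the first round and it stops influencing the classifier. On real data the choice of base classifier, the number of principal components, and the number of iterations would all matter.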
Availability – scReClassify is implemented as an R package and is freely available from https://github.com/SydneyBioX/scReClassify.