The advent of single-cell RNA sequencing (scRNA-seq) has revolutionized our understanding of cellular heterogeneity and complexity in biological tissues. However, the large, sparse nature of scRNA-seq datasets, together with privacy regulations, makes efficient cell identification challenging. Federated learning offers a solution: models are trained where the data reside, so the data can be used efficiently without being shared.
Researchers at Sichuan University have developed scFed, a unified federated learning framework that allows four classification algorithms, including single-cell-specific and general-purpose classifiers, to be benchmarked without violating data privacy. The researchers evaluated scFed on eight publicly available scRNA-seq datasets of diverse sizes, species and technologies, assessing its performance in intra-dataset and inter-dataset experimental setups. They found that scFed performs well across a variety of datasets, with accuracy competitive with centralized models. Although the Transformer-based model excels in centralized training, within the scFed framework its performance lags slightly behind that of the single-cell-specific model, and it raises a notable time-complexity concern. This study not only helps in selecting suitable cell identification methods but also highlights federated learning’s potential for privacy-preserving, collaborative biomedical research.
The workflow of scFed: clients train local models on their own scRNA-seq gene expression data; the local models are then used to update the global model. The aggregated global model is passed back to the clients for further local training.
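The train-aggregate-redistribute loop in this workflow can be sketched in a few lines of Python. This is only an illustration, not scFed's code: the summary does not say how the global model is aggregated, so the sketch assumes sample-size-weighted averaging (FedAvg-style) and swaps in a plain softmax regression for the classifiers the study benchmarks. The function names `local_update`, `aggregate` and `federated_train` are hypothetical.

```python
import numpy as np

def local_update(weights, X, y, lr=0.1, epochs=5):
    """One client's step: a few gradient updates of softmax regression on local data."""
    W = weights.copy()
    Y = np.eye(W.shape[1])[y]  # one-hot encode the cell-type labels
    for _ in range(epochs):
        logits = X @ W
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        W -= lr * X.T @ (probs - Y) / len(X)
    return W

def aggregate(client_weights, client_sizes):
    """Assumed aggregation rule: average client models weighted by sample count."""
    sizes = np.asarray(client_sizes, dtype=float)
    stacked = np.stack(client_weights)  # shape: (clients, features, classes)
    return np.tensordot(sizes / sizes.sum(), stacked, axes=1)

def federated_train(clients, n_features, n_classes, rounds=10):
    """clients: list of (X, y) pairs; only model weights leave a client, never X or y."""
    global_W = np.zeros((n_features, n_classes))
    for _ in range(rounds):
        # Each client starts from the current global model and trains locally.
        local = [local_update(global_W, X, y) for X, y in clients]
        # The server combines the local models into a new global model.
        global_W = aggregate(local, [len(X) for X, _ in clients])
    return global_W
```

Here each `X` would be a cells-by-genes expression matrix and each `y` a vector of integer cell-type labels held by one client. Weighting by sample count keeps a small client from pulling the global model as hard as a large one; other aggregation rules would slot into `aggregate` without changing the loop.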
Availability – The current version of scFed is implemented in Python and can be found at https://github.com/digi2002/federatedSinglecell.