Researchers use scRNA-Seq and machine learning algorithms to ‘fingerprint’ human cells

Researchers say a new method to analyse data from individual human cells could be a step-change for diagnosing some of the most devastating diseases, including cancer and autoimmune disease.

By combining single cell analysis techniques with machine learning algorithms, a team led by researchers at the Garvan Institute of Medical Research has developed a method to ‘fingerprint’ human cells.

The method, called ‘scPred’ and published in the journal Genome Biology, has the potential to allow earlier detection of cancer, identify the cells at the root of autoimmune disease, and help personalise treatments to individual patients.

“We’ve developed a new way to identify very specific types of cells, which has put us at the beginning of a significant new frontier in medical diagnostics,” says Associate Professor Joseph Powell, Director of the Garvan-Weizmann Centre for Cellular Genomics, who led the study and is now working to translate the method to diagnostic tests for clinical use.

A closer look at human cells

“For a long time we’ve mainly classified different cells in the human body based on a limited number of markers found on the cell surface or inside the cell. What we’re learning now is that underneath one ‘type’, there is a huge diversity of different cell types – for instance, even though different cancer cells could all have the same cell surface markers, only a subgroup of those cells may actually form a metastatic tumour,” explains Associate Professor Powell.

The researchers developed a new method of analysing the transcripts of individual cells. Transcript data measure which genes are active in each cell, and so provide extensive information about what makes cells unique.

The team’s method, scPred, solves the challenge of working out which parts of the vast amount of generated transcript data are most informative for defining a cell type.

“Our scPred method first collapses all the transcript data from a single cell – instead of trying to estimate 20,000 things at once it works out which patterns of those 20,000 have the most predictive power in distinguishing one cell type from another cell type.

“scPred then ‘trains’ a statistical model on those patterns to test what features make a certain cell type ‘most different’ from another cell – which can be thought of as a unique fingerprint,” explains first author José Alquicira-Hernández, a PhD student at the University of Queensland.
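The idea described in the quote, collapsing thousands of gene measurements into a few patterns and then asking which patterns separate cell types, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the data are simulated, the function names `collapse` and `separation` are hypothetical, and the standardised mean difference is a simple stand-in rather than the statistic scPred itself uses.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data (hypothetical): 200 cells x 500 genes, two cell types.
n_cells, n_genes = 200, 500
labels = np.repeat([0, 1], n_cells // 2)
expr = rng.normal(size=(n_cells, n_genes))
expr[labels == 1, :20] += 2.0  # 20 genes are more active in type 1


def collapse(expr, n_components):
    """Centre the matrix and return PC scores, gene loadings and gene means."""
    mean = expr.mean(axis=0)
    u, s, vt = np.linalg.svd(expr - mean, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    return scores, vt[:n_components], mean


def separation(scores, labels):
    """Absolute standardised mean difference of each PC between two types."""
    a, b = scores[labels == 0], scores[labels == 1]
    pooled = np.sqrt((a.var(axis=0, ddof=1) + b.var(axis=0, ddof=1)) / 2)
    return np.abs(a.mean(axis=0) - b.mean(axis=0)) / pooled


scores, loadings, gene_means = collapse(expr, n_components=10)
ranking = np.argsort(separation(scores, labels))[::-1]
print("PCs ranked by how well they separate the two types:", ranking)
```

In this toy setting each cell is described by 10 pattern scores instead of 500 gene values, and the ranking shows which of those patterns carry the difference between the two simulated cell types.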

Summary of the scPred method


(a) Training step. A gene expression matrix is eigendecomposed via singular value decomposition (SVD) to obtain orthonormal linear combinations of the gene expression values. Only PCs explaining greater than 0.01% of the variance of the dataset are considered for the feature selection and model training steps. Informative PCs are selected using a two-tailed Wilcoxon signed-rank test for each cell class distribution. The cells-PCs matrix is randomly split into k groups and the first group is considered as a testing dataset for cross-validation. The remaining k-1 groups (shown as a single training fold) are used to train a machine learning classification model (a support vector machine). The model parameters are tuned, and each of the k groups is used as a testing dataset to evaluate the prediction performance of a model f_i(x) trained with the remaining k-1 groups. The best model in terms of prediction performance is selected.

(b) Prediction step. The gene expression values of the cells from an independent test or validation dataset are projected onto the principal component basis from the training model, and the informative PCs are used to predict the class probabilities of each cell using the trained prediction model(s) f_b(x).
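The training and prediction steps in the figure caption can be sketched with NumPy, SciPy and scikit-learn. This is a rough illustration under stated assumptions, not the scPred implementation: the data are simulated, the helper `simulate` is hypothetical, and the RBF kernel, k = 5, the Bonferroni adjustment and the 0.05 cut-off are all choices made here. The caption names a Wilcoxon signed-rank test; because the two simulated cell groups are unpaired, the sketch swaps in SciPy's unpaired rank-sum test (`mannwhitneyu`).

```python
import numpy as np
from scipy.stats import mannwhitneyu
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(1)


def simulate(n_cells, n_genes=300):
    """Hypothetical data: class 1 over-expresses the first 15 genes."""
    y = rng.integers(0, 2, size=n_cells)
    x = rng.normal(size=(n_cells, n_genes))
    x[y == 1, :15] += 2.0
    return x, y


x_train, y_train = simulate(240)
x_new, y_new = simulate(80)  # stands in for an independent dataset

# (a) Training step: SVD of the centred matrix gives the PC basis.
gene_means = x_train.mean(axis=0)
u, s, vt = np.linalg.svd(x_train - gene_means, full_matrices=False)
var_explained = s**2 / np.sum(s**2)
keep = var_explained > 1e-4  # more than 0.01% of the variance
scores = (u * s)[:, keep]
basis = vt[keep]

# Select informative PCs: two-sided rank test per PC (assumed Bonferroni cut-off).
pvals = np.array([
    mannwhitneyu(pc[y_train == 0], pc[y_train == 1],
                 alternative="two-sided").pvalue
    for pc in scores.T
])
informative = np.flatnonzero(pvals * pvals.size < 0.05)

# k-fold cross-validation of an SVM on the informative PCs (k = 5 assumed).
model = SVC(kernel="rbf", probability=True, random_state=0)
cv_accuracy = cross_val_score(model, scores[:, informative], y_train, cv=5)
model.fit(scores[:, informative], y_train)

# (b) Prediction step: project new cells onto the training PC basis.
new_scores = (x_new - gene_means) @ basis.T
class_probs = model.predict_proba(new_scores[:, informative])
predicted = model.classes_[np.argmax(class_probs, axis=1)]
print("mean CV accuracy:", cv_accuracy.mean())
print("accuracy on new cells:", np.mean(predicted == y_new))
```

The new cells are centred with the training gene means and multiplied by the training basis, so they are never used to recompute the principal components. That is what lets a model trained on one dataset be applied to another.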

A new dimension on diagnostics

Once a certain cell type has been ‘fingerprinted’, researchers can use the trained model to look for that same cell type in any other sample, in datasets from anywhere in the world.

The researchers have validated the scPred approach using datasets of colorectal cancer cells analysed by collaborators at Stanford University in the United States. Using scPred models, the researchers were able to identify cancer cells from a tissue sample with over 98% accuracy.

The researchers say their method greatly improves the resolution at which cell types can be distinguished, and may uncover diseased cells that are outside the scope of current medical diagnostics.

Translation to patients

Thanks to advanced single cell sequencing methods, researchers can take snapshots of over 20,000 different pieces of information in a single cell’s transcript, and can do so for tens of thousands of cells at a time. The new method opens this technology to diagnostic applications for the first time.

Through the Garvan-Weizmann Centre for Cellular Genomics, the researchers are now moving to the next phase of translating the method to accredited tests for clinical practice.

“Our scPred method gives us the possibility of earlier detection; it may allow us to determine the stage of a cancer patient, what potential drugs they will respond to, or whether their tumour cells have signatures that indicate resistance to chemotherapy. The potential for this new method is enormous,” says Associate Professor Powell.

Source – Garvan Institute of Medical Research

Alquicira-Hernandez J, Sathe A, Ji HP, Nguyen Q, Powell JE. (2019) scPred: accurate supervised method for cell-type classification from single-cell RNA-seq data. Genome Biol [Epub ahead of print].
