Age-associated deterioration of cellular physiology leads to pathological conditions. The ability to detect premature aging could provide a window for preventive therapies against age-related diseases. However, current techniques for determining cellular age are inadequate, as they rely on a small set of histological markers and lack predictive power. Here, a team led by researchers at TU Dresden implements GERAS (GEnetic Reference for Age of Single-cell), a machine-learning-based framework capable of assigning individual cells to chronological stages based on their transcriptomes. GERAS displays greater than 90% accuracy in classifying the chronological stage of zebrafish and human pancreatic cells. The framework is robust against biological and technical noise, as evaluated by its performance on independent samplings of single cells. Additionally, GERAS determines the impact of differences in calorie intake on the aging of zebrafish pancreatic cells and of differences in BMI on the aging of human pancreatic cells. The researchers further harness the classification ability of GERAS to identify molecular factors that are potentially associated with the aging of beta-cells. They show that one of these factors, junba, is necessary to maintain the proliferative state of juvenile beta-cells. These results showcase the applicability of a machine learning framework to classifying the chronological stage of heterogeneous cell populations, while enabling the detection of candidate genes associated with aging.
A chronological age classifier for zebrafish beta-cells
(a) Schematic of the machine learning framework for classifying the chronological stage of zebrafish beta-cells based on their single-cell transcriptomes (see Online Methods for details). (b) Barplot showing the accuracy of GERAS in classifying the ages of beta-cells that were excluded during the training of the model. The classification of the excluded beta-cells displayed greater than 91% accuracy. Error bars indicate standard error. The F1-score for each stage is displayed at the bottom. The F1-score is the harmonic mean of the precision and the sensitivity (recall) of the classifier, with the highest possible value being 1 and the lowest being 0. (c) Balloonplots showing the age-classification of de novo sequenced beta-cells. GERAS classified the age of the cells from independent sources with greater than or equal to 92% accuracy, showcasing the robustness of the model in handling biological and technical noise. (d) Balloonplots showing the age-classification of beta-cells from 3 mpf animals sequenced using the Fluidigm C1 platform. GERAS classified the age of the cells from the cohort with 92.3% accuracy, demonstrating the robustness of the model in handling alternative sequencing pipelines. (e) The capacity of GERAS to perform interpolation was tested using cells with ages in between the chronological stages used to train GERAS. More than 97% of the cells from the intermediate time-points were classified into the nearest-neighbor stages. The number of cells for each condition is denoted by ‘n’.
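As a minimal illustration of the two metrics reported in panel (b), the sketch below computes overall accuracy and a per-stage F1-score, taking F1 as the harmonic mean of precision and sensitivity (recall). It is not the GERAS code: the function names and the toy stage labels ("juvenile", "adult") are assumptions made for illustration only, and the label lists do not reproduce any data from the study.

```python
from collections import Counter


def accuracy(true_stages, predicted_stages):
    """Fraction of cells whose predicted stage matches the true stage."""
    correct = sum(t == p for t, p in zip(true_stages, predicted_stages))
    return correct / len(true_stages)


def per_stage_f1(true_stages, predicted_stages):
    """Return {stage: F1}, with F1 = 2 * precision * recall / (precision + recall)."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(true_stages, predicted_stages):
        if t == p:
            tp[t] += 1   # cell correctly assigned to its stage
        else:
            fp[p] += 1   # cell wrongly assigned to stage p
            fn[t] += 1   # cell of stage t that was missed

    scores = {}
    for stage in sorted(set(true_stages) | set(predicted_stages)):
        predicted_total = tp[stage] + fp[stage]
        actual_total = tp[stage] + fn[stage]
        precision = tp[stage] / predicted_total if predicted_total else 0.0
        recall = tp[stage] / actual_total if actual_total else 0.0
        denom = precision + recall
        scores[stage] = 2 * precision * recall / denom if denom else 0.0
    return scores


# Toy labels (hypothetical, not data from the study).
toy_true = ["juvenile", "juvenile", "adult", "adult"]
toy_pred = ["juvenile", "adult", "adult", "adult"]
toy_accuracy = accuracy(toy_true, toy_pred)
toy_f1 = per_stage_f1(toy_true, toy_pred)
```

Reporting F1 per stage alongside overall accuracy is useful because a classifier can reach high accuracy while performing poorly on a stage represented by few cells; the per-stage score exposes that case.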
Availability – Normalized read-counts for all human pancreatic samples used in the study, and the code for developing and testing GERAS, are available on GitHub: https://github.com/sumeetpalsingh/GERAS2017