Next-generation sequencing (NGS) technology has become a powerful tool for dissecting the molecular and pathological signatures of a variety of human diseases. However, the limited availability of biological samples from different disease stages is a major hurdle in studying disease progression and identifying early pathological changes. Deep learning techniques have recently begun to be applied to NGS data to predict the progression of biological processes.
In this study, researchers from the Korea Brain Research Institute applied a deep learning technique called generative adversarial networks (GANs) to predict the molecular progression of Alzheimer’s disease (AD). They applied GANs to RNA-seq data from the 5xFAD mouse model of AD, which recapitulates a major feature of AD: massive amyloid-β (Aβ) accumulation in the brain. They examined how the trained generator produces sample-specific expression profiles and captures biologically meaningful gene associations. Based on these observations, the researchers simulated a virtual disease progression by latent space interpolation, yielding transition curves that describe how the expression of individual genes changes from the normal to the AD state. By performing pathway analysis based on the transition curve patterns, they identified several pathological processes with progressive changes, such as inflammatory systems and synapse functions, which have previously been shown to be involved in the pathogenesis of AD. Interestingly, their analysis indicates that alteration of cholesterol biosynthesis begins at a very early stage of AD, suggesting that it is the first response affecting cholesterol metabolism in AD downstream of Aβ accumulation.
Overview of the application of GANs to bulk RNA-seq data
RNA-seq analysis of the GSE104775 raw data, comprising 36 WT and AD samples (n = 6/group), was performed, yielding 1,208 DEGs between 7M WT and 7M AD. The normalized expression profile for the 36 samples was subjected to a data augmentation procedure, creating 846 augmented samples. The generator network produces fake gene expression data from random variables in a latent space (z). The discriminator network distinguishes between the augmented real data and the fake data, yielding a loss that is used to update the weight parameters of both networks. Transition curves describing how the expression of the 1,208 genes changes between WT and AD are then obtained by latent space interpolation with the generated fake data, providing a virtual simulation of disease progression. The transition curves were classified into six patterns (P1 to P6), and pathway analysis was performed on the gene list of each pattern subset. The order of up- or downregulated pathways was then identified to predict pathway cascades.
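The latent space interpolation step can be illustrated with a minimal sketch. It is not the researchers' implementation: the generator here is a hypothetical stand-in (a fixed random linear map followed by tanh) rather than a trained GAN, the latent vectors `z_wt` and `z_ad` are random placeholders for points that would map to WT-like and AD-like profiles, and the dimensions are toy values. Linear interpolation is assumed; the source does not state which interpolation scheme was used.

```python
import numpy as np


def interpolate_latent(z_start, z_end, n_steps):
    """Linearly interpolate between two latent vectors, endpoints included."""
    alphas = np.linspace(0.0, 1.0, n_steps)[:, None]  # shape (n_steps, 1)
    return (1.0 - alphas) * z_start + alphas * z_end  # shape (n_steps, latent_dim)


def make_toy_generator(latent_dim, n_genes, seed=0):
    """Hypothetical stand-in for a trained generator: latent vector -> expression."""
    rng = np.random.default_rng(seed)
    weights = rng.normal(size=(latent_dim, n_genes))

    def generator(z):
        # Works for a single vector or a batch of vectors (one per row).
        return np.tanh(z @ weights)

    return generator


# Toy sizes (assumptions); the study itself tracked 1,208 genes.
latent_dim, n_genes, n_steps = 8, 5, 11

rng = np.random.default_rng(1)
z_wt = rng.normal(size=latent_dim)  # placeholder latent point for the WT state
z_ad = rng.normal(size=latent_dim)  # placeholder latent point for the AD state

generator = make_toy_generator(latent_dim, n_genes)
path = interpolate_latent(z_wt, z_ad, n_steps)

# Each column is one gene's transition curve along the virtual WT -> AD path.
transition_curves = generator(path)  # shape (n_steps, n_genes)
```

With a trained generator in place of the toy one, each column of `transition_curves` would trace one gene's simulated expression from the WT end to the AD end of the path, which is the kind of curve that the caption describes being grouped into patterns.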