Previous studies have prioritized trait-relevant cell types by looking for an enrichment of genome-wide association study (GWAS) signal within functional regions. However, these studies are limited in cell resolution by the lack of functional annotations from difficult-to-characterize or rare cell populations. Measurement of single-cell gene expression has become a popular method for characterizing novel cell types, and yet limited work has linked single-cell RNA sequencing (RNA-seq) to phenotypes of interest. To address this deficiency, Stanford University researchers present RolyPoly, a regression-based polygenic model that can prioritize trait-relevant cell types and genes from GWAS summary statistics and gene expression data. RolyPoly is designed to use expression data from either bulk tissue or single-cell RNA-seq.
In this study, the researchers demonstrated RolyPoly’s accuracy through simulation and validated previously known tissue-trait associations. They discovered a significant association between microglia and late-onset Alzheimer disease, as well as associations between schizophrenia and both oligodendrocytes and replicating fetal cortical cells. Additionally, RolyPoly computes a trait-relevance score for each gene to reflect the importance of expression specific to a cell type. The researchers found that differentially expressed genes in the prefrontal cortex of individuals with Alzheimer disease were significantly enriched with genes ranked highly by RolyPoly gene scores. Overall, this method represents a powerful framework for understanding the effect of common variants on cell types contributing to complex traits.
RolyPoly Detects Trait-Associated Annotations by Using GWAS Summary Statistics and Gene Expression Profiles
(A) We model the variance of GWAS effect sizes of SNPs associated with a gene as a function of gene annotations, in particular gene expression, while accounting for LD by using population-matched genotype correlation information.
(B) From a database of functional information (such as tissue or cell-type RNA-seq), we learn a regression coefficient γ_k that captures each annotation’s influence on the variance of GWAS effect sizes. A deviation from the mean gene expression value of a_jk results in an increase of γ_k a_jk to the expected variance of gene-associated GWAS effect sizes. The value γ_0 represents a regression intercept that estimates the population mean variance. To check learned model parameters, we expect to see an enrichment of LD-informed GWAS gene scores for genes that are specifically expressed in a tissue inferred to be trait relevant. Finally, from a model fit, we can prioritize trait-relevant tissues and genes.
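The idea in panels (A) and (B) can be illustrated with a small simulation. What follows is a minimal sketch under strong simplifying assumptions, not the RolyPoly implementation: it ignores LD entirely (every SNP is treated as independent), assigns a fixed number of SNPs to each gene, and uses made-up annotation values. The expected variance of a gene's SNP effect sizes is taken to be an intercept (`gamma0`) plus a weighted sum of that gene's annotations (weights `gamma`). Because the expected squared effect size equals that variance, an ordinary least-squares regression of squared effects on the annotations gives a rough method-of-moments estimate. All function names, parameter values, and the simulated data are hypothetical.

```python
import numpy as np


def simulate_effects(annotations, gamma0, gamma, snps_per_gene, rng):
    """Draw SNP effect sizes whose variance depends on gene annotations."""
    # Expected variance per gene: intercept plus annotation contributions.
    gene_var = gamma0 + annotations @ gamma
    if np.any(gene_var <= 0):
        raise ValueError("model implies a non-positive variance")
    # Assumption: each gene has snps_per_gene independent SNPs (no LD).
    sd = np.sqrt(np.repeat(gene_var, snps_per_gene))
    return rng.normal(0.0, sd)


def fit_variance_model(beta, annotations, snps_per_gene):
    """Regress squared effect sizes on annotations (method of moments)."""
    n_genes = annotations.shape[0]
    design = np.column_stack([np.ones(n_genes), annotations])
    # Repeat each gene's row once per SNP, matching simulate_effects.
    design = np.repeat(design, snps_per_gene, axis=0)
    coef, *_ = np.linalg.lstsq(design, beta ** 2, rcond=None)
    return coef[0], coef[1:]


rng = np.random.default_rng(0)
n_genes, n_tissues, snps_per_gene = 2000, 3, 20

# Hypothetical annotations: non-negative expression deviations per tissue.
annotations = np.abs(rng.normal(size=(n_genes, n_tissues)))

# Hypothetical truth: only the tissue at index 1 is trait relevant.
true_gamma0 = 1e-4
true_gamma = np.array([0.0, 5e-5, 0.0])

beta = simulate_effects(annotations, true_gamma0, true_gamma,
                        snps_per_gene, rng)
gamma0_hat, gamma_hat = fit_variance_model(beta, annotations, snps_per_gene)

# Prioritize the tissue with the largest coefficient, then rank genes by
# how much that tissue's annotation adds to their expected variance.
top_tissue = int(np.argmax(gamma_hat))
gene_scores = gamma_hat[top_tissue] * annotations[:, top_tissue]
top_genes = np.argsort(-gene_scores)[:10]

print("estimated intercept:", gamma0_hat)
print("estimated coefficients:", gamma_hat)
print("prioritized tissue index:", top_tissue)
```

In this toy setting the largest estimated coefficient points at the simulated trait-relevant tissue, and genes with high annotation values in that tissue rank highest. With real GWAS data, neighboring SNPs are correlated, so a fit that skips the LD adjustment described in panel (A) would be miscalibrated.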