Single-cell RNA sequencing (scRNA-seq) has enabled the unbiased, high-throughput quantification of gene expression specific to cell types and states. With the cost of scRNA-seq decreasing and techniques for sample multiplexing improving, population-scale scRNA-seq, and thus single-cell expression quantitative trait locus (sc-eQTL) mapping, is increasingly feasible. Mapping of sc-eQTL provides additional resolution to study the regulatory role of common genetic variants on gene expression across a plethora of cell types and states and promises to improve our understanding of genetic regulation across tissues in both health and disease.
While previously established methods for bulk eQTL mapping can, in principle, be applied to sc-eQTL mapping, there are a number of open questions about how best to process scRNA-seq data and adapt bulk methods to optimize sc-eQTL mapping. A team led by researchers at the European Bioinformatics Institute evaluated the role of different normalization and aggregation strategies, covariate adjustment techniques, and multiple testing correction methods to establish best-practice guidelines. The team used both real and simulated datasets across single-cell technologies to systematically assess the impact of these different statistical approaches.
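To make "aggregation strategy" concrete: cell-level counts have to be collapsed to one value per gene and donor, either by summing raw counts and then normalizing, or by normalizing each cell and then taking the mean or median. The sketch below illustrates both orders on toy data. It is a minimal illustration, not the study's pipeline: the function names and toy values are hypothetical, and a simple log2 counts-per-million transform stands in for the bulk-style and single-cell normalization methods.

```python
from collections import defaultdict
from math import log2
from statistics import mean, median


def group_by_donor(cells, donors):
    """Group per-cell count vectors by the donor each cell came from."""
    groups = defaultdict(list)
    for counts, donor in zip(cells, donors):
        groups[donor].append(counts)
    return groups


def log_cpm(counts):
    """Library-size normalization: log2(counts per million + 1).

    Assumption: a simple stand-in for TMM or scran-style normalization,
    chosen only to keep the example self-contained.
    """
    total = sum(counts)
    if total == 0:  # empty profile; QC would normally remove these
        return [0.0 for _ in counts]
    return [log2(c / total * 1e6 + 1) for c in counts]


def sum_then_normalize(cells, donors):
    """Sum raw counts per donor (pseudobulk), then normalize the summed profile."""
    out = {}
    for donor, group in group_by_donor(cells, donors).items():
        summed = [sum(gene) for gene in zip(*group)]
        out[donor] = log_cpm(summed)
    return out


def normalize_then_aggregate(cells, donors, stat=mean):
    """Normalize each cell, then take the per-gene mean (or median) per donor."""
    out = {}
    for donor, group in group_by_donor(cells, donors).items():
        normalized = [log_cpm(c) for c in group]
        out[donor] = [stat(gene) for gene in zip(*normalized)]
    return out


# Toy data: 4 cells x 3 genes from two donors (hypothetical values).
cells = [[10, 0, 5], [20, 2, 8], [0, 4, 1], [3, 6, 0]]
donors = ["d1", "d1", "d2", "d2"]

pseudobulk = sum_then_normalize(cells, donors)
mean_expr = normalize_then_aggregate(cells, donors, stat=mean)
median_expr = normalize_then_aggregate(cells, donors, stat=median)
```

Each result maps a donor to one expression value per gene, which is the shape an eQTL model expects as its outcome variable. The two orders generally give different values, which is why the choice is worth benchmarking.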
Overview of the normalization, aggregation, and single-cell eQTL mapping strategies considered
a RNA pre-processing steps to obtain count matrices to perform eQTL mapping, including gene expression quantification, cell and gene-level quality control (QC), and cell type annotation. These steps are not optimized/tested in this work (shown in gray).
b Different approaches tested to perform eQTL mapping using scRNA-seq profiles. Starting from one gene × cell count matrix obtained as in a, counts were aggregated per sample (i.e., donor, or donor-run combination), either by summing the data first at the sample level and then normalizing using methods designed for bulk RNA-seq (i.e., TMM) as implemented in edgeR or by first normalizing the single-cell counts (using scran/scater) and then calculating the mean or the median at the sample level.
c eQTL mapping (cis). We map eQTL independently for each gene-SNP pair considered by fitting a linear mixed model. In particular, we model gene expression as the outcome variable (y), the SNP effect as well as additional covariates as fixed effects, and include one (or more) random effect (RE) term to account for population structure and sample variation. We considered various methods to compute covariates and tested different numbers of covariates as well.
d Multiple testing correction is performed in two steps. First, gene-level p values are adjusted using a permutation scheme to control the FWER across SNPs. Second, the top SNP per gene is selected and various methods are used to control the FDR and obtain globally corrected p values. Steps that we optimize here are highlighted in blue in panels b, c, and d.
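The two-step correction in panel d can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline's implementation: the function names and toy p values are hypothetical, the gene-level step uses a plain empirical permutation p value, and Benjamini-Hochberg is used for the second step only as one common FDR procedure among the "various methods" the caption mentions.

```python
def empirical_gene_pvalue(observed_min_p, permuted_min_ps):
    """Permutation-adjusted p value for the top SNP of one gene.

    permuted_min_ps holds the smallest p value across all SNPs of the gene
    from each permutation; comparing the observed minimum against this null
    distribution accounts for testing many SNPs per gene (FWER control).
    """
    hits = sum(1 for p in permuted_min_ps if p <= observed_min_p)
    return (hits + 1) / (len(permuted_min_ps) + 1)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy inputs (hypothetical): top-SNP p value per gene and the minimum
# p values obtained from four permutations of each gene.
observed = {"geneA": 0.001, "geneB": 0.2}
permuted = {
    "geneA": [0.05, 0.3, 0.01, 0.2],
    "geneB": [0.1, 0.3, 0.05, 0.25],
}

genes = sorted(observed)
# Step 1: one permutation-adjusted p value per gene.
gene_level = [empirical_gene_pvalue(observed[g], permuted[g]) for g in genes]
# Step 2: FDR control across genes on the gene-level p values.
global_level = dict(zip(genes, benjamini_hochberg(gene_level)))
```

With only four permutations the smallest attainable gene-level p value is 1/5, so real analyses need far more permutations, or a parametric approximation to the permutation null, to resolve strong associations.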
Availability – The eQTL mapping pipeline is available via: https://github.com/single-cell-genetics/limix_qtl