Allele-specific expression (ASE) analysis, which quantifies the relative expression of two alleles in a diploid individual, is a powerful tool for identifying cis-regulated gene expression variations that underlie phenotypic differences among individuals. Existing methods for gene-level ASE detection analyze one individual at a time, therefore failing to account for shared information across individuals. Failure to accommodate such shared information not only reduces power, but also makes it difficult to interpret results across individuals. However, when only RNA sequencing (RNA-seq) data are available, ASE detection across individuals is challenging because the data often include individuals that are either heterozygous or homozygous for the unobserved cis-regulatory SNP, leading to sample heterogeneity as only those heterozygous individuals are informative for ASE, whereas those homozygous individuals have balanced expression.
To simultaneously model multi-individual information and account for such heterogeneity, researchers from the University of Pennsylvania Perelman School of Medicine developed ASEP, a mixture model with a subject-specific random effect to account for multi-SNP correlations within the same gene. ASEP requires only RNA-seq data, and is able to detect gene-level ASE under one condition and differential ASE between two conditions (e.g., pre- versus post-treatment). Extensive simulations demonstrated the convincing performance of ASEP under a wide range of scenarios. The researchers applied ASEP to a human kidney RNA-seq dataset, identified ASE genes, and validated their results against two published eQTL studies. They further applied ASEP to a human macrophage RNA-seq dataset, identified genes showing evidence of differential ASE between M0 and M1 macrophages, and confirmed their findings with results from cardiometabolic trait-relevant genome-wide association studies. To the best of the authors' knowledge, ASEP is the first method for gene-level ASE detection at the population level that requires only RNA-seq data. With the growing adoption of RNA-seq, the authors believe ASEP will be well-suited for various ASE studies of human diseases.
Challenges in cross-individual gene-based ASE analysis
Heterogeneity of the ASE effect exists across individuals in a population. Because the cis-regulatory SNP (rSNP) is often unobserved, the bulk RNA-seq data include individuals (ID) that are either heterozygous or homozygous at the rSNP. The mRNA expression levels differ between the two haplotypes only in the heterozygous individuals. Additionally, a gene may have multiple heterozygous transcribed SNPs (tSNPs). To differentiate paternal and maternal alleles, haplotype phase information is needed, but it is unavailable in most studies. Further complicating the analysis, aggregating ASE effects across individuals requires that haplotypes residing on the same allele of the unobserved rSNP be aligned across individuals.
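The heterogeneity described above can be illustrated with a small simulation. The following is a minimal sketch in Python, not the ASEP model or its R interface: the function names, the read depth, the 70/30 allelic ratio, and the share of rSNP-heterozygous individuals are all assumptions chosen for illustration. It also assumes that tSNPs are already phased within each individual, which real data often do not provide, and it summarizes each individual by the major-haplotype fraction so that no cross-individual haplotype alignment is needed.

```python
import random

def simulate_individual(rng, het_at_rsnp, n_tsnps=3, depth=100, ase_fraction=0.7):
    """Simulate (haplotype-1 reads, total reads) at each heterozygous tSNP.

    Assumption: individuals homozygous at the rSNP express both haplotypes
    equally (p = 0.5); heterozygous individuals express haplotype 1 at
    `ase_fraction`. tSNPs are treated as phased within the individual.
    """
    p = ase_fraction if het_at_rsnp else 0.5
    counts = []
    for _ in range(n_tsnps):
        hap1_reads = sum(rng.random() < p for _ in range(depth))
        counts.append((hap1_reads, depth))
    return counts

def major_haplotype_fraction(counts):
    """Pool reads over tSNPs and return the fraction from the more expressed haplotype."""
    hap1 = sum(reads for reads, _ in counts)
    total = sum(n for _, n in counts)
    frac = hap1 / total
    return max(frac, 1.0 - frac)

rng = random.Random(0)

# Hypothetical cohort: 80 individuals heterozygous at the rSNP, 120 homozygous.
het_fracs = [major_haplotype_fraction(simulate_individual(rng, True)) for _ in range(80)]
hom_fracs = [major_haplotype_fraction(simulate_individual(rng, False)) for _ in range(120)]

het_mean = sum(het_fracs) / len(het_fracs)
hom_mean = sum(hom_fracs) / len(hom_fracs)
pooled_mean = (sum(het_fracs) + sum(hom_fracs)) / (len(het_fracs) + len(hom_fracs))

print(f"rSNP heterozygous: mean major-haplotype fraction = {het_mean:.3f}")
print(f"rSNP homozygous:   mean major-haplotype fraction = {hom_mean:.3f}")
print(f"pooled cohort:     mean major-haplotype fraction = {pooled_mean:.3f}")
```

In this toy setting the pooled cohort is a mixture of two groups, and its average sits between them, so a single-population summary understates the imbalance in the informative (heterozygous) individuals. That is the kind of sample heterogeneity a mixture model is meant to accommodate.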
Availability – ASEP is implemented as an R package and is freely available on GitHub (https://github.com/Jiaxin-Fan/ASEP), with a detailed tutorial and examples provided.