Aneuploidies are copy number variants that affect entire chromosomes. They are commonly seen in cancer, embryonic stem cells, human embryos, and various trisomic diseases. Aneuploidies frequently affect only a subset of cells in a sample; this is known as “mosaic” aneuploidy. A cell that harbours an aneuploidy exhibits disrupted gene expression patterns, which can alter its behaviour. However, detecting aneuploidies with conventional single-cell DNA-sequencing protocols is slow and expensive.
University of Cambridge researchers have developed a method that uses chromosome-wide expression imbalances to identify aneuploidies from single-cell RNA-seq data. The method provides quantitative aneuploidy calls, and is integrated into an R software package available on GitHub and as an Additional file of this manuscript.
The researchers validate their approach using data with known copy number, identifying the vast majority of aneuploidies with a low rate of false discovery. They provide further support for the method’s efficacy using allele-specific gene expression levels and differential expression analyses.
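The validation described above compares calls against known copy number. As a minimal sketch of how sensitivity, specificity and false discovery rate (FDR) follow from such a comparison, the Python snippet below uses standard definitions. The function name, the dictionary layout keyed by (cell, chromosome) and the toy data are assumptions made for illustration; they are not taken from the researchers' R package.

```python
def _safe_ratio(numerator, denominator):
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def detection_metrics(truth, calls):
    """Compare aneuploidy calls with a ground truth.

    Both arguments are dicts mapping a (cell, chromosome) pair to True
    (aneuploid) or False (euploid). This layout is a hypothetical choice.
    """
    tp = sum(1 for key in truth if truth[key] and calls[key])
    fn = sum(1 for key in truth if truth[key] and not calls[key])
    fp = sum(1 for key in truth if not truth[key] and calls[key])
    tn = sum(1 for key in truth if not truth[key] and not calls[key])
    return {
        "sensitivity": _safe_ratio(tp, tp + fn),  # true aneuploidies recovered
        "specificity": _safe_ratio(tn, tn + fp),  # euploid chromosomes left uncalled
        "fdr": _safe_ratio(fp, tp + fp),          # calls that are wrong
    }


# Toy data (invented): 20 cell-chromosome pairs, 4 of them truly aneuploid.
toy_truth = {("cell", i): i < 4 for i in range(20)}
toy_calls = {("cell", i): i in (0, 1, 2, 4) for i in range(20)}
toy_metrics = detection_metrics(toy_truth, toy_calls)
```

With many more euploid than aneuploid chromosomes, specificity can stay close to 100% even when a noticeable share of calls is wrong, which is why the FDR is worth reporting alongside it.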
Successful detection of aneuploidies from scRNA-seq data
(a) Overview of the method. Cells with aneuploid chromosomes (purple and green) have altered levels of transcription of genes on the affected chromosome (lower and higher, respectively). For a given chromosome and cell, we compute a score for how strongly the overall expression of genes on that chromosome deviates from that in other cells. (b) We applied our method to 8-cell stage mouse embryos that were sequenced via a parallel genome and transcriptome method (G&T-seq). Our method performs well compared to the ground truth provided by genomic sequencing (sensitivity 78.0%, specificity 99.5%, FDR 11.4%). The chromosome with a high Z-score in embryo F is not called as aneuploid because it does not pass an effect size threshold.
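The scoring idea in the caption can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the researchers' implementation: it summarises each chromosome as a fraction of a cell's total expression, uses a robust Z-score (median and MAD) across cells, and requires both a Z-score cutoff and a minimum fold deviation before calling. The function names, the cutoffs `z_cut=3.0` and `min_effect=0.2`, and the toy data are all hypothetical.

```python
from statistics import median


def chromosome_fractions(counts, gene_chrom):
    """Per cell, the fraction of total expression contributed by each chromosome.

    counts: dict cell -> dict gene -> normalised expression (assumed layout).
    gene_chrom: dict gene -> chromosome name.
    """
    fractions = {}
    for cell, genes in counts.items():
        total = sum(genes.values())
        per_chrom = {}
        for gene, value in genes.items():
            chrom = gene_chrom[gene]
            per_chrom[chrom] = per_chrom.get(chrom, 0.0) + value
        fractions[cell] = {c: v / total for c, v in per_chrom.items()}
    return fractions


def call_aneuploidies(fractions, z_cut=3.0, min_effect=0.2):
    """Score every (cell, chromosome) pair against the other cells.

    A pair is called only if its robust Z-score exceeds z_cut AND its
    expression differs from the across-cell median by more than min_effect
    (as a proportion). Both thresholds are illustrative assumptions.
    """
    chroms = sorted({c for per_cell in fractions.values() for c in per_cell})
    calls = {}
    for chrom in chroms:
        values = {cell: f.get(chrom, 0.0) for cell, f in fractions.items()}
        centre = median(values.values())
        mad = median(abs(v - centre) for v in values.values())
        scale = 1.4826 * mad  # MAD rescaled to match a normal standard deviation
        for cell, v in values.items():
            z = (v - centre) / scale if scale > 0 else 0.0
            ratio = v / centre if centre > 0 else 1.0
            called = abs(z) > z_cut and abs(ratio - 1.0) > min_effect
            calls[(cell, chrom)] = {"z": z, "ratio": ratio, "aneuploid": called}
    return calls


# Toy data (invented): one gene per chromosome; cell c6 has extra chrA expression.
toy_gene_chrom = {"gA": "chrA", "gB": "chrB", "gC": "chrC"}
toy_counts = {
    "c1": {"gA": 100, "gB": 100, "gC": 100},
    "c2": {"gA": 102, "gB": 99, "gC": 100},
    "c3": {"gA": 98, "gB": 101, "gC": 100},
    "c4": {"gA": 101, "gB": 100, "gC": 98},
    "c5": {"gA": 99, "gB": 102, "gC": 101},
    "c6": {"gA": 150, "gB": 100, "gC": 100},
}
toy_calls = call_aneuploidies(chromosome_fractions(toy_counts, toy_gene_chrom))
```

In this toy example, chrB and chrC in cell c6 also receive large Z-scores, because the extra chrA expression lowers their share of the total, but their fold deviation stays under the effect size cutoff and they are not called. This mirrors the role of the effect size threshold mentioned in the caption.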
The method is quick and easy to apply, straightforward to interpret, and represents a substantial cost saving compared to single-cell genome sequencing techniques. However, the method is less well suited to data where gene expression is highly variable. The results obtained from the method can be used to investigate the consequences of aneuploidy itself, or to exclude aneuploidy-affected expression values from conventional scRNA-seq data analysis.
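One way to exclude aneuploidy-affected expression values before a conventional analysis is to mask them. The Python sketch below assumes the calls are available as a set of (cell, chromosome) pairs and that expression is stored as nested dictionaries; both layouts and the function name are hypothetical and chosen only for illustration.

```python
def mask_aneuploid_values(counts, gene_chrom, called):
    """Replace expression values on called chromosomes with None.

    counts: dict cell -> dict gene -> expression value (assumed layout).
    gene_chrom: dict gene -> chromosome name.
    called: set of (cell, chromosome) pairs flagged as aneuploid.
    """
    masked = {}
    for cell, genes in counts.items():
        masked[cell] = {
            gene: (None if (cell, gene_chrom[gene]) in called else value)
            for gene, value in genes.items()
        }
    return masked


# Toy data (invented): chr2 is flagged as aneuploid in cell c2 only.
toy_gene_chrom = {"g1": "chr1", "g2": "chr2"}
toy_counts = {"c1": {"g1": 5.0, "g2": 7.0}, "c2": {"g1": 6.0, "g2": 12.0}}
toy_masked = mask_aneuploid_values(toy_counts, toy_gene_chrom, {("c2", "chr2")})
```

Masking individual values keeps the rest of the cell available for analysis, whereas dropping whole cells would discard data from unaffected chromosomes.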
Availability – A package to run aneuploidy assessment in R alongside a script to download the data used is available at: https://github.com/MarioniLab/Aneuploidy2017.