Obstacles to detecting isoforms using full-length scRNA-seq data

Early single-cell RNA-seq (scRNA-seq) studies suggested that it was unusual to see more than one isoform being produced from a gene in a single cell, even when multiple isoforms were detected in matched bulk RNA-seq samples. However, these studies generally did not consider the impact of dropouts or isoform quantification errors, potentially confounding the results of these analyses.

In this study, University of Cambridge researchers take a simulation-based approach in which they explicitly account for dropouts and isoform quantification errors. They use their simulations to ask to what extent it is possible to study alternative splicing using scRNA-seq, and what limitations must be overcome to make splicing analysis feasible. The researchers find that the high rate of dropouts associated with scRNA-seq is a major obstacle to studying alternative splicing. In mice and other well-established model organisms, the relatively low rate of isoform quantification errors poses a lesser obstacle to splicing analysis. They also find that different models of isoform choice meaningfully change their simulation results.
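To give a feel for why dropouts matter so much for isoform-level analysis, the following is a minimal sketch under a simplifying assumption that is not taken from the study: each transcript molecule is captured independently with a fixed probability (the capture efficiency), and an isoform counts as detected if at least one of its molecules is captured. The function names and this binomial capture model are illustrative only and are not the authors' simulation.

```python
def detection_probability(molecules, capture_efficiency):
    """Probability that at least one of `molecules` copies is captured.

    Assumes (hypothetically) that each molecule is captured independently
    with probability `capture_efficiency`.
    """
    return 1.0 - (1.0 - capture_efficiency) ** molecules


def prob_all_isoforms_detected(molecule_counts, capture_efficiency):
    """Probability that every isoform of a gene is detected in one cell.

    `molecule_counts` holds the number of molecules of each isoform that
    are truly present in the cell.
    """
    prob = 1.0
    for n in molecule_counts:
        prob *= detection_probability(n, capture_efficiency)
    return prob


# Example: a gene truly expressing two isoforms, five molecules each,
# in a cell where 10% of molecules are captured.
p_both = prob_all_isoforms_detected([5, 5], 0.1)
```

Under these assumed numbers, both isoforms would be seen together in only roughly 17% of cells, even though every cell expresses both. This is the kind of effect that can make a gene appear to produce a single isoform per cell.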

Figure 1: Schematic of the simulation approach

The simulation approach was applied to a dataset of H1 and H9 human embryonic stem cells (hESCs). In this dataset, each cell’s cDNA was split into two groups and sequenced at two different sequencing depths, enabling us to directly compare our simulation results at different sequencing depths without biological confounders. One group was sequenced at approximately 1 million reads per cell and the other group at approximately 4 million reads per cell on average. scRNA-seq experiments have been found to saturate in terms of the number of genes detected per cell at approximately 1 million reads per cell. However, we observe differences in the number of isoforms detected per gene per cell at 1 and 4 million reads per cell, indicating that the saturation depth may differ for gene- and isoform-level analyses. Next, we calculate the fraction of overlap between the isoforms expressed in the ground truth and the isoforms detected as expressed in our simulations. We refer to each gene’s mean fraction of overlap between isoforms expressed in the ground truth and isoforms detected as expressed as the ‘overlap fraction’ hereafter. The mean overlap fraction is consistently higher at 4 million reads per cell than at 1 million reads per cell, indicating that our ability to accurately detect isoforms improves at higher sequencing depths.
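The overlap fraction described above can be sketched in code. The text does not spell out the exact formula, so this example assumes one plausible reading: for a given gene in a given cell, the overlap is the number of isoforms present in both the ground truth and the detected set, divided by the number present in either set (a Jaccard-style ratio). The gene-level value is then the mean over cells. The function names and the choice of ratio are assumptions, not the authors' implementation.

```python
from statistics import mean


def overlap_fraction(truth, detected):
    """Jaccard-style overlap between two isoform sets (assumed definition).

    Returns None when both sets are empty, since the ratio is undefined.
    """
    truth, detected = set(truth), set(detected)
    union = truth | detected
    if not union:
        return None
    return len(truth & detected) / len(union)


def mean_overlap_fraction(per_cell_pairs):
    """Mean overlap for one gene across cells.

    `per_cell_pairs` is an iterable of (truth, detected) isoform sets,
    one pair per cell. Cells where the overlap is undefined are skipped.
    """
    values = [overlap_fraction(t, d) for t, d in per_cell_pairs]
    values = [v for v in values if v is not None]
    return mean(values) if values else None


# Example with hypothetical isoform labels "A" and "B":
# cell 1 truly expresses A and B but only A is detected;
# cell 2 truly expresses A and A is detected.
gene_overlap = mean_overlap_fraction([({"A", "B"}, {"A"}), ({"A"}, {"A"})])
```

In this toy example the two cells contribute overlaps of 0.5 and 1.0, giving a gene-level overlap fraction of 0.75.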

To accurately study alternative splicing with single-cell RNA-seq, a better understanding of isoform choice and the errors associated with scRNA-seq is required. An increase in the capture efficiency of scRNA-seq would also be beneficial. Until some or all of the above are achieved, the authors do not recommend attempting to resolve isoforms in individual cells using scRNA-seq.

Westoby J, Artemov P, Hemberg M, Ferguson-Smith A. (2020) Obstacles to detecting isoforms using full-length scRNA-seq data. Genome Biology [Epub ahead of print].
