The transcriptional state of a cell reflects a variety of biological factors, from cell-type-specific features to transient processes such as the cell cycle, all of which may be of interest. However, identifying such aspects from noisy single-cell RNA-seq data remains challenging.
Researchers at Harvard Medical School have developed pathway and gene set overdispersion analysis (PAGODA) to resolve multiple, potentially overlapping aspects of transcriptional heterogeneity by testing gene sets for coordinated variability among measured cells.
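The core idea of testing a gene set for coordinated variability can be illustrated with a minimal Python sketch. It is not PAGODA's method: PAGODA runs weighted PCA on variance-normalized expression and applies its own significance model. This sketch, purely as an assumption for illustration, standardizes each gene, measures the variance along the first principal component, and compares it with random gene sets of the same size (a simple permutation test). The function names `first_pc_variance` and `coordinated_variability_pvalue` are hypothetical.

```python
import numpy as np

def first_pc_variance(X):
    """Variance along the first principal component of a genes x cells matrix.

    Each gene is centered and scaled to unit variance across cells, so a large
    value reflects coordination among genes, not the spread of any single gene.
    """
    Z = X - X.mean(axis=1, keepdims=True)
    sd = Z.std(axis=1, ddof=1, keepdims=True)
    Z = Z / np.where(sd > 0, sd, 1.0)
    # PC variances are squared singular values of the cells x genes matrix / (n_cells - 1)
    s = np.linalg.svd(Z.T, compute_uv=False)
    return s[0] ** 2 / (X.shape[1] - 1)

def coordinated_variability_pvalue(X, gene_idx, n_perm=200, seed=0):
    """Hypothetical permutation test: compare a gene set with random sets of equal size."""
    rng = np.random.default_rng(seed)
    observed = first_pc_variance(X[gene_idx])
    k = len(gene_idx)
    null = np.array([
        first_pc_variance(X[rng.choice(X.shape[0], size=k, replace=False)])
        for _ in range(n_perm)
    ])
    # Empirical p-value with a +1 correction so it is never exactly zero
    p = (1 + np.sum(null >= observed)) / (n_perm + 1)
    return observed, p

# Toy data: 200 noise genes x 50 cells; genes 0-9 also share one hidden cell-level factor
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 50))
X[:10] += rng.normal(scale=2.0, size=50)

coord_var, coord_p = coordinated_variability_pvalue(X, np.arange(10))
rand_var, rand_p = coordinated_variability_pvalue(X, np.arange(100, 110))
print(f"co-varying set: PC1 variance {coord_var:.2f}, p = {coord_p:.3f}")
print(f"random set:     PC1 variance {rand_var:.2f}, p = {rand_p:.3f}")
```

Because every gene is standardized first, the co-varying set stands out only through variation it shares across cells, not through the magnitude of any one gene's variance.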
Overview of PAGODA.
Transcriptional heterogeneity is analyzed in seven steps.
(1) Error models are fit for each cell. The figure shows the model fit for one cell, separating the drop-out and amplified components, with the 95% confidence envelope (CE) of the amplified component.
(2) The residual expression variance of each gene is determined relative to the transcriptome-wide expectation model (red curve), taking into account the uncertainty in each gene's variance estimate by determining the effective degrees of freedom (k_g) for the χ² distribution. (CV: coefficient of variation.)
(3) Weighted PCA is performed on annotated gene sets and on de novo gene sets determined on the basis of correlated expression in the current data set.
(4) Cell PC scores of overdispersed gene sets (those whose first principal component explains significantly more variance than expected) are identified as significant aspects of heterogeneity.
(5) Redundant aspects are grouped to provide a succinct overview of heterogeneity.
(6) A web interface is used to navigate the identified aspects of heterogeneity, associated gene sets and gene expression patterns.
(7) Aspects of heterogeneity deemed artifactual or extraneous with respect to the biological question can be controlled for in a subsequent iteration.
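Step (2) can be sketched in Python under several simplifying assumptions, so this is an illustration rather than PAGODA's implementation (PAGODA itself ships in an R package). The transcriptome-wide expectation here is a quadratic fit of log variance on log mean, not the paper's model; the per-gene effective degrees of freedom k_g are replaced by the plain n_cells − 1; and the χ² tail probability uses the Wilson-Hilferty approximation. The idea shown is that, under the null, k times the observed/expected variance ratio follows a χ² distribution with k degrees of freedom. The function names `residual_overdispersion` and `chi2_sf_wilson_hilferty` are hypothetical.

```python
import math
import numpy as np

def chi2_sf_wilson_hilferty(x, k):
    """Approximate upper-tail probability P(chi2_k >= x) (Wilson-Hilferty transform)."""
    h = 2.0 / (9.0 * k)
    z = ((x / k) ** (1.0 / 3.0) - (1.0 - h)) / math.sqrt(h)
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def residual_overdispersion(X, degree=2):
    """Hypothetical sketch: per-gene observed/expected variance ratio and p-value.

    X is a genes x cells matrix; every gene must have a positive mean and variance.
    """
    mean = X.mean(axis=1)
    var = X.var(axis=1, ddof=1)
    log_m, log_v = np.log(mean), np.log(var)
    # Transcriptome-wide expectation: polynomial trend of log variance on log mean
    coef = np.polyfit(log_m, log_v, deg=degree)
    expected = np.exp(np.polyval(coef, log_m))
    ratio = var / expected
    # Simplification: n_cells - 1 degrees of freedom for every gene instead of k_g
    k = X.shape[1] - 1
    pvals = np.array([chi2_sf_wilson_hilferty(k * r, k) for r in ratio])
    return ratio, pvals

# Toy data: Poisson counts whose variance equals the mean, except genes 150-154,
# which share a gamma-distributed cell-level factor and are therefore overdispersed
rng = np.random.default_rng(0)
n_genes, n_cells = 200, 100
lam = np.geomspace(2, 50, n_genes)[:, None] * np.ones((1, n_cells))
lam[150:155] *= rng.gamma(shape=2.0, scale=0.5, size=n_cells)
X = rng.poisson(lam).astype(float)

ratio, pvals = residual_overdispersion(X)
print("overdispersed genes:", np.round(ratio[150:155], 1))
print("median p-value across all genes:", round(float(np.median(pvals)), 2))
```

Fitting the trend across all genes means each gene is judged against genes of similar expression level, so highly expressed genes are not flagged merely for having larger absolute variance.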
Availability: PAGODA is distributed as part of the scde R package, available at http://pklab.med.harvard.edu/scde/