Normalizing single cell RNA sequencing data — Pitfalls and Recommendations

From Towards Data Science, by Shivangi Patel

The goals of a single-cell RNA sequencing (scRNA-seq) project are often the identification of cell subpopulations and differential gene expression analysis. To avoid the ‘curse of dimensionality’, cluster analysis is run on highly variable genes (HVGs) rather than on all genes. Several studies have shown that the selection of HVGs is sensitive to the method used to normalize the raw count matrix.
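To make the idea of HVG selection concrete, here is a minimal sketch that ranks genes by their Fano factor (variance divided by mean) across cells. The function name `top_variable_genes`, the cells-by-genes layout, and the choice of the Fano factor as the variability measure are assumptions made for illustration; the article does not state which HVG method it has in mind, and published tools use more elaborate statistics.

```python
import numpy as np


def top_variable_genes(counts, n_top):
    """Return the column indices of the n_top most variable genes.

    counts: 2-D array-like, rows = cells, columns = genes (assumed layout).
    Variability is measured with the Fano factor (variance / mean),
    an illustrative choice rather than a recommended method.
    """
    counts = np.asarray(counts, dtype=float)
    mean = counts.mean(axis=0)
    var = counts.var(axis=0)
    # Genes with zero mean carry no signal; assign them a Fano factor of 0.
    fano = np.divide(var, mean, out=np.zeros_like(mean), where=mean > 0)
    # argsort is ascending, so reverse it to put the most variable genes first.
    order = np.argsort(fano, kind="stable")[::-1]
    return order[:n_top]
```

Because the ranking depends on per-gene means and variances, running it on the same matrix before and after a rescaling step can return a different gene set, which is the sensitivity the paragraph above describes.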

Why Normalization?

Raw read counts cannot be used directly to compare gene expression between cells, because they are confounded by technical variation and by ‘uninteresting’ biological variation. QC steps and other methods are available to filter out or regress out the uninteresting biological variation. While PCR amplification bias is often taken care of by the use of Unique Molecular Identifiers (UMIs), normalization is still required to remove the effects of other technical variation, such as differences in sequencing depth, cell lysis and reverse transcription efficiency.
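As a rough illustration of removing sequencing-depth differences, the sketch below applies simple global scaling: each cell's counts are divided by that cell's total, multiplied by a common target, and log-transformed. The function name `normalize_by_depth`, the `target_sum` default of 10,000, and the cells-by-genes layout are assumptions for this example. It is one common baseline, not necessarily the approach the article recommends.

```python
import numpy as np


def normalize_by_depth(counts, target_sum=1e4):
    """Scale every cell to the same total count, then apply log(1 + x).

    counts: 2-D array-like, rows = cells, columns = genes (assumed layout).
    target_sum: common per-cell total after scaling (illustrative default).
    """
    counts = np.asarray(counts, dtype=float)
    # Sequencing depth of each cell = total counts in its row.
    depth = counts.sum(axis=1, keepdims=True)
    # Cells with zero total counts are left as all zeros instead of dividing by 0.
    scaled = np.divide(
        counts * target_sum, depth, out=np.zeros_like(counts), where=depth > 0
    )
    return np.log1p(scaled)
```

With this scaling, two cells that have the same relative expression profile but were sequenced at different depths end up with identical normalized values. Effects that do not act as a single per-cell scale factor are not addressed by this sketch.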

(read more at towards data science…)
