From Towards Data Science, by Shivangi Patel
The goals of a single-cell RNA sequencing (scRNA-seq) project are often the identification of subpopulations and differential gene expression analysis. To avoid the ‘curse of dimensionality’, cluster analysis is usually restricted to highly variable genes (HVGs). Several studies have shown that the selection of HVGs is sensitive to the method used to normalize the raw count matrices.
Raw read counts cannot be used directly to compare gene expression between cells, because they are confounded by technical and ‘uninteresting’ biological variation. QC steps and other methods are available to filter out or regress out the uninteresting biological variation. While PCR amplification bias is often taken care of by the use of Unique Molecular Identifiers (UMIs), normalization is still required to remove the effects of other technical variation, such as differences in sequencing depth, cell lysis and reverse transcription efficiency.
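To make the sequencing-depth problem concrete, here is a minimal sketch of one simple and widely used approach: scale each cell to a common total count, then log-transform. It is an illustration only, not necessarily the method compared in this article. The function name `normalize_library_size`, the `target_sum` of 10,000 and the toy matrix are all assumptions made for this example.

```python
import math

def normalize_library_size(counts, target_sum=1e4):
    """Scale every cell (row) to the same total count, then apply log(1 + x).

    `counts` is a list of rows, one row per cell and one column per gene.
    `target_sum` is an illustrative choice, not a value taken from the article.
    """
    normalized = []
    for cell in counts:
        depth = sum(cell)  # total counts observed in this cell
        if depth == 0:
            # An empty cell carries no information; keep it as all zeros.
            normalized.append([0.0 for _ in cell])
            continue
        normalized.append([math.log1p(c / depth * target_sum) for c in cell])
    return normalized

# Toy data (made up): two cells with identical composition,
# but the second cell was sequenced ten times deeper.
raw = [
    [1, 3, 6],
    [10, 30, 60],
]
norm = normalize_library_size(raw)
```

On the raw counts the two cells look ten-fold different for every gene. After scaling they have the same profile, which is the behaviour we want when the difference is purely technical. Note that this simple scaling assumes all cells have similar total RNA content, so it cannot correct every source of technical variation listed above.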