Single-cell RNA sequencing (scRNA-seq) provides an excellent platform for measuring gene expression profiles in heterogeneous cell populations. Many tools for analyzing scRNA-seq data have been developed over the years. However, they often require complicated commands and multistep procedures that are difficult for genome researchers and experimental biologists to follow. University of Louisville researchers describe a step-by-step workflow for processing and analyzing scRNA-seq unique molecular identifier (UMI) data from human lung adenocarcinoma cell lines. Using a real data example, they demonstrate the basic preprocessing steps that produce UMI count data: quality checking, read mapping and quantification of transcript abundance. The researchers then performed basic statistical analyses on the resulting count data, namely zero-inflation, differential expression and clustering analyses. They also studied how the excess zeros present in scRNA-seq data affect these downstream analyses. Their findings indicate that zero-inflation in UMI data had little or no effect on clustering but a significant effect on identifying differentially expressed genes. The researchers also compare differential expression analysis tools based on zero-inflated negative binomial (ZINB) and negative binomial (NB) models on scRNA-seq data. A sensitivity analysis reinforced their findings: the NB model-based tool did not provide an accurate and efficient way to analyze the scRNA-seq data. This study provides a set of guidelines to help users handle and analyze real scRNA-seq data more easily.
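The idea of "excess" zero-inflation can be illustrated with a minimal sketch. It compares, for one gene, the observed fraction of zero UMI counts with the fraction expected under a negative binomial fit. This is not the authors' procedure, which the summary does not specify. The function names and toy counts are hypothetical. The NB dispersion is estimated by the method of moments (var = mu + mu^2 / r), a deliberately simple choice for illustration rather than a likelihood-based fit.

```python
import math
from statistics import mean, variance


def observed_zero_fraction(counts):
    """Fraction of cells in which the gene has a UMI count of zero."""
    return sum(1 for c in counts if c == 0) / len(counts)


def expected_nb_zero_fraction(counts):
    """Expected zero fraction under a negative binomial (NB) fit.

    Assumption: method-of-moments estimates, with mean mu and dispersion r
    solved from var = mu + mu**2 / r. If the sample variance does not exceed
    the mean (no overdispersion), fall back to a Poisson model, whose zero
    probability is exp(-mu). Requires at least two cells.
    """
    mu = mean(counts)
    if mu == 0:
        return 1.0  # an all-zero gene: every cell is expected to be zero
    var = variance(counts)  # sample variance (n - 1 denominator)
    if var <= mu:
        return math.exp(-mu)
    r = mu * mu / (var - mu)
    return (r / (r + mu)) ** r  # NB probability of observing a zero


def excess_zeros(counts):
    """Observed minus NB-expected zero fraction; > 0 hints at zero-inflation."""
    return observed_zero_fraction(counts) - expected_nb_zero_fraction(counts)


# Hypothetical UMI counts for one gene across ten cells.
gene_a = [0, 0, 0, 0, 0, 0, 5, 6, 7, 8]
print(round(excess_zeros(gene_a), 3))
```

For this toy gene, 60% of cells are zero, while the moment-matched NB model expects roughly a third. The positive difference is the kind of signal that motivates ZINB-based tools. A gene with no overdispersion falls back to the Poisson zero probability instead.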
Outline of the workflow for the main steps of scRNA-seq data analysis
(A) Key steps in a typical single-cell RNA-seq experiment, from sample preparation (isolation and lysis of single cells) through data analysis. (B) Data preprocessing steps from the .fastq files to the generation of the count matrix, with the tools required at each stage. (C) Main data analysis steps performed on the input count matrix in the scRNA-seq study.