Single-cell ribonucleic acid (RNA)-sequencing (scRNA-seq) is a powerful tool to study cellular heterogeneity. The high-dimensional data generated by this technology are complex and require specialized expertise for analysis and interpretation. The core of scRNA-seq data analysis comprises several key steps: pre-processing, quality control, normalization, dimensionality reduction, integration and clustering. For each step, many algorithms have been developed, with varied underlying assumptions and implications. With such a diverse choice of tools available, benchmarking analyses have compared their performance and demonstrated that tools perform differently depending on data type and complexity.
Researchers at Queen Mary University of London have developed the Integrated Benchmarking scRNA-seq Analytical Pipeline (IBRAP), which contains a suite of analytical components that can be interchanged throughout the pipeline, alongside multiple benchmarking metrics that enable users to compare results and determine the optimal pipeline combinations for their data. The researchers apply IBRAP to single- and multi-sample integration analysis using primary pancreatic tissue, cancer cell line and simulated data accompanied by ground truth cell labels, demonstrating the interchangeable and benchmarking functionality of IBRAP. Their results confirm that the optimal pipelines depend on the individual samples and studies, further supporting the rationale for and necessity of the tool. They then compare reference-based cell annotation with unsupervised analysis, both included in IBRAP, and demonstrate the superiority of the reference-based method in identifying robust major and minor cell types. Thus, IBRAP is a valuable tool for integrating multiple samples and studies to create reference maps of normal and diseased tissues, facilitating novel biological discovery using the vast volume of scRNA-seq data available.
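The idea of benchmarking interchangeable pipeline combinations against ground truth labels can be illustrated with a minimal Python sketch. It is not IBRAP's code or API (IBRAP itself is distributed through the GitHub repository noted under Availability). The metric used here, the adjusted Rand index, is an assumption chosen for illustration, since the text above does not name the specific metrics, and the combination names (`norm_A`, `cluster_X`, and so on) and the cluster assignments are made-up toy values.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(truth, predicted):
    """Adjusted Rand index between two labelings of the same cells."""
    if len(truth) != len(predicted):
        raise ValueError("labelings must have the same length")
    # Count pairs of cells that share a label in both, or in either, labeling.
    pair_counts = Counter(zip(truth, predicted))
    same_both = sum(comb(n, 2) for n in pair_counts.values())
    same_truth = sum(comb(n, 2) for n in Counter(truth).values())
    same_pred = sum(comb(n, 2) for n in Counter(predicted).values())
    total_pairs = comb(len(truth), 2)
    if total_pairs == 0:
        return 1.0  # fewer than two cells: nothing to disagree on
    expected = same_truth * same_pred / total_pairs
    maximum = (same_truth + same_pred) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. every cell in a single cluster
    return (same_both - expected) / (maximum - expected)


def rank_pipelines(results, truth):
    """Score each pipeline combination against the truth, best first."""
    scores = {
        combo: adjusted_rand_index(truth, labels)
        for combo, labels in results.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy ground truth labels and hypothetical cluster assignments produced by
# three (normalization, clustering) combinations. All values are invented.
truth = ["alpha", "alpha", "alpha", "beta", "beta", "beta"]
results = {
    ("norm_A", "cluster_X"): [0, 0, 0, 1, 1, 1],
    ("norm_A", "cluster_Y"): [0, 0, 1, 1, 2, 2],
    ("norm_B", "cluster_X"): [0, 1, 0, 1, 0, 1],
}

ranking = rank_pipelines(results, truth)
best_combo, best_score = ranking[0]
print("best combination:", best_combo, "score:", round(best_score, 3))
```

Ranking combinations by a single score keeps the sketch short. In practice several metrics would likely be compared side by side, since a combination that scores best on one metric need not score best on another.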
A workflow for scRNA-seq analyses in IBRAP. IBRAP accepts droplet- and non-droplet-based scRNA-seq counts.
If the cells were processed on a droplet-based platform, users may apply the droplet-specific cleaning packages included in IBRAP (DecontX and Scrublet). Otherwise, they may proceed directly to data transformation, which encompasses normalization, highly variable gene selection and, when required, sample integration. Users then proceed to inference, where they can identify cell clusters, infer developmental trajectories, label cell types using the reference-based package SingleR or the canonical marker-based package scType, or infer cell–cell communication. The results can then be explored in the IBRAP R Shiny application, which is easier than working in the terminal. Finally, users who opt for unsupervised clustering must uncover the biology driving the clusters to identify their cell types. For this, they can produce a range of gene expression plots, run differential expression analysis and perform gene set enrichment analysis using ssGSEA.
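The branching in this workflow can be summarized in a short Python sketch. It is a simplification under stated assumptions, not IBRAP's interface: the function `plan_workflow`, its parameters and the step names are hypothetical, and only the annotation branch of the inference stage is modeled (trajectory inference and cell–cell communication are left out).

```python
def plan_workflow(droplet_based, multi_sample, annotation="unsupervised"):
    """Return an ordered list of step names for one run (illustrative only)."""
    steps = []
    # Droplet-specific cleaning applies only to droplet-based data.
    if droplet_based:
        steps.append("droplet cleaning (DecontX, Scrublet)")
    # Data transformation stage.
    steps.extend(["normalization", "highly variable gene selection"])
    if multi_sample:
        steps.append("sample integration")
    # Inference stage: choose how cell types are assigned.
    if annotation == "reference":
        steps.append("reference-based annotation (SingleR)")
    elif annotation == "marker":
        steps.append("marker-based annotation (scType)")
    elif annotation == "unsupervised":
        steps.append("unsupervised clustering")
        # Unsupervised clusters still need biological interpretation.
        steps.append(
            "cluster interpretation "
            "(expression plots, differential expression, ssGSEA)"
        )
    else:
        raise ValueError(f"unknown annotation mode: {annotation!r}")
    return steps


# Example: a single droplet-based sample analysed without a reference.
plan = plan_workflow(droplet_based=True, multi_sample=False)
for number, step in enumerate(plan, start=1):
    print(number, step)
```

Returning a plain list of step names keeps the decision logic separate from any actual computation, so the order of steps can be inspected before anything is run.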
Availability – IBRAP's code is publicly available on GitHub, alongside accompanying tutorials, at https://github.com/connorhknight/IBRAP.