Several hundred terabytes of COVID-19 single-cell RNA-seq (scRNA-seq) data are currently available in public repositories, covering multiple tissues, comorbidities, and conditions. This trend is expected to continue, and the volume of COVID-19 scRNA-seq data could realistically grow to several petabytes in the coming years. However, thoughtful analysis of these data requires large-scale computing infrastructure and software systems optimized for such platforms in order to generate biological knowledge. Researchers at the Oswaldo Cruz Foundation have developed CellHeap, a portable and robust workflow for customizable scRNA-seq analyses, with quality control throughout the execution steps and deployable on supercomputers. The researchers present the deployment of CellHeap on the Santos Dumont supercomputer for analyzing COVID-19 scRNA-seq datasets and discuss a case study that processed dozens of terabytes of raw COVID-19 scRNA-seq data.
CellHeap – a workflow for optimizing COVID-19 single-cell RNA-Seq data processing
Silva VS, Costa MOC, Castro MCS et al. (2021) CellHeap: A Workflow for Optimizing COVID-19 Single-Cell RNA-Seq Data Processing in the Santos Dumont Supercomputer. Adv Bioinform and Comput Biol, 41–52.