In single-cell RNA-sequencing (scRNA-seq) data, sequencing reads must be stratified by cellular barcode before cell-specific features can be studied. However, apart from gene expression, analyses of cell-specific features are not supported by available tools, which are designed for bulk RNA-seq data. Researchers from George Washington University have developed a tool, SCExecute, which executes a user-provided command on barcode-stratified single-cell binary alignment map (scBAM) files that are extracted on the fly. SCExecute extracts the cell barcodes from aligned, pooled single-cell sequencing data. The user-specified command can range from a monolithic program or a multi-command shell script to a complex shell-based pipeline. Execution can be further restricted to barcodes and/or genomic regions of interest. The researchers demonstrate SCExecute with two popular variant callers, GATK and Strelka2, combined with modules for BAM file manipulation and variant filtering, to detect single-cell-specific expressed single nucleotide variants (sceSNVs) from droplet scRNA-seq data (10X Genomics Chromium System). In conclusion, SCExecute facilitates custom cell-level analyses of barcoded scRNA-seq data using currently available tools and provides an effective solution for studying transcriptome features that occur at low cellular frequency.
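The barcode-stratified execution described above can be illustrated with a minimal Python sketch. It is not SCExecute's implementation: it works on SAM-formatted text lines rather than BAM files, assumes the cell barcode is stored in a `CB:Z:` tag (as in 10X Genomics alignments), and the function names and the `{FILE}` and `{BARCODE}` placeholders are hypothetical names chosen for this example.

```python
import sys
import tempfile
from collections import defaultdict
from pathlib import Path
import subprocess


def barcode_of(sam_line, tag="CB"):
    """Return the value of a string-typed barcode tag in one SAM record, or None."""
    prefix = tag + ":Z:"
    # Optional SAM fields start at the 12th tab-separated column.
    for field in sam_line.rstrip("\n").split("\t")[11:]:
        if field.startswith(prefix):
            return field[len(prefix):]
    return None


def in_region(sam_line, region):
    """True if the read start lies in region=(chrom, start, end), 1-based inclusive."""
    if region is None:
        return True
    cols = sam_line.split("\t")
    chrom, start, end = region
    return cols[2] == chrom and start <= int(cols[3]) <= end


def stratify(sam_lines, barcodes=None, region=None):
    """Group alignment records by cell barcode, optionally restricted
    to a set of barcodes and to one genomic region."""
    per_cell = defaultdict(list)
    for line in sam_lines:
        if line.startswith("@"):  # header line, not an alignment
            continue
        bc = barcode_of(line)
        if bc is None or (barcodes is not None and bc not in barcodes):
            continue
        if in_region(line, region):
            per_cell[bc].append(line)
    return dict(per_cell)


def execute_per_cell(per_cell, command_template, workdir):
    """Write one temporary file per barcode, run the command on it,
    and return {barcode: captured stdout}."""
    outputs = {}
    for bc, lines in sorted(per_cell.items()):
        path = Path(workdir) / f"{bc}.sam"
        path.write_text("".join(l if l.endswith("\n") else l + "\n" for l in lines))
        # Hypothetical placeholders, substituted in each argument of the template.
        cmd = [a.replace("{FILE}", str(path)).replace("{BARCODE}", bc)
               for a in command_template]
        done = subprocess.run(cmd, capture_output=True, text=True, check=True)
        outputs[bc] = done.stdout
        path.unlink()  # the per-cell file is only needed while the command runs
    return outputs


if __name__ == "__main__":
    def read(name, chrom, pos, bc):
        # Toy alignment record with the 11 mandatory SAM columns plus a CB tag.
        return "\t".join([name, "0", chrom, str(pos), "60", "4M", "*", "0", "0",
                          "ACGT", "FFFF", "CB:Z:" + bc])

    toy = [read("r1", "1", 100, "AAAC-1"), read("r2", "1", 200, "TTTG-1")]
    count_lines = [sys.executable, "-c",
                   "import sys; print(sum(1 for _ in open(sys.argv[1])))", "{FILE}"]
    with tempfile.TemporaryDirectory() as tmp:
        print(execute_per_cell(stratify(toy), count_lines, tmp))
```

Writing each per-cell file just before the command runs and deleting it afterwards mirrors the on-the-fly extraction idea: only one cell's reads need to exist on disk at a time.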
a. SCExecute data processing examples. b. UMAP projections showing cells classified by type (top) and visualizing the cell distribution and the cellular expressed variant allele frequency (VAF_RNA) of the missense substitution rs4603 (1:151401549_T>C) in the gene PSMB4 across the samples from the neuroblastoma dataset. The red color intensity shows the relative expression of the sceSNV in cells with at least 5 sequencing reads covering the SNV locus, and the green color indicates that all the reads covering the SNV locus carried the reference nucleotide (see Supplementary Methods). Cells in which the SNV locus is covered by fewer than 5 reads are shown in grey. The rs4603 VAF_RNA cell distribution is consistent with a germline homozygous variant in sample SAMN12799266, a heterozygous variant in samples SAMN12799264 and SAMN12799263, and absence of the variant in sample SAMN12799269. c. UMAP showing cells classified by type and the cell distribution and VAF_RNA of the missense substitution rs1051447 (also reported as a somatic mutation, COSV56936745) in the gene CWH43 in the two prostate cancer samples (top). CWH43 and COSV56936745 are mostly expressed in neurons. The IGV visualization of SNV-positive cells shows mono- and bi-allelic expression of the SNV (bottom).
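The colouring rule in the caption can be written out as a small Python sketch. It assumes the cellular expressed variant allele frequency is the number of variant-carrying reads divided by all reads covering the SNV locus in that cell, which is the usual definition but is not spelled out in the text above; the function names are hypothetical, and the 5-read threshold is the one stated in the caption.

```python
def vaf_rna(ref_reads, var_reads, min_reads=5):
    """Per-cell expressed variant allele frequency, or None when the
    locus is covered by fewer than min_reads reads in the cell."""
    total = ref_reads + var_reads
    if total < min_reads:
        return None
    return var_reads / total


def colour_class(ref_reads, var_reads, min_reads=5):
    """Map one cell to the colour category used in the UMAP panels."""
    vaf = vaf_rna(ref_reads, var_reads, min_reads)
    if vaf is None:
        return "grey"   # insufficient coverage at the SNV locus
    if vaf == 0:
        return "green"  # every covering read carries the reference nucleotide
    return "red"        # variant expressed; intensity would scale with vaf


if __name__ == "__main__":
    # (reference reads, variant reads) for three toy cells
    for ref, var in [(1, 1), (6, 0), (3, 2)]:
        print(ref, var, vaf_rna(ref, var), colour_class(ref, var))
```

Under this rule, a sample in which most covered cells have a frequency near 1 is consistent with a homozygous variant, and a sample with a spread of intermediate values is consistent with a heterozygous one.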
Availability: SCExecute is implemented in Python 3 using the pysam package and is distributed for Linux and Python environments at https://github.com/HorvathLab/NGS/tree/master/SCExecute.