Recent single-cell RNA-seq protocols based on droplet microfluidics use massively multiplexed barcoding to enable simultaneous measurements of transcriptomes for thousands of individual cells. The increasing complexity of such data creates challenges for subsequent computational processing and troubleshooting of these experiments, with few software options currently available. Here, Harvard Medical School researchers describe a flexible pipeline for processing droplet-based transcriptome data that implements barcode corrections, classification of cell quality, and diagnostic information about the droplet libraries. They introduce advanced methods for correcting composition bias and sequencing errors affecting cellular and molecular barcodes to provide more accurate estimates of molecular counts in individual cells.
a The number of molecules mapping to human and mouse genomes in a human–mouse Drop-seq dataset (dataset 12) is shown for each cell (points) on a log scale. The plot shows annotations of high-confidence cells for each organism, doublets, and background barcodes. b The number of equidistant adjacent CBs of larger size (i.e., number of molecules) is shown for each of the observed CBs in the mouse embryonic stem cell dataset (dataset 1). The main plot shows adjacent CBs selected from an a priori known set of valid CB sequences. The inset shows counts of adjacent CBs selected from all CB sequences observed in the dataset. c To illustrate the effect of CB corrections, the plot shows the increase in number of molecules per CB (x-axis) following a CB merge correction procedure, relative to the original size. The 10× 8k PBMC (dataset 13), Drop-seq human–mouse mixture (dataset 12), and inDrop BMC (dataset 11) datasets are shown
Availability – https://github.com/hms-dbmi/dropEst