As part of preparing scRNA-seq libraries, a diverse template is typically amplified by PCR. During amplification, spurious chimeric molecules can form between molecules originating in different cells. While several computational and experimental strategies have been proposed to mitigate the impact of chimeric molecules, the problem has not been addressed in the context of scRNA-seq experiments.
Broad Institute researchers demonstrate that chimeras become increasingly problematic as samples are sequenced more deeply, and propose two computational solutions. The first is unsupervised and relies only on cell barcode and UMI information. The second is a supervised approach built on labeled data and a set of molecule-specific features. This classifier can accurately identify most of the contaminating molecules in a deeply sequenced species-mixing scRNA-seq dataset.
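To make the supervised idea concrete, here is a minimal sketch under loudly stated assumptions. It assumes labels come from a human-mouse species-mixing experiment (a molecule whose gene belongs to the other species than its cell is treated as chimeric), and it uses two hypothetical per-molecule features: log10 read count and the molecule's share of its cell barcode-UMI pair's reads. Logistic regression is a stand-in chosen for brevity; the names `species_label`, `train_logistic` and `predict_chimeric` are illustrative, and the authors' actual features and model may differ.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical per-molecule features: (log10 read count, fraction of the pair's reads).
Features = Tuple[float, ...]


def species_label(cell_species: str, gene_species: str) -> int:
    """Assumed labeling rule: 1 (chimeric) when the gene comes from the other species."""
    return int(cell_species != gene_species)


def _sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(X: Sequence[Features], y: Sequence[int],
                   lr: float = 0.5, epochs: int = 3000) -> List[float]:
    """Fit logistic regression by batch gradient descent; returns [bias, w1, ..., wk]."""
    k = len(X[0])
    w = [0.0] * (k + 1)
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * (k + 1)
        for xi, yi in zip(X, y):
            err = _sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))) - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        # Average the gradient over molecules and take one step.
        w = [wj - lr * g / n for wj, g in zip(w, grad)]
    return w


def predict_chimeric(w: Sequence[float], x: Features) -> float:
    """Probability that a molecule is chimeric under the fitted model."""
    return _sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))


# Toy training set (made up for illustration): low-support molecules labeled chimeric.
X = [(0.0, 0.01), (0.3, 0.02), (0.0, 0.05), (0.48, 0.03),
     (1.5, 0.95), (2.0, 0.99), (1.2, 0.90), (1.8, 1.00)]
y = [1, 1, 1, 1, 0, 0, 0, 0]
weights = train_logistic(X, y)
```

With real data, `y` would be filled by calling `species_label` on each molecule's cell and gene species, and molecules scoring above a chosen probability cutoff would be discarded before counting.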
A strategy to correct for chimeric molecules
(A) Current scRNA-seq quantification using UMIs collapses molecules into unit counts regardless of overall read abundance. (B) In the suggested alternative, molecules are first grouped by cell barcode-UMI pair, and molecules with low relative abundance within their group are filtered out before collapsing. (C) Scatter plot comparing signed log10(q-values) for differential expression between human and mouse cells before and after removing molecules with TPT < 0.02, where TPT is the fraction of a cell barcode-UMI pair's reads that support a given molecule. The red line is a smoothed LOWESS fit.
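The filter-then-collapse scheme in panels (A) and (B) can be sketched in a few lines. This is a minimal illustration, not the schimera implementation: it assumes each read has already been reduced to a hypothetical `(cell_barcode, umi, gene)` triple, treats that triple as a molecule, and computes TPT as the molecule's reads divided by all reads sharing its cell barcode-UMI pair. The 0.02 default mirrors the threshold used in panel (C).

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

# Hypothetical read representation: (cell_barcode, umi, gene).
Read = Tuple[str, str, str]


def tpt_filter(reads: List[Read], threshold: float = 0.02) -> Dict[str, Counter]:
    """Collapse reads into per-cell UMI counts, dropping low-TPT molecules first."""
    # Read support for each molecule (cell barcode, UMI, gene).
    molecule_reads = Counter(reads)
    # Total reads for each cell barcode-UMI pair, across all genes.
    pair_reads = Counter((cb, umi) for cb, umi, _ in reads)

    counts: Dict[str, Counter] = defaultdict(Counter)
    for (cb, umi, gene), n in molecule_reads.items():
        tpt = n / pair_reads[(cb, umi)]
        if tpt >= threshold:
            # A surviving molecule contributes one unit count, as in standard UMI collapsing.
            counts[cb][gene] += 1
    return dict(counts)


# Toy example: one UMI has 99 reads on a human gene and 1 read on a mouse gene.
reads = ([("AAAC", "GGTT", "ACTB")] * 99
         + [("AAAC", "GGTT", "Actb")]
         + [("AAAC", "TTGG", "GAPDH")] * 5)
filtered = tpt_filter(reads)
unfiltered = tpt_filter(reads, threshold=0.0)
```

Here the single mouse read has TPT = 1/100 = 0.01, so it is dropped, whereas plain collapsing (`threshold=0.0`) would count it as a full molecule in the human cell. Keeping several genes per UMI when each clears the threshold is a design choice of this sketch.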
Availability – Code is publicly available at https://github.com/asncd/schimera.