Transposable elements (TEs) regulate diverse biological processes, from early development to cancer. Expression of young TEs is difficult to measure with next-generation single-cell sequencing technologies: because these elements are highly repetitive, short complementary DNA (cDNA) reads cannot be unambiguously mapped to a specific locus. Single CELl LOng-read RNA-sequencing (CELLO-seq) combines long-read single-cell RNA-sequencing with computational analyses to measure TE expression at unique loci.
Researchers at the Cancer Research UK Cambridge Institute used CELLO-seq to assess the widespread expression of TEs in two-cell mouse blastomeres and in human induced pluripotent stem cells (hiPSCs). In both species, old and young TEs showed evidence of locus-specific expression. Simulations demonstrated that only a small number of very young elements in the mouse could not be mapped back to the reference with high confidence. Exploring the relationship between the expression of individual elements and putative regulators revealed large heterogeneity: TEs within the same class showed different patterns of correlation, which suggests distinct regulatory mechanisms.
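The mapping problem described above can be illustrated with a toy example. This is a minimal sketch, not the CELLO-seq pipeline: the two TE copies, their sequences and the helper `mapping_loci` are hypothetical, and exact substring matching stands in for a real aligner.

```python
def mapping_loci(read, loci):
    """Return the names of all loci whose sequence contains the read exactly."""
    return [name for name, seq in loci.items() if read in seq]

# Two hypothetical copies of a young TE that differ at a single base (index 40).
shared_left = "ACGTTGCA" * 5    # 40 bp shared by both copies
shared_right = "GGATCCTA" * 5   # 40 bp shared by both copies
loci = {
    "copy_A": shared_left + "A" + shared_right,
    "copy_B": shared_left + "G" + shared_right,
}

# A short read drawn entirely from shared sequence matches both copies.
short_read = loci["copy_A"][:20]
# A longer read that spans the distinguishing base matches only its source copy.
long_read = loci["copy_A"][10:70]

short_hits = mapping_loci(short_read, loci)
long_hits = mapping_loci(long_read, loci)

print("short read maps to:", short_hits)
print("long read maps to:", long_hits)
```

In this toy setting the short read is ambiguous between the two copies, whereas the long read resolves to a single locus because it covers the one position where the copies differ. Very young elements with no distinguishing positions would remain ambiguous even for long reads.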
CELLO-seq overview and its ability to aid in the study of allelic and isoform expression
a, CELLO-seq protocol. Single cells are sorted into plates and lysed. Poly-A mRNA is reverse transcribed with a template-switch oligo. Following exonuclease (ExoI) treatment, splint oligos with 22-RYN UMIs and cellular barcodes are ligated onto first-strand cDNA. After clean-up, the libraries are amplified by PCR. b, Datasets generated in this study. c, Transcript coverage over reads with relative (rel.) start and end position for each transcript averaged over hiPSCs (n = 96 cells). d, Boxplots of read identity (y axis) by read coverage (x axis), with error-corrected (left) versus deduplicated (right) data. The y axis is adjusted to cover 80–100% of read identity, and outliers are removed in the boxplot. e, Scatter plot of number of genes per error-corrected UMI sequence for mouse (n = 6) and human (n = 96). f, Boxplot of genes per cell (y axis) with CELLO-seq or Smart-seq2 of hiPSCs (n = 96), separated into genes with or without at least one heterozygous genic SNP. g, Boxplot of allelic ratio (allele 1/(allele 1 + allele 2)) (y axis) by exonic heterozygous (het.) SNPs with CELLO-seq and Smart-seq2 of hiPSCs. h, Boxplot showing number of isoforms per cell (y axis) in mouse blastomeres and hiPSCs. Isoforms were grouped as: all; known, when overlapping an ENSEMBL transcript ID; novel, when not overlapping an ENSEMBL transcript ID; and TE-derived, when the isoform overlapped both a repeat and an ENSEMBL transcript ID, or overlapped repeats that do not overlap genic exons. Human (n = 96 cells), mouse (n = 6 cells). i, Boxplot of mean expression of isoforms (y axis) by the classification in h in mouse blastomeres or hiPSCs. Mean expression level was calculated from log counts of all cells. d,f–i, Boxplots show the median, first and third quartiles as a box, and the whiskers indicate the most extreme data points within 1.5 times the length of the box.
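Two quantities in the caption, the allelic ratio in panel g and the boxplot convention used in panels d and f–i, can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the authors' code: the function names and the example read counts are hypothetical, and the quartile method ("inclusive") is an assumption, since plotting packages differ slightly in how they compute quartiles.

```python
import statistics

def allelic_ratio(allele1_reads, allele2_reads):
    """Allelic ratio as defined in panel g: allele 1 / (allele 1 + allele 2).

    Returns None when no reads cover the SNP, to avoid dividing by zero.
    """
    total = allele1_reads + allele2_reads
    if total == 0:
        return None
    return allele1_reads / total

def boxplot_summary(values):
    """Median, quartiles, whiskers and outliers for one box.

    Whiskers are the most extreme data points within 1.5 box lengths
    (1.5 x interquartile range) of the box edges.
    """
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    box_length = q3 - q1
    low_fence = q1 - 1.5 * box_length
    high_fence = q3 + 1.5 * box_length
    inside = [v for v in values if low_fence <= v <= high_fence]
    outliers = [v for v in values if v < low_fence or v > high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": outliers,
    }

# Hypothetical read counts (allele 1, allele 2) at four heterozygous SNPs.
counts = [(10, 10), (8, 2), (0, 5), (3, 9)]
ratios = [allelic_ratio(a1, a2) for a1, a2 in counts]
print(ratios)
print(boxplot_summary([1, 2, 3, 4, 100]))
```

A ratio of 0.5 indicates balanced expression of the two alleles, while values near 0 or 1 indicate expression dominated by one allele.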
Availability – the code is available at: https://github.com/MarioniLab/CELLOseq