Although messenger RNAs are key molecules for understanding life, until now no method has existed to determine the full-length sequence of endogenous mRNAs, including their poly(A) tails. Moreover, although non-A nucleotides can be incorporated into poly(A) tails, no method has existed to sequence them accurately. Researchers from the Max Delbrück Center for Molecular Medicine have developed full-length poly(A) and mRNA sequencing (FLAM-seq), a rapid and simple method for high-quality sequencing of entire mRNAs. The researchers report a complementary DNA library preparation method coupled to single-molecule sequencing to perform FLAM-seq. Using human cell lines, brain organoids and Caenorhabditis elegans, they show that FLAM-seq delivers high-quality full-length mRNA sequences for thousands of different genes per sample. The researchers find that 3′ untranslated region length is correlated with poly(A) tail length, that alternative polyadenylation sites and alternative promoters for the same gene are linked to different tail lengths, and that tails contain a substantial number of cytosines.
Full-length poly(A) and mRNA sequencing (FLAM-seq)
a, Outline of FLAM-seq. b, Genome-browser plot of reads aligned to the representative gene BTF3. Poly(A) tail length is added to the 3′ end of the alignments in blue. Ensembl transcript models are shown in yellow. c, Histogram of number of reads per gene for all sequenced samples (n = 1, two merged replicates). d, Histogram of mapped read length for all samples (n = 1, two merged replicates). e, Gene expression (reads per gene) correlation matrix between human and C. elegans samples (n = 2, two replicates (rep. 1 and 2) each). Color scale corresponds to Pearson’s correlation coefficient. f, Boxplots of fraction of 5′ ends of read alignments overlapping with FANTOM5 transcription start sites (TSSs) (n = 1, two merged replicates, human) and g, C. elegans SAGE TSS annotation data per gene (n = 1, two merged replicates). Boxplot definition: box bottoms/tops are lower/upper quartiles, bar is the median and whiskers are median ± 1.5× interquartile range. h, Heatmaps showing relative coverage across each gene in indicated samples (n = 1, merged replicates). Genes are ranked by length. Low coverage in dark blue, high coverage in yellow. nt, nucleotide.
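For readers who want to reproduce the kind of comparison shown in panel e, the following is a minimal sketch of a reads-per-gene Pearson correlation matrix. It is not taken from the authors' analysis software: the function names, the sample labels, the toy counts, and the choices to use raw (untransformed) counts and to treat a gene absent from a sample as zero reads are all assumptions made for illustration.

```python
import math


def pearson(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(counts):
    """Pairwise Pearson correlation of reads per gene between samples.

    counts: dict mapping sample name -> dict mapping gene name -> read count.
    Returns (samples, matrix), with samples sorted by name and matrix[i][j]
    the correlation between samples[i] and samples[j].
    Assumption: a gene missing from a sample counts as 0 reads.
    """
    samples = sorted(counts)
    genes = sorted(set().union(*(counts[s].keys() for s in samples)))
    vectors = [[counts[s].get(g, 0) for g in genes] for s in samples]
    matrix = [[pearson(v, w) for w in vectors] for v in vectors]
    return samples, matrix


# Hypothetical toy counts; real input would be reads per gene for each replicate.
example_counts = {
    "sample_A_rep1": {"geneA": 10, "geneB": 20, "geneC": 30},
    "sample_A_rep2": {"geneA": 20, "geneB": 40, "geneC": 60},
    "sample_B_rep1": {"geneA": 30, "geneB": 20, "geneC": 10},
}
example_samples, example_matrix = correlation_matrix(example_counts)
```

In practice, expression counts are often log-transformed before correlating so that a few highly expressed genes do not dominate the coefficient; whether that was done for panel e is not stated in the caption.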
Code Availability – The software used for data analysis is available at https://github.com/rajewsky-lab/FLAMAnalysis.