Quantification of isoform abundance has been extensively studied at the mature-RNA level using RNA-seq but not at the level of precursor RNAs using nascent RNA sequencing.
Researchers from Cold Spring Harbor Laboratory address this problem with a new computational method called Deconvolution of Expression for Nascent RNA sequencing data (DENR), which models nascent RNA sequencing read counts as a mixture of user-provided isoforms. The baseline algorithm is enhanced by machine-learning predictions of active transcription start sites and an adjustment for the typical “shape profile” of read counts along a transcription unit. The researchers show that DENR outperforms simple read-count-based methods for estimating gene and isoform abundances, and that transcription of multiple pre-RNA isoforms per gene is widespread, with frequent differences between cell types. In addition, they provide evidence that a majority of human isoform diversity derives from primary transcription rather than from post-transcriptional processes.
(Top) DENR first groups the available isoform annotations into nonoverlapping, strand-specific clusters and summarizes the associated read counts in genomic bins of user-specified size (default 250 bp). At this stage, it optionally masks bins corresponding to the start and end of each isoform. It then collapses mature RNA isoforms whose start (TSS) and end (PAS) coordinates coincide within the resolution of a single bin. (Middle) The program then optionally adjusts the isoform model to reflect a typical “U”-shaped profile, and optionally applies a machine-learning method to predict active TSSs based on patterns of bidirectional transcription. At this stage, it may also exclude isoforms designated by the user as inactive (not shown). (Bottom) Finally, DENR estimates the abundance of each isoform in each cluster by minimizing the squared difference between the expected and observed read counts across all bins.
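The binning, collapsing, and least-squares steps described above can be illustrated with a minimal Python sketch. This is not the DENR implementation: the function names (isoform_bins, collapse_isoforms, design_matrix, fit_abundances), the flat 0/1 isoform profile (no "U"-shape adjustment, masking, or TSS prediction), the single-strand coordinates, the non-negativity constraint, and the projected-gradient solver are all assumptions made for illustration.

```python
from collections import defaultdict

BIN_SIZE = 250  # default bin size mentioned in the caption


def isoform_bins(start, end, bin_size=BIN_SIZE):
    """Return (first_bin, last_bin) covered by the half-open interval [start, end)."""
    return start // bin_size, (end - 1) // bin_size


def collapse_isoforms(isoforms, bin_size=BIN_SIZE):
    """Group isoforms whose start and end fall into the same bins.

    `isoforms` maps a name to (start, end) coordinates within one cluster.
    """
    groups = defaultdict(list)
    for name, (start, end) in isoforms.items():
        groups[isoform_bins(start, end, bin_size)].append(name)
    return dict(groups)


def design_matrix(groups, n_bins):
    """Build a bins x models matrix: 1.0 where a collapsed model covers a bin."""
    columns = list(groups)  # one column per collapsed isoform model
    return [
        [1.0 if first <= b <= last else 0.0 for (first, last) in columns]
        for b in range(n_bins)
    ]


def fit_abundances(X, y, n_iter=2000):
    """Minimize the squared difference between X @ beta and y.

    Uses projected gradient descent with beta >= 0 (the constraint is an
    assumption of this sketch). The step size 1 / trace(X^T X) is a safe
    bound because the trace is at least the largest eigenvalue of X^T X.
    """
    n_bins, n_models = len(X), len(X[0])
    step = 1.0 / sum(v * v for row in X for v in row)
    beta = [0.0] * n_models
    for _ in range(n_iter):
        resid = [
            sum(X[i][j] * beta[j] for j in range(n_models)) - y[i]
            for i in range(n_bins)
        ]
        grad = [sum(X[i][j] * resid[i] for i in range(n_bins)) for j in range(n_models)]
        beta = [max(0.0, b - step * g) for b, g in zip(beta, grad)]
    return beta


# Hypothetical toy cluster: C differs from A only within a single bin,
# so the two collapse into one model; B starts at a downstream TSS.
isoforms = {"A": (0, 1000), "B": (500, 1000), "C": (40, 990)}
groups = collapse_isoforms(isoforms)
X = design_matrix(groups, n_bins=4)
observed = [10, 10, 30, 30]  # made-up per-bin read counts
abundances = fit_abundances(X, observed)
for names, value in zip(groups.values(), abundances):
    print(names, round(value, 3))
```

In this toy cluster the downstream bins carry reads from both models, so the fit attributes roughly 10 units to the collapsed A/C model and 20 to B. A simple read-count method that assigned each bin's reads to every overlapping isoform would not separate the two contributions.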