Translation has a fundamental function in defining the fate of the transcribed genome. RNA-sequencing (RNA-seq) data enable the quantification of complex transcript mixtures, often detecting several transcript isoforms of unknown function for a single gene. Researchers from the Max Delbrück Center for Molecular Medicine describe ORFquant, a method to annotate and quantify translation at the level of single open reading frames (ORFs), using information from Ribo-seq data. By developing an approach for transcript filtering, they quantify translation transcriptome-wide, revealing translated ORFs on multiple isoforms per gene. For most genes, one ORF represents the dominant translation product, but they also detect genes with translated ORFs on multiple transcript isoforms, including targets of RNA surveillance mechanisms. Measuring translation across human cell lines reveals the extent of gene-specific differences in protein production, supported by steady-state protein abundance estimates. Computational analysis of Ribo-seq data with ORFquant provides insights into the heterogeneous functions of complex transcriptomes.
The ORFquant strategy to quantify translation on selected transcripts
a, The ORFquant workflow. b, The PHLEKM2 gene as an example. ORF coverage is defined here using the percentage of gene translation. Fill colors for discarded and selected transcripts indicate unique features with no signal (black); shared features with no signal (gray); unique features with signal (red); and shared features with signal (pink). Fill colors for selected ORFs indicate coverage in shared features (blue heat map) and unique features (red heat map); color intensities indicate coverage signal (darker colors indicate higher coverage). For quantified ORFs, the heat map indicates ORF coverage values (0–100). Thick blocks indicate CDS regions, as defined by the annotation or by ORFquant (de novo). c, Number of selected transcripts per gene (x axis) is plotted against number of genes (y axis). d, Percentage of covered junctions (bottom), or covered exons (top) mapping to a different number of transcript structures using all annotated transcripts, protein-coding transcripts only, or selected transcripts only. e, The number of quantified ORFs (x axis) is plotted against number of genes (y axis). f, The number of genes (y axis) is plotted against the contribution (in percentages) of their major ORF. g, Aggregate plot of Ribo-seq coverage (normalized 0–1 per region) and ORF coverage (ORF_pct_P_sites_pN) over candidate alternative splice site regions as detected by ORFquant. No mixture indicates one ORF only, and other tracks indicate the presence of additional ORFs, divided by their summed translation values. Explanatory scheme at the bottom, with blue representing the major ORF and red representing the additional ORF(s). Source data for c–f are available online.
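The caption defines ORF coverage as a percentage of gene translation, and panel f plots the contribution of each gene's major ORF. The Python sketch below illustrates only that arithmetic. It is not ORFquant's implementation (ORFquant is distributed separately at the link under Availability) and does not reproduce how the tool derives its per-ORF estimates or the ORF_pct_P_sites_pN value. The function names, the input layout, and the numbers are hypothetical and chosen purely for illustration.

```python
from collections import defaultdict


def orf_percentages(orf_signal):
    """Convert per-ORF translation estimates into percentages of gene translation.

    orf_signal maps (gene, orf_id) -> non-negative translation estimate.
    This input layout is an assumption made for this sketch.
    """
    # Sum the estimates of all ORFs belonging to the same gene.
    gene_totals = defaultdict(float)
    for (gene, _orf_id), value in orf_signal.items():
        gene_totals[gene] += value

    # Express each ORF as a share (0-100) of its gene's total.
    percentages = {}
    for (gene, orf_id), value in orf_signal.items():
        total = gene_totals[gene]
        percentages[(gene, orf_id)] = 100.0 * value / total if total > 0 else 0.0
    return percentages


def major_orfs(percentages):
    """Return, for each gene, the ORF with the largest share and that share."""
    best = {}
    for (gene, orf_id), pct in percentages.items():
        if gene not in best or pct > best[gene][1]:
            best[gene] = (orf_id, pct)
    return best


# Made-up values, not taken from the study.
example_signal = {
    ("geneA", "ORF1"): 90.0,
    ("geneA", "ORF2"): 10.0,
    ("geneB", "ORF1"): 30.0,
    ("geneB", "ORF2"): 30.0,
    ("geneB", "ORF3"): 60.0,
}

example_pct = orf_percentages(example_signal)
example_major = major_orfs(example_pct)

for gene, (orf_id, pct) in sorted(example_major.items()):
    print(f"{gene}: major ORF {orf_id} contributes {pct:.1f}% of gene translation")
```

In this toy input, geneA resembles the common case described in the summary, with one dominant ORF, while geneB spreads its translation across several ORFs.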
Availability – https://github.com/lcalviell/ORFquant