Many proteins regulate the expression of genes by binding to specific regions encoded in the genome. Here, researchers from the University of California San Diego introduce a new data set of RNA elements in the human genome that are recognized by RNA-binding proteins (RBPs), generated as part of phase III of the Encyclopedia of DNA Elements (ENCODE) project. These regulatory elements function only when transcribed into RNA, where they serve as binding sites for RBPs that control post-transcriptional processes such as splicing, cleavage and polyadenylation, and the editing, localization, stability and translation of mRNAs.
The researchers describe the mapping and characterization of RNA elements recognized by a large collection of human RBPs in K562 and HepG2 cells. Integrative analyses using five assays identify RBP binding sites on RNA and chromatin in vivo, the in vitro binding preferences of RBPs, the function of RBP binding sites and the subcellular localization of RBPs, producing 1,223 replicated data sets for 356 RBPs. They describe the spectrum of RBP binding throughout the transcriptome and the connections between these interactions and various aspects of RNA biology, including RNA stability, splicing regulation and RNA localization. These data expand the catalogue of functional elements encoded in the human genome by the addition of a large set of elements that function at the RNA level by interacting with RBPs.
Overview of experiments and data types
a, The five assays performed to characterize RBPs. b, Three hundred and fifty-six RBPs profiled by at least one ENCODE experiment (orange or red) with localization by immunofluorescence (green), essential genes from CRISPR screening (maroon), manually annotated RBP functions (blue or purple), and annotated protein domains (pink; RRM, KH, zinc finger, RNA helicase, RNase, double-stranded RNA binding (dsRBD), and pumilio/FBF domain (PUM-HD)). Histograms for each category are shown at bottom. c, Combinatorial expression and splicing regulation of PTBP3. Tracks indicate eCLIP and RNA-seq read density (reads per million). Tracks are shown for replicate 1; eCLIP and KD–RNA-seq were performed in biological duplicate with similar results. Bottom, alternatively spliced exon 2, with lines indicating junction-spanning reads and indicated per cent spliced in (ψ). Boxes indicate reproducible (by IDR) PTBP1 peaks, with red boxes indicating RBNS motifs for the PTB family member PTBP3 located within (or up to 50 bases upstream of) peaks.
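The two quantities shown in panel c, read density in reads per million and per cent spliced in (ψ), can be illustrated with a minimal sketch. The text does not give the formulas used for the figure, so this assumes a common simplified definition of ψ for a cassette exon, in which the two inclusion junction counts are averaged and compared with the single skipping junction. The function and parameter names are hypothetical.

```python
def reads_per_million(count, total_mapped_reads):
    """Scale a raw read count to reads per million mapped reads."""
    if total_mapped_reads <= 0:
        raise ValueError("total_mapped_reads must be positive")
    return count * 1_000_000 / total_mapped_reads


def percent_spliced_in(upstream_junction, downstream_junction, skipping_junction):
    """Estimate psi for a cassette exon from junction-spanning read counts.

    Assumption: exon inclusion is supported by two junctions (upstream and
    downstream of the exon), so their counts are averaged before being
    compared with the single exon-skipping junction.
    """
    inclusion = (upstream_junction + downstream_junction) / 2
    total = inclusion + skipping_junction
    if total == 0:
        return None  # no junction-spanning reads, psi is undefined
    return inclusion / total


if __name__ == "__main__":
    # Illustrative counts only, not values from the data set.
    print(percent_spliced_in(30, 30, 10))
    print(reads_per_million(50, 10_000_000))
```

Comparing ψ between control and knockdown RNA-seq samples is one simple way to read a splicing change off tracks like these, although dedicated tools model read counts and replicate variability more carefully.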