Crosslinking immunoprecipitation sequencing (CLIP-seq) technologies have enabled researchers to characterize the transcriptome-wide binding sites of RNA-binding proteins (RBPs) at high resolution. Tsinghua University researchers apply a soft-clustering method, RBPgroup, to various CLIP-seq datasets to group together RBPs that specifically bind the same RNA sites. Such combinatorial clustering of RBPs helps interpret CLIP-seq data and suggests functional RNA regulatory elements. Furthermore, they validate two RBP-RBP interactions in cell lines. This approach links proteins and RNA motifs known to possess similar biochemical and cellular properties and can, when used in conjunction with additional experimental data, identify high-confidence RBP groups and their associated RNA regulatory elements.
Integrative analytical pipeline for defining high-confidence RNA sequences/motifs bound by RBP groups
a In total, 84 CLIP-seq (including PAR-CLIP and HITS-CLIP) datasets of 48 human RBPs from HEK293/HEK293T cell lines were collected. b Different computational methods (e.g., Piranha and PARalyzer) were used to call peaks from raw reads for each RBP. Peaks from different methods and biological replicates were overlapped. c The binding sites of all 48 RBPs were then merged into one set of binding sites. RNA-sequencing (RNA-seq) data from the corresponding cell lines were used to normalize the occupancy of each binding site. d Subsequently, an occupancy profile matrix V (N × M) was generated, representing the binding affinity of each RBP (column) for each binding site (row). e The occupancy profile matrix V was decomposed into a basis matrix W (N × R) and a coefficient matrix H (R × M). N denotes the number of binding sites; M denotes the number of RBPs; R denotes the number of groups. f The coefficient matrix was used to define the RBP components and their weights in each group. The basis matrix was used to define group-related binding sites (motifs) and binding affinities.
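The decomposition in panels e and f can be illustrated with a minimal sketch. It assumes the factorization is a non-negative matrix factorization (NMF) fitted with multiplicative updates under a squared-error loss; the shapes of W (N × R) and H (R × M) are consistent with that, but the text here does not name the algorithm or the loss. The toy matrix, the RBP names, and the `nmf` helper are hypothetical and are not taken from RBPgroup.

```python
import numpy as np


def nmf(V, R, n_iter=1000, eps=1e-9, seed=0):
    """Factor a non-negative V (N x M) into W (N x R) and H (R x M).

    Uses multiplicative updates for the squared-error loss ||V - WH||^2.
    This is an assumed stand-in for the decomposition step, not the
    published implementation.
    """
    rng = np.random.default_rng(seed)
    N, M = V.shape
    W = rng.random((N, R)) + eps
    H = rng.random((R, M)) + eps
    for _ in range(n_iter):
        # Each update keeps the factors non-negative.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    # Remove the scale ambiguity: columns of W sum to 1, H absorbs the scale.
    scale = W.sum(axis=0)
    return W / scale, H * scale[:, None]


# Toy occupancy matrix V: 6 binding sites (rows) x 4 hypothetical RBPs (columns).
# Sites 0-2 are bound mostly by RBP_A and RBP_B, sites 3-5 by RBP_C and RBP_D.
V = np.array([
    [5.0, 4.0, 0.0, 0.1],
    [4.0, 5.0, 0.2, 0.0],
    [6.0, 5.0, 0.0, 0.0],
    [0.0, 0.1, 3.0, 4.0],
    [0.1, 0.0, 4.0, 3.0],
    [0.0, 0.0, 5.0, 5.0],
])
rbp_names = ["RBP_A", "RBP_B", "RBP_C", "RBP_D"]

W, H = nmf(V, R=2)

# Panel f, coefficient matrix: weight of each RBP within each group.
rbp_weights = H / H.sum(axis=1, keepdims=True)
# Dominant group per RBP (the membership itself stays soft: see rbp_weights).
rbp_group = H.argmax(axis=0)
# Panel f, basis matrix: group with the largest coefficient for each site.
site_group = W.argmax(axis=1)

rel_error = np.linalg.norm(V - W @ H) / np.linalg.norm(V)

for g in range(H.shape[0]):
    weights = ", ".join(
        f"{name}={w:.2f}" for name, w in zip(rbp_names, rbp_weights[g])
    )
    print(f"group {g}: {weights}")
print("dominant group per site:", site_group)
print(f"relative reconstruction error: {rel_error:.3f}")
```

Because every RBP receives a weight in every group, the grouping is soft: an RBP that shares sites with two different sets of partners can carry substantial weight in both rows of H. The number of groups R is a free parameter in this sketch; how it is chosen in practice is not described in the text above.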