Long non-coding RNAs (lncRNAs) can act as scaffolds that promote the interaction of proteins, RNA, and DNA. There is increasing evidence of sequence-specific interactions of lncRNAs with DNA via triple-helix (triplex) formation. This process allows lncRNAs to recruit protein complexes to specific genomic regions and regulate gene expression. Here, researchers at RWTH Aachen University propose a computational method called Triplex Domain Finder (TDF) to detect triplexes and to statistically characterize DNA-binding domains and DNA targets. Case studies showed that this approach can detect the known domains of the lncRNAs Fendrr, HOTAIR and MEG3. Moreover, the researchers validated a novel DNA-binding domain in MEG3 using a genome-wide sequencing method. They then used TDF to perform a systematic analysis of the triplex-forming potential of lncRNAs relevant to human cardiac differentiation. They demonstrated that the lncRNA with the highest triplex-forming potential, GATA6-AS, forms triple helices in the promoters of genes relevant to cardiac development, and that down-regulation of GATA6-AS impairs GATA6 expression and cardiac development. These data indicate the unique ability of TDF to identify novel triplex-forming lncRNAs and their target genes.
The computational framework of TRIPLEXES and TDF
(A) Triplexes are formed by binding of a single-stranded RNA (blue) to the purine-rich strand (green) of a double-stranded DNA via Hoogsteen base pairing. Forming a triplex in the parallel orientation requires a pyrimidine or mixed motif, whereas the anti-parallel orientation requires a purine or mixed motif. (B) For a given RNA and DNA sequence, TRIPLEXES identifies candidate triple helices with a minimum size and a maximum number of mismatches that follow one of the canonical codes. Each triplex is formed by one RNA sequence (triplex-forming oligonucleotide, TFO) and one DNA region (triplex target site, TTS). We introduce here the concept of DNA-binding domains (DBDs), based on the fact that TFOs (orange) usually cluster in particular regions of an RNA. Contiguous regions with overlapping TFOs (marked in red) define a DNA-binding domain. (C) TDF performs statistical tests by combining predictions from TRIPLEXES to answer the following questions: (1) Which regions of an RNA (DBDs) are more likely to form triple helices with particular DNA target regions? (2) Which DNA regions (target genes) are more likely to be targeted by the RNA? (3) Which lncRNAs are more likely to form triple helices in a set of target DNA regions?
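The DBD definition in panel (B), contiguous regions covered by overlapping TFOs, amounts to merging overlapping intervals along the RNA. The Python sketch below illustrates that idea only; it is not TDF's actual code. The function name `merge_tfos_into_dbds`, the half-open `(start, end)` coordinate convention, and the example coordinates are all assumptions made for illustration.

```python
from typing import List, Tuple

# Assumption: a TFO is a half-open interval [start, end) on the RNA sequence.
Interval = Tuple[int, int]


def merge_tfos_into_dbds(tfos: List[Interval]) -> List[Tuple[int, int, int]]:
    """Merge overlapping TFO intervals into candidate DNA-binding domains.

    Returns one (start, end, n_tfos) tuple per domain, ordered by start.
    TFOs that merely touch (end == next start) are kept in separate domains.
    """
    dbds: List[Tuple[int, int, int]] = []
    for start, end in sorted(tfos):
        if dbds and start < dbds[-1][1]:
            # The TFO overlaps the current domain: extend it and count the TFO.
            prev_start, prev_end, count = dbds[-1]
            dbds[-1] = (prev_start, max(prev_end, end), count + 1)
        else:
            # No overlap with the previous domain: open a new one.
            dbds.append((start, end, 1))
    return dbds


# Hypothetical TFO coordinates, not taken from any real lncRNA.
example_tfos = [(10, 30), (25, 45), (100, 120), (40, 60), (118, 130)]
print(merge_tfos_into_dbds(example_tfos))
```

The per-domain TFO count is kept because a statistical comparison such as the one described in panel (C) would need some measure of how many predicted triplexes each domain supports; the tests themselves are not specified here and are not sketched.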