Marina – for identifying over-represented plant transcription factor binding sites

From initial seed germination through reproduction, plants continuously reprogram their transcriptional repertoire to facilitate growth and development. This dynamic is mediated by a diverse but inextricably-linked catalog of regulatory proteins called transcription factors (TFs). Statistically quantifying TF binding site (TFBS) abundance in promoters of differentially expressed genes can be used to identify binding site patterns in promoters that are closely related to stress-response. Output from today’s transcriptomic assays necessitates statistically-oriented software to handle large promoter-sequence sets in a computationally tractable fashion.



Researchers at NIH have developed Marina, an open-source software for identifying over-represented TFBSs from amongst large sets of promoter sequences, using an ensemble of 7 statistical metrics and binding-site profiles. Through software comparison, they show that Marina can identify considerably more over-represented plant TFBSs compared to a popular software alternative.

Marina was used to identify over-represented TFBSs in a two time-point RNA-Seq study exploring the transcriptomic interplay between soybean (Glycine max) and soybean rust (Phakopsora pachyrhizi). Marina identified numerous abundant TFBSs recognized by transcription factors that are associated with defense-response such as WRKY, HY5 and MYB2. Comparing results from Marina to that of a popular software alternative suggests that regardless of the number of promoter-sequences, Marina is able to identify significantly more over-represented TFBSs.


  • Hosseini P, Ovcharenko I, Matthews BF. (2013) Using an ensemble of statistical metrics to quantify large sets of plant transcription factor binding sites. Plant Methods 9(1), 12. [Epub ahead of print]. [article]