The genome-wide identification of microRNA transcription start sites (miRNA TSSs) is essential for understanding how miRNAs are regulated in development and disease. In this study, researchers from Vanderbilt University School of Medicine developed mirSTP (mirna transcription Start sites Tracking Program), a probabilistic model for identifying active miRNA TSSs from nascent transcriptomes generated by global run-on sequencing (GRO-seq) and precision run-on sequencing (PRO-seq). MirSTP takes advantage of the characteristic bidirectional transcription signatures at active TSSs in GRO/PRO-seq data and provides accurate, high-resolution TSS predictions for human intergenic miRNAs. Judged by the enrichment of various promoter-associated marks, mirSTP performed better than existing generalized and experiment-specific methods. MirSTP analysis of 27 human cell lines in 183 GRO-seq and 28 PRO-seq experiments identified TSSs for 480 intergenic miRNAs, indicating wide usage of alternative TSSs. By integrating predicted miRNA TSSs with matched ENCODE transcription factor (TF) ChIP-seq data, the researchers connected miRNAs into the transcriptional circuitry, which provides a valuable resource for understanding the complex interplay between TFs and miRNAs. With mirSTP, the researchers not only predicted TSSs for 72 miRNAs, but also identified 12 primary miRNAs with significant RNA polymerase pausing alterations after JQ1 treatment; each of these miRNAs was further validated through BRD4 binding to its predicted promoter.
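To make the idea of a bidirectional transcription signature concrete, here is a minimal Python sketch. It is not the mirSTP probabilistic model, whose details are not given in this summary. It is a toy score that assumes per-base read counts on each strand are already available as plain lists, and that an active plus-strand TSS shows sense reads just downstream and divergent antisense reads just upstream. The function names, the `window` parameter, and the use of `min` to require signal on both strands are all illustrative assumptions.

```python
def bidirectional_score(plus, minus, pos, window=3):
    """Toy score for a plus-strand TSS candidate at index `pos`.

    plus / minus: per-base read counts on the plus and minus strands
    (hypothetical inputs; real GRO/PRO-seq coverage would need parsing).
    """
    # Sense transcription: plus-strand reads at and downstream of pos.
    downstream_plus = sum(plus[pos:pos + window])
    # Divergent transcription: minus-strand reads just upstream of pos.
    upstream_minus = sum(minus[max(0, pos - window):pos])
    # Taking the minimum demands signal in BOTH directions,
    # so a one-sided peak scores zero.
    return min(downstream_plus, upstream_minus)


def best_candidate(plus, minus, window=3):
    """Return the index with the highest toy bidirectional score."""
    scores = [bidirectional_score(plus, minus, i, window)
              for i in range(len(plus))]
    return max(range(len(scores)), key=scores.__getitem__)


# Made-up coverage: antisense reads at 2..4, sense reads at 5..8.
plus_cov = [0, 0, 0, 0, 0, 5, 6, 4, 1, 0]
minus_cov = [0, 0, 3, 4, 5, 0, 0, 0, 0, 0]
candidate = best_candidate(plus_cov, minus_cov)
```

In this made-up example the highest-scoring position is the boundary between the antisense and sense signal, which is where a divergently transcribed TSS would be expected to sit.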
Identification of TSSs at a high resolution
Prediction distance from known gene TSSs annotated by the RefSeq, UCSC or Ensembl databases at relaxed, medium or stringent cutoffs in K562 PRO-seq data (A) and K562 GRO-seq data (B). Genomic characteristics of the region around the ARHGAP6 TSS predicted by mirSTP in the K562 cell line, including PRO-seq, GRO-seq, GRO-cap and RNA-seq (C). The predicted TSS was supported by GRO-cap and RNA-seq data; although it was not annotated by RefSeq, it was included in both UCSC and Ensembl. MirSTP performance at varying K562 PRO-seq depth (D). The y-axis is the prediction distance from known gene TSSs annotated by Ensembl at the medium cutoff, and the x-axis is the size of the randomly subsampled PRO-seq data set.
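The evaluation described in this caption rests on two simple operations: measuring how far each predicted TSS lies from the nearest annotated TSS, and randomly subsampling reads to simulate lower sequencing depth. The Python sketch below shows one way these could be computed. It is a simplified illustration, not the authors' pipeline: it assumes a single chromosome and strand, a non-empty sorted list of annotated coordinates, and reads held in an in-memory list. All function names and the example coordinates are hypothetical.

```python
import random
from bisect import bisect_left


def nearest_tss_distance(predicted, annotated_sorted):
    """Absolute distance (bp) from a predicted TSS to the closest
    annotated TSS. `annotated_sorted` must be sorted and non-empty."""
    i = bisect_left(annotated_sorted, predicted)
    # Only the neighbours on either side of the insertion point
    # can be the nearest annotation.
    neighbours = annotated_sorted[max(0, i - 1):i + 1]
    return min(abs(predicted - a) for a in neighbours)


def subsample_reads(reads, n, seed=0):
    """Draw n reads without replacement; the seed keeps runs reproducible."""
    return random.Random(seed).sample(reads, n)


# Made-up coordinates for illustration only.
annotated = [100, 500, 1200]
distances = [nearest_tss_distance(p, annotated) for p in (480, 1500, 50, 500)]

reads = list(range(1000))
half_depth = subsample_reads(reads, 500)
```

Plotting such distances for each annotation source and cutoff, or for each subsampled depth, would give curves of the kind summarized in panels A, B and D.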
Availability – MirSTP is available at http://bioinfo.vanderbilt.edu/mirSTP/