High-throughput RNA sequencing is increasingly used in bioinformatics to predict dynamic operon structures in prokaryotic genomes. Researchers from the Finnish Institute of Occupational Health provide an R implementation of a novel method that uses transcriptomic features extracted from RNA-seq transcriptome profiles to build ensemble classifiers for condition-dependent operon prediction. The CONDOP package supports RNA-seq data analysis and lets scientists examine operon organization in the context of transcriptional regulation with a few lines of code.
The function pre.proc() builds the data structures required by the main function, run.CONDOP(), which develops an ensemble operon-pair classifier combining genomic and transcriptomic features. The ensemble consists of three machine-learning models trained on a small set of confirmed operon pairs (OPs) and non-operon pairs (NOPs), both extracted from “confirmed” operons annotated in the DOOR database. Confirmed operons are identified systematically by searching for transcription start and end points that delimit runs of consecutive, actively transcribed coding sequences (CDSs) and intergenic regions (IGRs). The trained ensemble classifier then predicts the operon status of all gene pairs, including DOOR-based operon pairs (DOPs) and putative operon pairs (POPs). Finally, a linkage step joins consecutive predicted operon pairs to build the map of condition-dependent operons (comap).
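The final linkage step can be illustrated with a minimal sketch. It is written in Python for brevity and is not CONDOP's own code: the function name link_operon_pairs, its inputs, and the choice to omit single-gene units from the result are all assumptions made for illustration. Given genes in genomic order and one boolean prediction per adjacent pair, it merges runs of consecutive predicted operon pairs into operons.

```python
from typing import List


def link_operon_pairs(genes: List[str], pair_is_operon: List[bool]) -> List[List[str]]:
    """Merge consecutive predicted operon pairs into multi-gene operons.

    genes: gene identifiers in genomic order (hypothetical input format).
    pair_is_operon[i]: predicted status of the adjacent pair (genes[i], genes[i + 1]).
    """
    if len(pair_is_operon) != max(len(genes) - 1, 0):
        raise ValueError("expected exactly one prediction per adjacent gene pair")

    operons: List[List[str]] = []
    current: List[str] = []
    for i, gene in enumerate(genes):
        current.append(gene)
        is_last_gene = i == len(genes) - 1
        # Close the current run at the last gene or when the next pair is not an operon pair.
        if is_last_gene or not pair_is_operon[i]:
            if len(current) > 1:  # assumption: single-gene units are left out of the map
                operons.append(current)
            current = []
    return operons


# Toy example with made-up gene names and predictions.
example = link_operon_pairs(
    ["g1", "g2", "g3", "g4", "g5", "g6"],
    [True, True, False, False, True],
)
print(example)
```

In this toy input, g1 to g3 are linked by two consecutive positive pairs, g4 stands alone, and g5 and g6 form a second operon. A real map would also need to respect strand and replicon boundaries, which this sketch ignores.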
Availability – CONDOP is implemented in R and is freely available at CRAN: https://cran.rstudio.com/web/packages/CONDOP/
Contact – [email protected]