ChloroSeq – an organelle RNA-Seq bioinformatics pipeline

Online sequence repositories are teeming with RNA sequencing (RNA-Seq) data from a wide range of eukaryotes. Although most of these data sets contain large numbers of organelle-derived reads, researchers tend to ignore these data, focusing instead on the nuclear-derived transcripts. Consequently, GenBank contains massive amounts of organelle RNA-Seq data that are just waiting to be downloaded and analyzed.

Recently, a team of scientists from the University of Western Ontario designed an open-source bioinformatics program called ChloroSeq, which systematically analyzes an organelle transcriptome using RNA-Seq. The ChloroSeq pipeline uses RNA-Seq alignment data to deliver detailed analyses of organelle transcriptomes, and its output can be fed into statistical software for further analysis and for generating graphical representations of the data. In addition to providing data on expression levels via coverage statistics, ChloroSeq can examine splicing efficiency and RNA editing profiles. Ultimately, ChloroSeq provides a much-needed avenue for researchers of all stripes to start exploring organelle transcription and could be a key step toward a more thorough understanding of organelle gene expression.
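To make the three kinds of output mentioned above more concrete, here is a minimal Python sketch of how such summary statistics are commonly computed from read counts. This is not ChloroSeq's own code, and the formulas are assumptions: mean per-base depth for coverage, spliced reads over all junction-spanning reads for splicing efficiency, and edited reads over all reads covering a site for editing extent. All function names and the example counts are hypothetical.

```python
from typing import Sequence


def mean_coverage(depths: Sequence[int]) -> float:
    """Average per-base read depth over a gene or window (assumed definition)."""
    if not depths:
        return float("nan")
    return sum(depths) / len(depths)


def splicing_efficiency(spliced_reads: int, unspliced_reads: int) -> float:
    """Share of reads at an intron junction that are spliced (assumed definition)."""
    total = spliced_reads + unspliced_reads
    if total == 0:
        return float("nan")  # no reads span the junction
    return spliced_reads / total


def editing_extent(edited_reads: int, unedited_reads: int) -> float:
    """Share of reads at a site that carry the edited base (assumed definition)."""
    total = edited_reads + unedited_reads
    if total == 0:
        return float("nan")  # site is not covered
    return edited_reads / total


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    print(mean_coverage([10, 20, 30]))
    print(splicing_efficiency(spliced_reads=90, unspliced_reads=10))
    print(editing_extent(edited_reads=45, unedited_reads=5))
```

Ratios like these are easy to load into statistical software for plotting, which is the kind of downstream use the pipeline's output is meant for.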

Available data in GenBank for exploring organelle transcription in plastid-bearing eukaryotes

(A) As of 17 June 2016, GenBank’s SRA [http://www.ncbi.nlm.nih.gov/sra] contained 42950 publicly available RNA-Seq data sets from plastid-bearing species, 91% of which came from land plants. (B) Similarly, the most recent RefSeq release of mitochondrial and plastid organelle genome sequences (accessed 17 June 2016) [http://www.ncbi.nlm.nih.gov/genome/organelle/] included 1481 organelle genomes from land plants and algae, 1203 and 278 of which were ptDNAs and mtDNAs, respectively. This is an underestimate of the total number of available organelle genome sequences in GenBank because the RefSeq database often does not include genomes from different strains of the same species or nearly complete organelle DNAs. (C) These freely accessible RNA-Seq and organelle genome data can be used with the bioinformatics program ChloroSeq [6] to systematically analyze organelle transcriptomes.

Availability – ChloroSeq is open-source and freely available from GitHub: https://github.com/BenoitCastandet/chloroseq

Smith DR, Sanitá Lima M. (2016) Unraveling chloroplast transcriptomes with ChloroSeq, an organelle RNA-Seq bioinformatics pipeline. Brief Bioinform [Epub ahead of print].
