RNA sequencing (RNA-seq) has emerged as the premier approach for studying bacterial transcriptomes. Although the earliest published studies analyzed the data qualitatively, the data are readily digitized and lend themselves to quantitative analysis. High-resolution RNA-seq data allow transcriptional features (promoters, terminators, and operons, among others) to be pinpointed in any bacterial transcriptome. Once the transcriptome is mapped, the activity of each transcriptional feature can be quantified.
The true power of RNA-seq lies in its potential as an analytical tool for quantifying promoter activity, terminator efficiency, and the differential expression of transcripts, including operons, transcription units within operons (e.g., those generated by promoters internal to operons), and antisense RNAs. As described in more detail below, an RNA-seq dataset consists of tens of millions of sequence reads, typically 50 bases in length. The raw reads are aligned to a reference genome, and only high-quality reads are retained as mapped reads. Freely available scripts then convert the aligned reads into digital form by counting the number of times each transcribed base was sequenced in the read-aligned dataset, yielding base count data. These base counts must be normalized before the differential expression (i.e., relative base counts) of each transcriptional feature can be quantified within a sample or between samples. The normalized base count data are then quantified by averaging the base count across a selected region of the genome. Because an average rather than a total is used, the relative expression of any transcriptional feature can be expressed on the same scale regardless of its length.
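The counting, normalization, and averaging steps can be illustrated with a minimal Python sketch. It is not a reproduction of the scripts referred to above. It assumes each aligned read has already been reduced to a hypothetical `(start, end)` pair of 0-based, end-exclusive genome coordinates (real aligners emit SAM/BAM records), and it uses a simple scale-to-one-million normalization as a stand-in, since the text does not specify which normalization method is applied. The function names `base_counts`, `normalize`, and `region_average` are illustrative only.

```python
from typing import Iterable, List, Tuple


def base_counts(alignments: Iterable[Tuple[int, int]], genome_length: int) -> List[int]:
    """Count how many aligned reads cover each base of the genome.

    Assumption: each alignment is a hypothetical (start, end) pair of
    0-based, end-exclusive coordinates, not a real SAM/BAM record.
    """
    # Difference array: +1 where a read starts, -1 just past where it ends.
    diff = [0] * (genome_length + 1)
    for start, end in alignments:
        if not 0 <= start < end <= genome_length:
            raise ValueError(f"read ({start}, {end}) lies outside the genome")
        diff[start] += 1
        diff[end] -= 1
    # A running sum over the difference array gives per-base coverage.
    counts: List[int] = []
    running = 0
    for i in range(genome_length):
        running += diff[i]
        counts.append(running)
    return counts


def normalize(counts: List[int], scale: float = 1_000_000) -> List[float]:
    """Rescale so all base counts sum to `scale`.

    Assumption: a simple total-count normalization used as a stand-in;
    the source does not specify its normalization method.
    """
    total = sum(counts)
    if total == 0:
        return [0.0] * len(counts)
    return [c * scale / total for c in counts]


def region_average(norm_counts: List[float], start: int, end: int) -> float:
    """Average normalized base count over [start, end).

    Dividing by the region length puts features of any length on the
    same per-base scale.
    """
    if not 0 <= start < end <= len(norm_counts):
        raise ValueError("region must be non-empty and inside the genome")
    window = norm_counts[start:end]
    return sum(window) / len(window)


if __name__ == "__main__":
    # Toy data: four 5-base reads on a hypothetical 20-base genome.
    reads = [(0, 5), (2, 7), (3, 8), (10, 15)]
    norm = normalize(base_counts(reads, genome_length=20))
    print(region_average(norm, 0, 8))    # 8-base feature  -> 93750.0
    print(region_average(norm, 10, 15))  # 5-base feature  -> 50000.0
```

Because `region_average` divides by the region length, the 8-base and 5-base regions in the toy example are reported on the same per-base scale, which is what allows features of different lengths to be compared directly.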