RNA-seq is increasingly used for gene expression studies and is revolutionizing genomics and transcriptomics. However, RNA-seq analysis methods are still evolving. Researchers at the University of Tennessee, Knoxville designed a study with large numbers of reads and four biological replicates per condition so that they could vary these parameters and assess their impact on differential expression results.
Bacillus thuringiensis strains ATCC10792 and CT43 were grown in two Luria broth (LB) medium lots on four dates. Transcriptomics data were generated using one lane of sequence output from an Illumina HiSeq2000 instrument for each of the 32 samples and were then analyzed using DESeq2. Genome coverage across samples ranged from 87X to 465X, with medium lots and culture dates identified as major sources of variation. Significantly differentially expressed genes (5% FDR, two-fold change) were detected between cultures grown in different medium lots and between cultures grown on different dates. The highly differentially expressed iron acquisition and metabolism genes were a likely consequence of differing amounts of iron in the two medium lots. Indeed, in this study RNA-seq served as a tool for predictive biology: the researchers hypothesized, and then confirmed, that the two LB medium lots had different iron contents (an approximately two-fold difference). The researchers demonstrate that noise in the data can be controlled and minimized with appropriate experimental design and with an appropriate number of replicates and reads for the system being studied.
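To make the significance criterion concrete, the following is a minimal Python sketch of a 5% FDR, two-fold change filter applied to DESeq2-style results. It assumes each gene has a log2 fold change and an adjusted p-value, as DESeq2 reports in its `log2FoldChange` and `padj` columns. The `GeneResult` type, the `significant_genes` function, the gene names, and all values are hypothetical illustrations rather than data or code from the study, and whether the study treated its thresholds as strict or inclusive is also an assumption.

```python
import math
from typing import List, NamedTuple, Optional


class GeneResult(NamedTuple):
    """One row of a DESeq2-style results table (hypothetical container)."""
    gene_id: str
    log2_fold_change: float
    padj: Optional[float]  # None stands in for NA (gene not tested or filtered out)


def significant_genes(results: List[GeneResult],
                      fdr: float = 0.05,
                      min_fold_change: float = 2.0) -> List[str]:
    """Return IDs of genes passing both the FDR and the fold-change cutoff.

    Assumption: the fold-change cutoff is inclusive and applies in both
    directions (up- and down-regulated), so it is checked on |log2FC|.
    """
    min_log2 = math.log2(min_fold_change)  # two-fold change -> |log2FC| >= 1
    return [
        r.gene_id
        for r in results
        if r.padj is not None
        and r.padj < fdr
        and abs(r.log2_fold_change) >= min_log2
    ]


# Made-up values for illustration only.
example = [
    GeneResult("gene_a", 2.3, 0.001),   # passes both cutoffs
    GeneResult("gene_b", 0.4, 0.001),   # significant, but below two-fold
    GeneResult("gene_c", -1.5, 0.20),   # large change, but not significant
    GeneResult("gene_d", -1.0, 0.04),   # exactly two-fold down, significant
    GeneResult("gene_e", 3.0, None),    # no adjusted p-value available
]

print(significant_genes(example))  # prints ['gene_a', 'gene_d']
```

Requiring both cutoffs keeps genes whose change is statistically supported and also large enough to be of likely biological interest.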
Variation analysis of raw read count data for strain ATCC10792 and strain CT43
(A) Principal Component Analysis (PCA) for ATCC10792 using a Pearson correlation coefficient and colored by media, (B) Hierarchical cluster analysis of the same data for strain ATCC10792, (C) PCA for CT43 and (D) CT43 cluster analysis.
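As an illustration of the kind of sample-to-sample comparison behind such plots, the sketch below computes Pearson correlation between samples and converts it to a distance (1 minus r), one common input to hierarchical clustering. This is not the study's actual pipeline: the function names, sample labels, and read counts are all made up, and the study's exact transformation and clustering settings are not stated here.

```python
import math
from typing import Dict, List, Tuple


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient between two equal-length vectors."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("vectors must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / math.sqrt(var_x * var_y)


def correlation_distances(
    samples: Dict[str, List[float]]
) -> Dict[Tuple[str, str], float]:
    """Pairwise distance (1 - Pearson r) for every pair of samples."""
    names = list(samples)
    return {
        (a, b): 1.0 - pearson(samples[a], samples[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }


# Hypothetical read counts for five genes in four samples (two medium lots).
counts = {
    "lot1_rep1": [100, 250, 30, 800, 5],
    "lot1_rep2": [110, 240, 35, 790, 8],
    "lot2_rep1": [400, 260, 28, 150, 6],
    "lot2_rep2": [390, 270, 25, 160, 4],
}

distances = correlation_distances(counts)
for pair, dist in sorted(distances.items(), key=lambda item: item[1]):
    print(pair, round(dist, 3))
```

In this toy data, samples from the same lot end up much closer to each other than to samples from the other lot, which is the pattern a cluster dendrogram or PCA plot would show when medium lot is a major source of variation.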