Recent advances in high-throughput RNA sequencing (RNA-Seq) generate large amounts of data from a single sample in a very short time. Handling these data has required parallel advances in computing tools that organize and interpret them in biologically meaningful terms while using minimal computing resources to keep computation costs low.
Described here is a method for analyzing RNA-Seq data using the open-source programs of the Tuxedo suite: TopHat and Cufflinks. TopHat aligns RNA-Seq reads to a reference genome, while Cufflinks assembles the mapped reads into possible transcripts and then generates a final transcriptome assembly. Cufflinks also includes Cuffdiff, which accepts the assembled reads from two or more biological conditions and analyzes the differential expression of genes and transcripts, thus aiding the investigation of transcriptional and post-transcriptional regulation under different conditions. Also described is the use of an accessory tool, CummeRbund, which processes the output files of Cuffdiff and produces publication-quality plots and figures of the user’s choice. Finally, the effectiveness of the Tuxedo suite is demonstrated by analyzing RNA-Seq datasets of Arabidopsis thaliana root subjected to two different conditions.
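The order of the steps described above can be sketched as a short Python helper that only builds the command lines and does not run them. This is a minimal illustration, not the exact protocol used here: the file names, the thread count, and the function name `tuxedo_commands` are hypothetical, single-end reads with one FASTQ file per replicate are assumed, and the merge step is assumed to use `cuffmerge` from the Cufflinks package with its default `merged_asm` output directory.

```python
from typing import Dict, List, Tuple


def tuxedo_commands(
    genome_index: str,
    genome_fasta: str,
    annotation_gtf: str,
    samples: Dict[str, List[str]],
    threads: int = 4,
) -> Tuple[List[List[str]], List[str]]:
    """Build (but do not execute) Tuxedo command lines.

    `samples` maps a condition label to its FASTQ files (hypothetical paths).
    Returns the commands and the per-sample assemblies that would be listed
    in assemblies.txt for the merge step.
    """
    commands: List[List[str]] = []
    bams: Dict[str, List[str]] = {}
    assemblies: List[str] = []

    for condition, fastqs in samples.items():
        bams[condition] = []
        for i, fastq in enumerate(fastqs, start=1):
            name = f"{condition}_R{i}"
            # Step 1: align reads to the reference genome with TopHat.
            commands.append(["tophat", "-p", str(threads), "-G", annotation_gtf,
                             "-o", f"{name}_thout", genome_index, fastq])
            bam = f"{name}_thout/accepted_hits.bam"
            # Step 2: assemble the mapped reads into transcripts with Cufflinks.
            commands.append(["cufflinks", "-p", str(threads),
                             "-o", f"{name}_clout", bam])
            bams[condition].append(bam)
            assemblies.append(f"{name}_clout/transcripts.gtf")

    # Step 3: merge per-sample assemblies into one transcriptome assembly.
    commands.append(["cuffmerge", "-g", annotation_gtf, "-s", genome_fasta,
                     "-p", str(threads), "assemblies.txt"])

    # Step 4: test differential expression between conditions with Cuffdiff.
    labels = ",".join(samples)
    commands.append(["cuffdiff", "-o", "diff_out", "-b", genome_fasta,
                     "-p", str(threads), "-L", labels, "-u",
                     "merged_asm/merged.gtf"]
                    + [",".join(bams[c]) for c in samples])
    return commands, assemblies
```

Each returned command is a plain argument list, so it can be inspected, logged, or passed to `subprocess.run` once the paths have been checked against the actual data layout.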
A scatter plot showing the expression of genes under the two experimental conditions, with the x-axis representing the gene expression values for the control condition and the y-axis representing the gene expression values for the treated condition. Each point thus represents the expression of one gene under both conditions. From the plot, it is evident that the expression of some genes is increased in the treated condition.
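How the points of such a scatter plot could be derived from the Cuffdiff gene-level output (`gene_exp.diff`) is sketched below. This is an illustration under stated assumptions rather than the procedure CummeRbund uses internally: `value_1` is assumed to hold the control condition and `value_2` the treated condition, the log10(FPKM + 1) transform is a choice made for this example, and the function names are hypothetical.

```python
import csv
import io
import math
from typing import List, Tuple

Point = Tuple[str, float, float]  # (gene_id, x = control, y = treated)


def scatter_points(diff_text: str) -> List[Point]:
    """Turn tab-separated gene_exp.diff content into scatter plot points."""
    reader = csv.DictReader(io.StringIO(diff_text), delimiter="\t")
    points: List[Point] = []
    for row in reader:
        # Keep only genes that Cuffdiff was able to test.
        if row["status"] != "OK":
            continue
        # Assumed transform: log10(FPKM + 1), so zero expression maps to 0.
        x = math.log10(float(row["value_1"]) + 1.0)
        y = math.log10(float(row["value_2"]) + 1.0)
        points.append((row["gene_id"], x, y))
    return points


def higher_in_treated(points: List[Point]) -> List[str]:
    """Genes lying above the diagonal y = x, i.e. higher in the treated condition."""
    return [gene for gene, x, y in points if y > x]
```

Points above the diagonal correspond to genes with increased expression in the treated condition; whether an increase is statistically significant is reported separately by Cuffdiff and is not judged by this sketch.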