Massively parallel RNA sequencing (RNA-seq) has rapidly become the assay of choice for interrogating RNA transcript abundance and diversity. This article provides a detailed introduction to fundamental RNA-seq molecular biology and informatics concepts. The authors make available open-access RNA-seq tutorials that cover cloud computing, tool installation, relevant file formats, reference genomes, transcriptome annotations, quality-control strategies, expression, differential expression, and alternative splicing analysis methods.
A typical RNA-seq experimental workflow involves isolating RNA from samples of interest, generating sequencing libraries, sequencing those libraries on a high-throughput instrument to produce hundreds of millions of short paired-end reads, aligning the reads against a reference genome or transcriptome, and performing downstream analyses for expression estimation, differential expression, transcript isoform discovery, and other applications.
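Expression estimation normalizes read or fragment counts for transcript length and sequencing depth. The Python sketch below computes two common naive measures from per-transcript fragment counts and transcript lengths: FPKM (fragments per kilobase of transcript per million mapped fragments) and TPM (transcripts per million). The function names and example transcripts are hypothetical. The sketch assumes counts have already been obtained from an alignment. It does not reproduce how Cufflinks estimates abundance, because Cufflinks uses effective lengths and a likelihood model to assign ambiguous fragments among isoforms.

```python
from typing import Dict


def fpkm(counts: Dict[str, int], lengths_bp: Dict[str, int]) -> Dict[str, float]:
    """Naive FPKM: count * 1e9 / (length_bp * total_mapped_fragments).

    Hypothetical helper for illustration; real tools use effective lengths
    and statistical models, so their values will differ.
    """
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no mapped fragments")
    return {t: counts[t] * 1e9 / (lengths_bp[t] * total) for t in counts}


def tpm(counts: Dict[str, int], lengths_bp: Dict[str, int]) -> Dict[str, float]:
    """Naive TPM: per-kilobase rates rescaled so they sum to one million."""
    # Fragments per kilobase of transcript length.
    rates = {t: counts[t] / (lengths_bp[t] / 1e3) for t in counts}
    scale = sum(rates.values())
    if scale == 0:
        raise ValueError("no mapped fragments")
    return {t: r / scale * 1e6 for t, r in rates.items()}


if __name__ == "__main__":
    # Hypothetical example: txB has 3x the fragments but is also 3x longer.
    counts = {"txA": 500, "txB": 1500}
    lengths = {"txA": 1000, "txB": 3000}
    print(fpkm(counts, lengths))  # {'txA': 250000.0, 'txB': 250000.0}
    print(tpm(counts, lengths))   # {'txA': 500000.0, 'txB': 500000.0}
```

Both transcripts receive the same value because the threefold difference in fragments is fully explained by the threefold difference in length. By construction, TPM values sum to one million within a sample, whereas FPKM values do not.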
An example RNA-seq analysis workflow is depicted for a typical gene expression and differential expression analysis. Such workflows share several common themes across tool sets and RNA-seq analysis goals. RNA-seq analysis typically relies on inputs such as reference genome sequences, gene annotations, and raw sequence data. Working with these inputs requires familiarity with several standardized file formats, such as FASTA (.fa), FASTQ, and gene transfer format (GTF). Typical RNA-seq analysis workflows start with quality control (QC) of the raw data, then trim reads, align or assemble them, apply algorithms customized for a particular analysis goal (e.g., Cufflinks for expression estimation and Cuffdiff for differential expression), and end with summarization and visualization of the results. For each step, representative alternative tools and strategies are shown; many others exist. Each of the workflow steps depicted here, along with additional analysis vignettes, is implemented in the Supplementary Tutorials accompanying this work and available online at www.rnaseq.wiki.
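Raw-data QC and read trimming both operate on FASTQ records. In this format, each read occupies four lines: a header beginning with `@`, the sequence, a `+` separator, and one quality character per base. The Python sketch below parses such records, computes a per-read mean Phred quality, and trims low-quality bases from the 3' end. It assumes unwrapped four-line records and Phred+33 quality encoding (as used by Sanger and Illumina 1.8+ data). The function names and the default threshold of 20 are illustrative choices, not the behavior of any particular tool. Dedicated QC and trimming tools use more elaborate strategies, such as sliding windows and adapter removal.

```python
import io
from typing import Iterator, List, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    name: str
    seq: str
    qual: str


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from an unwrapped, 4-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of input
        seq = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield FastqRecord(header[1:], seq, qual)


def phred_scores(qual: str, offset: int = 33) -> List[int]:
    # Assumes Phred+33 encoding: quality = ASCII code - 33.
    return [ord(c) - offset for c in qual]


def mean_quality(rec: FastqRecord) -> float:
    scores = phred_scores(rec.qual)
    return sum(scores) / len(scores) if scores else 0.0


def trim_trailing(rec: FastqRecord, min_q: int = 20) -> FastqRecord:
    """Remove 3' bases whose quality is below min_q (simple trailing trim)."""
    scores = phred_scores(rec.qual)
    end = len(scores)
    while end > 0 and scores[end - 1] < min_q:
        end -= 1
    return FastqRecord(rec.name, rec.seq[:end], rec.qual[:end])


if __name__ == "__main__":
    # 'I' encodes Q40 and '#' encodes Q2 under Phred+33.
    data = io.StringIO("@read1\nACGTACGT\n+\nIIIIII##\n")
    for rec in read_fastq(data):
        print(rec.name, mean_quality(rec))  # read1 30.5
        print(trim_trailing(rec).seq)       # ACGTAC
```

Reading from `io.StringIO` keeps the example self-contained. For a real file, pass a handle from `open(path)` instead, or from `gzip.open(path, "rt")` for compressed FASTQ.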
Availability – These tutorials and additional training resources are accompanied by complete analysis pipelines and test datasets made available without encumbrance at www.rnaseq.wiki.