The development of high-throughput sequencing (HTS) for RNA profiling (RNA-seq) has shed light on the diversity of transcriptomes. RNA-seq is becoming a de facto standard for monitoring the population of expressed transcripts in a given condition at a specific time, but processing the large volume of data it generates requires dedicated bioinformatics programs. Here, researchers from CNRS UMR6290 describe a standard bioinformatics protocol built on three state-of-the-art tools: the STAR mapper to align reads onto a reference genome, Cufflinks to reconstruct the transcriptome, and RSEM to quantify expression levels of genes and transcripts. They present the workflow using human transcriptome sequencing data from two biological replicates of the K562 cell line produced as part of the ENCODE3 project.
Schematic overview of the bioinformatics pipeline described in this protocol
Input files are in green, intermediate files in gray, output files in yellow, and the main steps (bioinformatics tools) in red. Using the reference genome and annotation files, STAR maps the RNA-seq reads to both the genome and the transcriptome. Cufflinks then uses the genome alignment output file to reconstruct known and novel transcripts, and RSEM uses the transcriptome alignment output file to quantify the expression levels of genes and transcripts. The index construction required by STAR and RSEM is implied rather than drawn as a separate step.
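The workflow in the figure can be sketched as an ordered list of command lines. The Python snippet below is a minimal illustration under stated assumptions, not the authors' actual protocol: the function name `build_pipeline`, all file and directory names, the thread count, and the read length are hypothetical, and the reads are assumed to be paired-end. It only builds the commands; it does not run them.

```python
import shlex
from pathlib import Path


def build_pipeline(genome_fa, annotation_gtf, reads_1, reads_2,
                   out_dir="out", threads=8, read_length=101):
    """Return the pipeline steps as (name, argv) pairs, in execution order.

    Hypothetical helper: paths, thread count and read length are
    illustrative assumptions, not values taken from the protocol.
    """
    out = Path(out_dir)
    star_index = (out / "star_index").as_posix()
    star_prefix = (out / "star").as_posix() + "/"
    rsem_ref = (out / "rsem_ref" / "ref").as_posix()
    n = str(threads)

    # STAR writes these two files under the chosen output prefix.
    genome_bam = star_prefix + "Aligned.sortedByCoord.out.bam"
    transcriptome_bam = star_prefix + "Aligned.toTranscriptome.out.bam"

    return [
        # Index construction (implicit in the figure).
        ("star_index", ["STAR", "--runMode", "genomeGenerate",
                        "--runThreadN", n,
                        "--genomeDir", star_index,
                        "--genomeFastaFiles", genome_fa,
                        "--sjdbGTFfile", annotation_gtf,
                        "--sjdbOverhang", str(read_length - 1)]),
        ("rsem_index", ["rsem-prepare-reference",
                        "--gtf", annotation_gtf, genome_fa, rsem_ref]),
        # One STAR run yields both the genome and transcriptome alignments.
        ("star_align", ["STAR", "--runThreadN", n,
                        "--genomeDir", star_index,
                        "--readFilesIn", reads_1, reads_2,
                        "--outFileNamePrefix", star_prefix,
                        "--outSAMtype", "BAM", "SortedByCoordinate",
                        # Assumption: unstranded library, so Cufflinks
                        # needs the XS strand attribute.
                        "--outSAMstrandField", "intronMotif",
                        "--quantMode", "TranscriptomeSAM"]),
        # Genome alignments -> known and novel transcripts (-g: GTF guide).
        ("cufflinks", ["cufflinks", "-p", n, "-g", annotation_gtf,
                       "-o", (out / "cufflinks").as_posix(), genome_bam]),
        # Transcriptome alignments -> gene and transcript expression.
        ("rsem_quant", ["rsem-calculate-expression",
                        "--alignments", "--paired-end", "-p", n,
                        transcriptome_bam, rsem_ref,
                        (out / "rsem" / "sample").as_posix()]),
    ]


if __name__ == "__main__":
    # Print the commands for inspection instead of executing them.
    for name, argv in build_pipeline("genome.fa", "annotation.gtf",
                                     "rep1_R1.fastq", "rep1_R2.fastq"):
        print(f"# {name}\n{shlex.join(argv)}\n")
```

Printing the commands rather than executing them keeps the sketch safe to run without the tools installed. In practice the output directories may need to exist before the tools are launched, and each biological replicate would get its own output prefix and sample name.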