Although RNA sequencing (RNA-seq) has become a powerful technology for transcriptome profiling, one significant shortcoming of the first-generation RNA-seq protocol is that it does not retain the strand of origin of each transcript. Without strand information, it is difficult and sometimes impossible to accurately quantify expression levels for genes that occupy overlapping genomic loci but are transcribed from opposite strands. A modified protocol, known as strand-specific or stranded RNA-seq, has recently made it possible to retain this strand information. Here, researchers from Pfizer Worldwide Research & Development evaluated the advantages of stranded over non-stranded RNA-seq in transcriptome profiling of whole blood RNA samples, and investigated the influence of gene overlaps on gene expression profiling results, using both practical RNA-seq datasets and a theoretical perspective.
Non-stranded versus stranded RNA-seq protocol. The stranded protocol differs from the non-stranded protocol in two ways. First, during cDNA synthesis, second-strand synthesis proceeds as normal except that the nucleotide mix includes dUTPs instead of dTTPs. Second, after library preparation, a second-strand digestion step is added. This step ensures that only the first strand survives the subsequent PCR amplification step, and hence the strand information of the libraries is retained.
The results demonstrated a substantial impact of stranded RNA-seq on transcriptome profiling and gene expression measurements. As many as 1751 genes in Gencode Release 19 were identified as differentially expressed between the stranded and non-stranded RNA-seq data from the whole blood samples. Antisense genes and pseudogenes were significantly enriched among these differentially expressed genes. Because stranded RNA-seq retains the strand information of each read, it can resolve read ambiguity in overlapping genes transcribed from opposite strands, which yields a more accurate quantification of gene expression levels than traditional non-stranded RNA-seq.
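To make the read-ambiguity point concrete, here is a minimal Python sketch of how a read falling in the overlap of two opposite-strand genes is counted with and without strand information. Everything in it is hypothetical and for illustration only: the gene names, coordinates, reads, and the `assign` and `count_reads` helpers are not taken from the study or from any particular counting tool. It also assumes that each read's strand has already been converted to the strand of its originating transcript, which depends on the library protocol.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    name: str
    start: int   # inclusive genomic start
    end: int     # inclusive genomic end
    strand: str  # "+" or "-"


@dataclass(frozen=True)
class Read:
    start: int
    end: int
    strand: str  # assumed: strand of the originating transcript


def assign(read, genes, stranded):
    """Return the gene a read is counted for, 'ambiguous', or 'no_feature'."""
    hits = [
        g.name
        for g in genes
        # interval overlap test on inclusive coordinates
        if g.start <= read.end and read.start <= g.end
        # with a stranded library, the gene must also lie on the read's strand
        and (not stranded or g.strand == read.strand)
    ]
    if len(hits) == 1:
        return hits[0]
    return "ambiguous" if hits else "no_feature"


def count_reads(reads, genes, stranded):
    """Tally reads per assignment category."""
    counts = {}
    for read in reads:
        key = assign(read, genes, stranded)
        counts[key] = counts.get(key, 0) + 1
    return counts


# Hypothetical annotation: two genes on opposite strands sharing 400-500.
GENES = [Gene("GENE_A", 100, 500, "+"), Gene("GENE_B", 400, 800, "-")]

# Hypothetical reads: two of them fall inside the shared region.
READS = [
    Read(120, 170, "+"),
    Read(420, 470, "+"),
    Read(450, 500, "-"),
    Read(700, 750, "-"),
]

if __name__ == "__main__":
    print("non-stranded:", count_reads(READS, GENES, stranded=False))
    print("stranded:    ", count_reads(READS, GENES, stranded=True))
```

In this toy setup, the two reads inside the shared region cannot be attributed to either gene when strand is ignored, whereas the strand-aware count assigns each of them to a single gene. Overlaps between genes on the same strand would remain ambiguous under either mode.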
Metrics for RNA-seq. a) The sequencing library size; b) the mapping summaries for sequence reads; c) the counting summaries for uniquely mapped reads; d) the ambiguous reads arising from gene overlap; on average, the percentage of ambiguous reads drops by approximately 3.1% from non-stranded to stranded RNA-seq, and this drop roughly represents the overlap arising from genes on opposite strands; e) the correlation of gene expression profiles among the eight samples; the samples clearly cluster by sequencing protocol; f) the boxplot of gene expression