Expression analysis of RNA sequencing data depends on technical replication and normalization methods

The potential for astrocyte participation in central nervous system recovery is highlighted by in vitro experiments demonstrating their capacity to transdifferentiate into neurons. Understanding astrocyte plasticity could be advanced by comparing astrocytes with stem cells. RNA sequencing (RNA-Seq) is well suited to comparing gene expression across cell types. However, this relatively new, multi-stage process can introduce unwanted technical variation at several points in the experimental workflow. A quantitative understanding of how experimental parameters contribute to technical variation would facilitate the design of robust RNA-Seq experiments.

Researchers from New Mexico State University used RNA-Seq to address both biological and technical objectives. The biological component compared gene expression between normal human fetal-derived astrocytes (NHA) and human neural stem cells (hNSC) cultured under identical conditions. When a differential expression threshold of |log2 fold change| > 2 was applied to the data, no significant differences were observed. The technical component quantified the variation arising from specific steps in the experimental workflow and compared how well different normalization methods reduced unwanted variance. For this purpose, a more liberal differential expression threshold was used: a false discovery rate (FDR) of 10% and |log2 fold change| > 0.5. Data were normalized in JMP Genomics using reads per kilobase of transcript per million mapped reads (RPKM), trimmed mean of M-values (TMM), and upper quartile scaling (UQS). The contributions of key replicable experimental parameters (cell lot, library preparation, flow cell) to variance in the data were evaluated using principal variance component analysis. Although the variance attributed to every parameter depended strongly on the normalization method, library preparation was the largest contributor to technical variance. Normalization also affected the ability to detect differentially expressed genes: differences were detected only in non-normalized and TMM-normalized data.
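The two sets of threshold criteria described above can be expressed as a simple per-gene filter. This is a minimal sketch assuming that log2 fold changes and raw p-values are already available from some differential expression test; the study used JMP Genomics, whose internals are not reproduced here. The Benjamini-Hochberg procedure is assumed as the FDR control method, and the function names and toy values are hypothetical.

```python
import numpy as np


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    p = np.asarray(pvalues, dtype=float)
    n = p.size
    order = np.argsort(p)                              # indices of ascending p-values
    scaled = p[order] * n / np.arange(1, n + 1)        # p_(i) * n / i
    # Step-up: each adjusted value is the minimum over itself and all larger ranks.
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.minimum(scaled, 1.0)
    return adjusted


def call_de(log2fc, pvalues, fdr=0.10, min_abs_log2fc=0.5):
    """True for genes with adjusted p < fdr and |log2 fold change| > min_abs_log2fc."""
    q = benjamini_hochberg(pvalues)
    effect = np.abs(np.asarray(log2fc, dtype=float))
    return (q < fdr) & (effect > min_abs_log2fc)


# Hypothetical per-gene results for five genes.
log2fc = np.array([2.5, 0.8, -0.6, 0.3, -3.1])
pvalues = np.array([0.001, 0.02, 0.04, 0.001, 0.30])

liberal = call_de(log2fc, pvalues)                     # FDR 10%, |log2FC| > 0.5
strict = call_de(log2fc, pvalues, min_abs_log2fc=2.0)  # |log2FC| > 2
print(liberal)  # genes 0, 1 and 2 pass
print(strict)   # only gene 0 passes
```

The FDR paired with the stricter |log2 fold change| > 2 cut-off is not stated in this summary, so the sketch keeps 0.10 purely for illustration. Raising the fold-change threshold shrinks the set of called genes even when the adjusted p-values are unchanged.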

The similarity in gene expression between astrocytes and neural stem cells supports the potential for astrocytic transdifferentiation into neurons, and emphasizes the need to evaluate the therapeutic potential of astrocytes for central nervous system damage. The choice of normalization method influences the contributions to experimental variance as well as the outcomes of differential expression analysis. However, irrespective of normalization method, library preparation contributed the largest component of technical variance.

Conditional (cell line) and technical (library, lot, flow cell) contributions to variation between replicate samples


Box plots (a, c, e, g) show log2 fold changes between technical replicates for each cell line (hNSC, green; NHA, violet) and between cell lines (blue). Pie charts (b, d, f, h) show the contribution of different components to sample variance, estimated by principal variance component analysis: cell line (blue), flow cell (red), library preparation (yellow), cell lot (light green), and residual variance (black). The estimates are shown for log2-transformed data (a, b), RPKM-normalized data (c, d), TMM-normalized data (e, f) and UQS-normalized data (g, h).
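The panels compare log2-transformed data with RPKM-, TMM- and UQS-normalized data. The study performed these normalizations in JMP Genomics; the NumPy sketch below only illustrates the general idea of each method and is not that implementation. All function names are hypothetical. The TMM functions loosely follow the trimming (30% of log ratios and 5% of average expression values from each end) and precision weighting of edgeR's calcNormFactors, but break rank ties by position and take the first sample as the reference, both simplifying assumptions (edgeR chooses its reference sample differently). The upper-quartile function is one of several published variants.

```python
import numpy as np


def rpkm(counts, gene_lengths_bp):
    """Reads per kilobase of transcript per million mapped reads.

    counts: genes x samples matrix of raw read counts.
    gene_lengths_bp: length of each gene in bases.
    """
    counts = np.asarray(counts, dtype=float)
    per_million = counts / (counts.sum(axis=0) / 1e6)            # library-size scaling
    kilobases = np.asarray(gene_lengths_bp, dtype=float)[:, None] / 1e3
    return per_million / kilobases                               # gene-length scaling


def upper_quartile_scale(counts):
    """Scale each sample by the 75th percentile of its non-zero counts."""
    counts = np.asarray(counts, dtype=float)
    uq = np.array([np.percentile(col[col > 0], 75) for col in counts.T])
    factors = uq / np.exp(np.log(uq).mean())                     # geometric mean of 1
    return counts / factors


def tmm_factor(obs, ref, logratio_trim=0.3, sum_trim=0.05):
    """Trimmed, precision-weighted mean of M values for `obs` against `ref`."""
    obs = np.asarray(obs, dtype=float)
    ref = np.asarray(ref, dtype=float)
    n_obs, n_ref = obs.sum(), ref.sum()
    keep = (obs > 0) & (ref > 0)                                 # M, A undefined at zero
    obs, ref = obs[keep], ref[keep]
    m = np.log2((obs / n_obs) / (ref / n_ref))                   # log ratio (M)
    a = 0.5 * np.log2((obs / n_obs) * (ref / n_ref))             # mean log expression (A)
    var = (n_obs - obs) / (n_obs * obs) + (n_ref - ref) / (n_ref * ref)  # approx. Var(M)
    n = m.size
    rank_m = np.argsort(np.argsort(m)) + 1                       # 1-based ranks
    rank_a = np.argsort(np.argsort(a)) + 1
    lo_m = np.floor(n * logratio_trim) + 1
    lo_a = np.floor(n * sum_trim) + 1
    kept = ((rank_m >= lo_m) & (rank_m <= n + 1 - lo_m)
            & (rank_a >= lo_a) & (rank_a <= n + 1 - lo_a))
    weights = 1.0 / var[kept]                                    # inverse-variance weights
    return 2.0 ** (np.sum(weights * m[kept]) / np.sum(weights))


def tmm_cpm(counts, ref_col=0):
    """Counts per million using TMM-adjusted (effective) library sizes."""
    counts = np.asarray(counts, dtype=float)
    factors = np.array([tmm_factor(counts[:, j], counts[:, ref_col])
                        for j in range(counts.shape[1])])
    factors /= np.exp(np.log(factors).mean())                    # geometric mean of 1
    effective_lib = counts.sum(axis=0) * factors
    return counts / (effective_lib / 1e6)


# Hypothetical counts: 5 genes x 2 technical replicates of one cell line.
rep_counts = np.array([[120, 130], [15, 9], [560, 600], [40, 35], [900, 870]])
norm = tmm_cpm(rep_counts)
replicate_log2fc = np.log2(norm[:, 1] + 1) - np.log2(norm[:, 0] + 1)  # pseudocount of 1
```

A log2 fold change between technical replicates, as plotted in the box plots, sits near zero when the replicates agree, so its spread after each normalization gives one simple view of remaining technical variation.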

Knight VB, Serrano EE. (2018) Expression analysis of RNA sequencing data from human neural and glial cell lines depends on technical replication and normalization methods. BMC Bioinformatics 19(Suppl 14):412.
