from RPubs – by Prasanth A S
This section looks at different normalization methods and the problems they run into on large arrays of data. These methods assume that:
- most genes are not differentially expressed across conditions.
- the distribution of gene expression values is roughly the same across samples.
Two sets of plots show the probability density and the cumulative distribution of gene expression values: one for a single sample, and one for the average of the sorted gene expression vectors.
Which normalization method (loess normalization, quantile normalization, or variance-stabilizing normalization) would turn a value of 60 into 45 as we go from the single-sample distribution to the averaged distribution?
Compute the across-sample SD, on the log (base 2) scale, for each probe in
What is the median SD for the non-spiked-in genes (use the index spikeinIndex to exclude spike-ins)?
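The calculation asked for above can be sketched in plain Python on made-up numbers. The matrix, the set of spike-in rows, and the helper names below are all hypothetical, and this is not the R code the exercise expects. It only illustrates the three steps: take log2 of each intensity, compute a per-probe sample SD across samples, and take the median over the non-spike-in rows.

```python
import math
import statistics

# Hypothetical toy intensities: rows are probes, columns are samples.
intensities = [
    [100.0, 110.0, 95.0, 105.0],   # probe 0 (background)
    [200.0, 190.0, 210.0, 205.0],  # probe 1 (background)
    [50.0, 400.0, 100.0, 800.0],   # probe 2 (spike-in, concentration varies)
    [300.0, 310.0, 290.0, 305.0],  # probe 3 (background)
]
spikein_index = {2}  # hypothetical: row indices of the spike-in probes


def probe_sds(matrix):
    """Sample SD (n - 1 denominator) of log2 intensities, one value per probe."""
    return [statistics.stdev([math.log2(x) for x in row]) for row in matrix]


def median_sd(sds, rows):
    """Median of the per-probe SDs restricted to the given row indices."""
    return statistics.median([sds[i] for i in rows])


sds = probe_sds(intensities)
background_rows = [i for i in range(len(intensities)) if i not in spikein_index]
spikein_rows = sorted(spikein_index)

background_median = median_sd(sds, background_rows)
spikein_median = median_sd(sds, spikein_rows)
print("median SD, non-spike-in:", background_median)
print("median SD, spike-in:", spikein_median)
```

On data like this the non-spike-in median is small and the spike-in median is large, which is the pattern the two summaries are meant to capture.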
Note that the first summary in the previous question relates to specificity: we want measurements across replicate arrays to be similar, and thus the variability to be low. However, because the experiment was designed so that the spike-in concentrations vary across samples, we expect the spike-in measurements to change, so their variability should be higher. A successful normalization approach will improve specificity (lower the SD of non-spiked-in genes) without affecting sensitivity (leave the SD of spiked-in genes the same).
Now use the normalize.quantiles function in the preprocessCore package to normalize all the probes together (make sure to normalize on the original scale).
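To make the idea behind quantile normalization concrete, here is a minimal plain-Python sketch of the textbook procedure, applied on the original (not log) scale. It is an illustration under stated assumptions, not a reimplementation of normalize.quantiles: the function name, the toy matrix, and the tie handling (ties keep their original row order) are choices made here, and the package may treat ties differently.

```python
def quantile_normalize(matrix):
    """Give every column (sample) the same distribution.

    Each value is replaced by the mean, across samples, of the values
    that share its within-sample rank.
    """
    n_rows = len(matrix)
    n_cols = len(matrix[0])
    columns = [[matrix[i][j] for i in range(n_rows)] for j in range(n_cols)]

    # Reference distribution: mean of the k-th smallest value across samples.
    sorted_cols = [sorted(col) for col in columns]
    reference = [sum(sc[k] for sc in sorted_cols) / n_cols for k in range(n_rows)]

    normalized = [[0.0] * n_cols for _ in range(n_rows)]
    for j, col in enumerate(columns):
        # Row indices of this sample ordered from smallest to largest value.
        order = sorted(range(n_rows), key=lambda i: col[i])
        for rank, i in enumerate(order):
            normalized[i][j] = reference[rank]
    return normalized


# Hypothetical toy data: rows are probes, columns are samples.
raw = [
    [5.0, 4.0, 3.0],
    [2.0, 1.0, 4.0],
    [3.0, 6.0, 6.0],
    [4.0, 2.0, 8.0],
]
normalized = quantile_normalize(raw)
for row in normalized:
    print(row)
```

After this step every sample contains exactly the same set of values, and only the assignment of values to probes differs between samples.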
Now recalculate the median SDs as in the previous assessment question. Remember to use the same
What is the median SD for the spiked-in genes after normalization?
As we discussed in the videos, normalization techniques such as quantile normalization are not always appropriate. An example comes from the dataset described in Loven et al. (2012), which was discussed in the videos and is included in this package: