For RNA-seq experiments, randomization should extend beyond the preparation of the research subjects: the complexity of the technology introduces many further steps at which it should be considered. For example, we can randomize the order in which samples are processed at each step of library construction, as well as the order and location of the samples in the sequencer.
The most desirable replicates are biological replicates, which are true replicates and capture the variation among biological samples. Some studies include biological replicates, while many others have only technical replicates, that is, repeated measurements from the same biological sample. If the goal is to evaluate the technology, technical replicates alone are sufficient.
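The randomization of sample location described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `assign_lanes`, the sample labels, and the idea of spreading each treatment across lanes (which presumes several barcoded samples can share a lane) are hypothetical choices, not a procedure taken from the source.

```python
import random
from collections import defaultdict


def assign_lanes(samples, n_lanes, seed=0):
    """Randomly assign samples to lanes, spreading each treatment across lanes.

    samples: list of (sample_id, treatment) tuples (hypothetical labels).
    Returns a dict mapping lane number (1-based) to a list of sample ids
    in a randomized loading order.
    """
    rng = random.Random(seed)  # fixed seed so the layout can be reproduced

    # Group sample ids by treatment.
    by_treatment = defaultdict(list)
    for sample_id, treatment in samples:
        by_treatment[treatment].append(sample_id)

    lanes = {lane: [] for lane in range(1, n_lanes + 1)}
    for ids in by_treatment.values():
        rng.shuffle(ids)                 # random order within the treatment
        offset = rng.randrange(n_lanes)  # random starting lane
        for i, sample_id in enumerate(ids):
            lanes[(offset + i) % n_lanes + 1].append(sample_id)

    # Randomize the order of samples within each lane as well.
    for ids in lanes.values():
        rng.shuffle(ids)
    return lanes


samples = [(f"ctrl_{i}", "control") for i in range(4)] + \
          [(f"trt_{i}", "treated") for i in range(4)]
layout = assign_lanes(samples, n_lanes=4, seed=42)
for lane, ids in layout.items():
    print(lane, ids)
```

The same shuffle-with-a-recorded-seed idea could be applied to the processing order at each library construction step; recording the seed keeps the layout auditable.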
RNA-Seq Specific Effects
RNA-seq experiments are subject to the technical variability common to most assays, such as processing date, technician and reagent batch. In addition, some recognized technical effects are specific to the RNA-seq procedure. Among these sources of variation, the library preparation effect is the largest; the flow cell and lane effects are relatively small.
Because RNA-seq is essentially a random sampling of transcripts, a large number of reads is needed to measure transcripts expressed at low levels. For a given budget, it is critical to decide whether to increase the sequencing depth, gaining more accurate measurements of genes expressed at low levels, or to increase the sample size while limiting the sequencing depth per sample. Detecting allelic differential expression for genes expressed at fairly low levels would require extremely deep coverage.
At the same sequencing depth, paired-end sequencing increases the sensitivity and specificity of detecting alternative splicing and chimeras compared with single-end sequencing.
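To make the depth requirement concrete, the following minimal sketch treats the read count of one transcript as Poisson with mean equal to the sequencing depth times the fraction of reads that transcript contributes. The Poisson assumption, the function names and the example numbers are illustrative choices, not values from the source.

```python
import math


def prob_at_least(k, depth, fraction):
    """P(at least k reads) for a transcript contributing `fraction` of all reads,
    assuming its read count is Poisson with mean depth * fraction."""
    lam = depth * fraction
    below_k = sum(math.exp(-lam) * lam ** i / math.factorial(i) for i in range(k))
    return 1.0 - below_k


def depth_for_detection(fraction, target=0.95):
    """Smallest depth at which at least one read is seen with probability `target`.

    Solves 1 - exp(-depth * fraction) >= target for depth.
    """
    return math.ceil(-math.log(1.0 - target) / fraction)


# Hypothetical transcript making up one read in ten million.
rare = 1e-7
for depth in (1_000_000, 10_000_000, 100_000_000):
    print(depth, round(prob_at_least(5, depth, rare), 4))
print(depth_for_detection(rare))
```

Seeing a single read is a much weaker requirement than measuring expression accurately, so depths obtained this way are best read as lower bounds.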
Biases of Next-Generation Sequencing
In reality, sequence reads are not sampled exactly at random from transcripts. Biases have been found to be related to the GC content of the sequence, the use of random hexamer primers, 3′ and 5′ depletion or bias towards the 3′ end, and bias toward specific RNA species. Most of these biases are related to library preparation methods. From the experimental design point of view, these biases increase the required sample size and sequencing depth, which emphasizes the importance of choosing better protocols and selecting the right analysis methods.
Sample Size Calculation for RNA-Seq
The sample size may be determined at two levels: the number of lanes used as technical replicates within one treatment, or the number of biological replicates for each treatment. When there are only technical replicates and the library preparation and lane effects are negligible or mitigated by a proper design, sample sizes can be calculated gene by gene based on Poisson models. When there are biological replicates and over-dispersion is present, negative binomial (NB) distributions are more appropriate than Poisson distributions for modeling RNA-seq data. In either case, first obtain the sample size for a single gene, and then determine the overall sample size based on the overall average power.
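As a rough illustration of the per-gene and average-power steps, the sketch below uses a normal approximation on the log scale, in which the variance of a log count is about 1/mean + dispersion (dispersion 0 gives the Poisson case). This approximation is one common choice and is not claimed to be the method of the source; the function names, the example genes, and the unadjusted per-gene significance level are assumptions made for illustration.

```python
import math
from statistics import NormalDist

_norm = NormalDist()


def samples_per_group(mean_count, dispersion, fold_change, alpha=0.05, power=0.8):
    """Approximate replicates per group needed to detect `fold_change` for one gene.

    Uses n = 2 * (z_{1-alpha/2} + z_{power})^2 * (1/mean + dispersion) / ln(fc)^2.
    """
    if mean_count <= 0 or fold_change <= 0 or fold_change == 1:
        raise ValueError("mean_count must be > 0 and fold_change must be > 0 and != 1")
    z = _norm.inv_cdf(1 - alpha / 2) + _norm.inv_cdf(power)
    var_log = 1.0 / mean_count + dispersion
    return math.ceil(2 * z ** 2 * var_log / math.log(fold_change) ** 2)


def power_for_gene(n, mean_count, dispersion, fold_change, alpha=0.05):
    """Approximate power for one gene with n replicates per group."""
    z_alpha = _norm.inv_cdf(1 - alpha / 2)
    se = math.sqrt(2 * (1.0 / mean_count + dispersion) / n)
    return _norm.cdf(abs(math.log(fold_change)) / se - z_alpha)


def smallest_n_for_average_power(genes, target=0.8, alpha=0.05, max_n=1000):
    """Smallest n per group whose average power over `genes` reaches `target`.

    genes: list of (mean_count, dispersion, fold_change) tuples.
    Returns None if no n up to max_n is sufficient.
    """
    for n in range(1, max_n + 1):
        avg = sum(power_for_gene(n, m, d, fc, alpha) for m, d, fc in genes) / len(genes)
        if avg >= target:
            return n
    return None


# Hypothetical genes: (mean count, NB dispersion, fold change to detect).
genes = [(100, 0.1, 2.0), (20, 0.3, 2.0), (500, 0.05, 1.5)]
for g in genes:
    print(g, samples_per_group(*g))
print(smallest_n_for_average_power(genes))
```

In practice the per-gene significance level would need to account for multiple testing across many genes, which this sketch leaves out.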
It is worth pointing out that validation using qRT-PCR on the same RNA samples assayed in the RNA-seq analysis only validates the technology. It does not validate the conclusion about the treatments/conditions. It is the validation using different biological replicates from the same populations that can further validate the biological conclusions from RNA-seq experiments.
Fang Z, Cui X. (2011) Design and validation issues in RNA-seq experiments. Brief Bioinform. 12(3), 280-87.