RNA-Seq is becoming a common technique for surveying gene expression based on DNA sequencing. Researchers at MD Anderson Cancer Center set out to investigate how increased sequencing capacity has affected the accuracy of mRNA measurement.
They identified the library preparation steps prior to DNA sequencing as the main source of error in this process. Studying three datasets, they showed that accuracy does improve with sequencing depth. However, the rate of improvement as a function of the number of sequence reads is generally slower than the binomial distribution predicts. They therefore used the beta-binomial distribution to model the overdispersion. The overdispersion parameters they introduced depend explicitly on the number of reads, so that the resulting statistical uncertainty is consistent with the empirical observation that measurement accuracy increases with sequencing depth. The overdispersion parameters were determined by maximum likelihood.
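To make the binomial versus beta-binomial comparison concrete, here is a minimal sketch. It is not the authors' code or their parameterization: the function names, the mean/correlation parameterization (`p`, `rho`), and the grid search used in place of a numerical optimizer are all assumptions for illustration. In particular, `rho` is held constant here, whereas the paper lets the overdispersion depend on the number of reads; the functional form of that dependence is not given in this summary, so it is not reproduced.

```python
import math


def betabinom_logpmf(k, n, alpha, beta):
    """Log-probability of k successes in n trials under a beta-binomial."""
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    log_num = (math.lgamma(k + alpha) + math.lgamma(n - k + beta)
               - math.lgamma(n + alpha + beta))
    log_den = math.lgamma(alpha) + math.lgamma(beta) - math.lgamma(alpha + beta)
    return log_choose + log_num - log_den


def proportion_variance(n, p, rho=0.0):
    """Variance of the estimated proportion k/n.

    rho = 1 / (alpha + beta + 1) is the overdispersion (intra-class
    correlation); rho = 0 recovers the plain binomial value p(1-p)/n.
    """
    return p * (1.0 - p) / n * (1.0 + (n - 1) * rho)


def fit_rho(counts, depths, p, grid):
    """Maximum-likelihood rho over a candidate grid, with the mean p fixed.

    Hypothetical helper: a grid search stands in for a real optimizer.
    """
    best_rho, best_ll = None, -math.inf
    for rho in grid:
        total = 1.0 / rho - 1.0          # alpha + beta
        alpha, beta = p * total, (1.0 - p) * total
        ll = sum(betabinom_logpmf(k, n, alpha, beta)
                 for k, n in zip(counts, depths))
        if ll > best_ll:
            best_rho, best_ll = rho, ll
    return best_rho


if __name__ == "__main__":
    # Made-up read counts for one gene across replicates (illustration only).
    counts = [10, 90, 20, 80, 50, 30, 70]
    depths = [100] * len(counts)
    print("fitted rho:", fit_rho(counts, depths, 0.5, [0.001, 0.01, 0.1, 0.3]))
    for n in (10**2, 10**4, 10**6):
        print(n, proportion_variance(n, 0.1), proportion_variance(n, 0.1, 0.01))
```

With a constant `rho`, the beta-binomial variance of the proportion approaches `p * (1 - p) * rho` as the depth grows, instead of shrinking toward zero as the binomial value does. That plateau is one way to see why a read-count-dependent overdispersion parameter is needed if the model is to agree with accuracy that keeps improving with depth.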
- Cai G, Li H, Lu Y, Huang X, Lee J, Müller P, Ji Y, Liang S (2012). Accuracy of RNA-Seq and its dependence on sequencing depth. BMC Bioinformatics 13(Suppl 13):S5.