Accuracy of RNA-Seq and its dependence on sequencing depth

The cost of DNA sequencing has fallen dramatically over the past decade, and sequencing technologies have consequently been applied more and more widely in genomic research. RNA-Seq, which surveys gene expression by sequencing DNA copies (cDNA) of the RNA in a sample, is becoming a common technique. Because it was not clear how the increased sequencing capacity has affected the accuracy of mRNA measurements, a study set out to investigate that relationship.

Researchers at the University of Texas MD Anderson Cancer Center empirically evaluated the accuracy of repeated gene expression measurements made with RNA-Seq. They identified the library preparation steps that precede DNA sequencing as the main source of error in this process. Across three datasets, they show that accuracy does improve with sequencing depth. However, as the number of reads grows, accuracy generally improves more slowly than the binomial distribution predicts. They therefore modeled this overdispersion with a beta-binomial distribution whose overdispersion parameters depend explicitly on the number of reads. This keeps the model's statistical uncertainty consistent with the empirical observation that measurement accuracy increases with sequencing depth. With a fixed overdispersion parameter, the beta-binomial variance of an expression estimate would instead level off at a nonzero floor as depth grows. The overdispersion parameters were estimated by maximum likelihood. The authors show that their modified beta-binomial model yields a lower false discovery rate than either the binomial or the pure beta-binomial model.
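The modeling idea above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The depth-dependent form ρ(n) = a·n^(−b) is a hypothetical choice, the expected fraction p is fixed to the pooled estimate, the maximum-likelihood fit is a plain grid search, and all function names are invented for this example. The beta-binomial is parameterized by its mean p and overdispersion ρ. Under that parameterization the variance of an observed fraction k/n is p(1−p)/n·[1 + (n−1)ρ]. Setting ρ = 0 recovers the binomial, and a constant ρ makes this variance approach p(1−p)ρ rather than zero as n grows.

```python
import math
import random


def betabinom_logpmf(k, n, p, rho):
    """Log P(K = k) for a beta-binomial with n trials, mean fraction p and
    overdispersion (intra-class correlation) rho, 0 < rho < 1."""
    s = 1.0 / rho - 1.0                      # alpha + beta, since rho = 1/(alpha+beta+1)
    alpha, beta = p * s, (1.0 - p) * s
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    # log B(k+alpha, n-k+beta) - log B(alpha, beta)
    log_b_num = math.lgamma(k + alpha) + math.lgamma(n - k + beta) - math.lgamma(n + s)
    log_b_den = math.lgamma(alpha) + math.lgamma(beta) - math.lgamma(s)
    return log_choose + log_b_num - log_b_den


def fraction_variance(n, p, rho=0.0):
    """Variance of the observed fraction k/n; rho = 0 is the plain binomial."""
    return p * (1.0 - p) / n * (1.0 + (n - 1) * rho)


def depth_rho(n, a, b):
    """HYPOTHETICAL depth-dependent overdispersion rho(n) = a * n**(-b).
    Not necessarily the functional form used in the paper."""
    return a * n ** (-b)


def log_likelihood(data, p, a, b):
    """Sum of beta-binomial log-probabilities over (k, n) observations."""
    return sum(betabinom_logpmf(k, n, p, depth_rho(n, a, b)) for k, n in data)


def fit_depth_rho(data, a_grid, b_grid):
    """Maximum-likelihood (a, b) by exhaustive grid search, with p fixed to
    the pooled fraction (a simplifying assumption). Returns ((a, b), p)."""
    p = sum(k for k, _ in data) / sum(n for _, n in data)
    best = max(((a, b) for a in a_grid for b in b_grid),
               key=lambda ab: log_likelihood(data, p, *ab))
    return best, p


def simulate(depths, reps, p, a, b, seed=0):
    """Synthetic (k, n) counts: q ~ Beta(p*s, (1-p)*s), then k ~ Binomial(n, q)."""
    rng = random.Random(seed)
    data = []
    for n in depths:
        s = 1.0 / depth_rho(n, a, b) - 1.0
        for _ in range(reps):
            q = rng.betavariate(p * s, (1.0 - p) * s)
            k = sum(rng.random() < q for _ in range(n))   # Bernoulli sum
            data.append((k, n))
    return data
```

For example, `data = simulate([100, 1000, 10000], 30, 0.01, 0.05, 0.3)` followed by `fit_depth_rho(data, [1e-4, 0.01, 0.05, 0.2], [0.0, 0.3, 0.6])` fits the two hypothetical parameters to synthetic counts. The grid search keeps the sketch dependency-free; a real analysis would more likely use a numerical optimizer.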

Sequencing Depth

  • Cai G, Li H, Lu Y, Huang X, Lee J, Müller P, Ji Y, Liang S. (2012) Accuracy of RNA-Seq and its dependence on sequencing depth. BMC Bioinformatics 13 Suppl 13, S5.