Most Next-Generation Sequencing (NGS) technologies effectively sample small amounts of DNA or RNA that are amplified (i.e., copied) prior to sequencing. The amplification process is not perfect, leading to extreme bias in sequenced read counts. Researchers at Purdue University have developed a novel procedure to account for amplification bias and have demonstrated its effectiveness in mitigating gene length dependence when estimating true gene expression.
They tested the proposed method on simulated and real data. Simulations indicated that their method captures true gene expression more effectively than classic censoring-based approaches and leads to power gains in differential expression testing, particularly for shorter genes with high transcription rates. They also applied the method to an unreplicated Arabidopsis RNA-seq data set, where it yielded disparate gene ontologies in gene set enrichment analyses.
Availability – R code to perform the RASTA (Robust Adjustment of Sequence Tag Abundance) procedures is freely available on the web at www.stat.purdue.edu/~doerge/
- Baumann DD, Doerge RW. (2013) Robust Adjustment of Sequence Tag Abundance. Bioinformatics [Epub ahead of print].