The recent RNA-seq technology is an attractive method to study gene expression. One of the most important goals in RNA-seq data analysis is to detect genes differentially expressed across treatments. Although several statistical methods have been published, there are no theoretical justifications for whether these methods are optimal or how to search for the optimal test. Furthermore, most proposed tests are designed for testing whether the mean expression levels are exactly the same or not across treatments, whereas sometimes, biologists are interested in detecting genes with expression changes larger than a certain threshold. Another issue with current methods is that the false discovery rate (FDR) control is not well studied.
In this manuscript, researchers at Iowa State University propose a test to address all the above issues. Under model assumptions, they derive an optimal test that achieves the maximum of average power among those that control FDR at the same level. They also provide an approximated version, the approximated most average powerful (AMAP) test, for practical implementation. The proposed method allows for testing null hypotheses that are much more general than the ones most previous studies have considered, and it leads to a natural way of controlling the FDR. Through simulation studies, they show that their test has higher power than other methods, including the widely used edgeR, DESeq, and baySeq methods, as well as better FDR control than two other FDR control procedures commonly used in practice. For demonstration, they also apply the proposed method to a real RNA-seq dataset obtained from maize.
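For readers unfamiliar with FDR control, the following is a minimal Python sketch of the Benjamini-Hochberg step-up procedure, one widely used way to control the FDR across many per-gene tests. It is not the AMAP test, and the summary above does not say which two FDR procedures the authors compared against; the function name `benjamini_hochberg` and the example p-values are illustrative assumptions only.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Illustrative Benjamini-Hochberg step-up procedure (not the AMAP test).

    Returns a list of booleans, one per input p-value, where True means the
    corresponding null hypothesis is rejected at FDR level `alpha`.
    """
    m = len(p_values)
    if m == 0:
        return []

    # Indices of the p-values, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Find the largest 1-based rank k such that p_(k) <= k * alpha / m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * alpha / m:
            k_max = rank

    # Reject every hypothesis whose p-value rank is at most k_max.
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


# Hypothetical per-gene p-values, for illustration only.
example_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
example_calls = benjamini_hochberg(example_p, alpha=0.05)
```

Because the procedure is step-up, a gene can be rejected even when its own p-value exceeds its rank-based threshold, as long as some larger p-value meets the threshold for its rank.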
Availability – The R package, AMAP.Seq, implements the proposed AMAP test and is publicly available at http://www.r-project.org. Users can choose either the Poisson or the negative binomial (NB) distribution to model the counts and can specify their own estimates of the normalization factors or dispersion parameters. The authors' code for the simulation studies is available at http://pliu.public.iastate.edu/AMAP.htm.
- Si Y, Liu P. (2013) An optimal test with maximum average power while controlling FDR with application to RNA-seq data. Biometrics [Epub ahead of print]. [abstract]