Polyester – simulating RNA-seq datasets with differential transcript expression

Developing statistical methods for differential expression analysis of RNA sequencing (RNA-seq) data requires tools for assessing accuracy and error rate control. Because the true differential expression status is often unknown in experimental datasets, artificially constructed datasets are needed, obtained either from costly spike-in experiments or by simulating RNA-seq data.
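When the true differential expression status is known, as it is for simulated data, a method's calls can be scored directly against it. The Python sketch below is illustrative only and is not part of Polyester, which is an R package. It computes the observed false discovery rate and true positive rate from per-transcript truth labels and calls; the function name `error_rates` and the example labels are hypothetical.

```python
from typing import Sequence


def error_rates(is_de: Sequence[bool], called_de: Sequence[bool]) -> tuple[float, float]:
    """Return (observed false discovery rate, true positive rate).

    is_de:     ground-truth differential expression status per transcript
               (known here only because the data are simulated)
    called_de: whether a differential expression method flagged each transcript
    """
    if len(is_de) != len(called_de):
        raise ValueError("inputs must have the same length")
    # Count true positives, false positives and false negatives.
    tp = sum(1 for t, c in zip(is_de, called_de) if t and c)
    fp = sum(1 for t, c in zip(is_de, called_de) if not t and c)
    fn = sum(1 for t, c in zip(is_de, called_de) if t and not c)
    called = tp + fp
    # Define both rates as 0.0 when their denominator is empty.
    fdr = fp / called if called else 0.0
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    return fdr, tpr


# Hypothetical truth labels and method calls for six transcripts.
truth = [True, True, False, False, False, True]
calls = [True, False, True, False, False, True]
print(error_rates(truth, calls))
```

Comparing the observed false discovery rate with the level a method claims to control is one way to check its error rate control.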

Polyester is an R package for simulating RNA-seq data, starting from an experimental design and ending with collections of RNA-seq reads. Its main advantage is the ability to simulate reads that reflect isoform-level differential expression across biological replicates for a variety of experimental designs. Data generated by Polyester reasonably approximate real RNA-seq data, and standard differential expression workflows can recover the differential expression that the user sets in the simulation.
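To illustrate the idea of building differential expression and replicate variability into simulated data, the Python sketch below draws per-transcript read counts for several replicates in each group from a negative binomial distribution, scaling each transcript's baseline mean by a user-chosen fold change. It is not Polyester's API or its exact model: it covers only the "how many reads per transcript" step, not the generation of actual sequencing reads. The function `simulate_counts`, its parameters, and the default dispersion (`size = base_mean / 3`) are hypothetical choices made for this sketch.

```python
import numpy as np


def simulate_counts(base_mean, fold_changes, reps_per_group, size=None, seed=0):
    """Draw a transcripts x samples matrix of read counts (hypothetical helper).

    base_mean:      positive baseline expected reads per transcript (length n_transcripts)
    fold_changes:   array (n_transcripts, n_groups); 1.0 means no change in that group
    reps_per_group: number of biological replicates in each group
    size:           negative binomial size (dispersion) per transcript;
                    defaults to base_mean / 3, an assumption of this sketch
    """
    rng = np.random.default_rng(seed)
    base_mean = np.asarray(base_mean, dtype=float)
    fold_changes = np.asarray(fold_changes, dtype=float)
    if size is None:
        size = base_mean / 3.0
    size = np.broadcast_to(np.asarray(size, dtype=float), base_mean.shape)
    columns = []
    for g in range(fold_changes.shape[1]):
        # Group-specific mean; p is chosen so that the NB mean equals mu.
        mu = base_mean * fold_changes[:, g]
        p = size / (size + mu)
        for _ in range(reps_per_group):
            columns.append(rng.negative_binomial(size, p))
    return np.column_stack(columns)


# Four transcripts, two groups of three replicates: transcript 2 is
# 4x higher in group 1, transcript 3 is 4x lower in group 2.
base = np.full(4, 300.0)
fc = np.array([[1, 1], [1, 1], [4, 1], [1, 0.25]])
counts = simulate_counts(base, fc, reps_per_group=3)
print(counts.shape)  # (4, 6)
```

Because the fold changes are set in advance, the transcripts that truly differ between groups are known, which is what makes it possible to score a differential expression workflow on the output.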


[Figure: Coverage comparison to the GEUVADIS data set]

Availability – Polyester is freely available from Bioconductor: http://bioconductor.org/

Contact: jtleek@gmail.com

Frazee AC, Jaffe AE, Langmead B, Leek JT. (2015) Polyester: simulating RNA-seq datasets with differential transcript expression. Bioinformatics [Epub ahead of print].

