In line with the importance of RNA-seq, the bioinformatics community has produced numerous data analysis tools incorporating methods to correct sample-specific biases. However, few advanced simulation tools exist to enable benchmarking of competing correction methods. Now, researchers at EMBL-EBI, United Kingdom, introduce the first framework to reproduce the properties of individual RNA-seq runs. By applying it to several datasets, they demonstrate the importance of accounting for sample-specificity in realistic simulations.
The rlsim package (https://github.com/sbotond/rlsim) is the first advanced simulation framework to reproduce the properties of specific Illumina RNA-seq datasets. rlsim simulates key steps of RNA-seq library construction protocols (e.g. fragmentation, priming, PCR amplification, size selection), with particular focus on the latter steps (PCR, size selection), which can be informed by the analysis of specific datasets. To this end, the package provides tools for estimating the insert size distribution, corrected and uncorrected relative expression levels, and GC-dependent amplification efficiencies. The estimation approach can be thought of as an extension of the method of Benjamini and Speed (2012) to RNA-seq data, and is similar to bias correction approaches such as that implemented in BitSeq (Glaus et al., 2012). These parameters, estimated from real datasets, are combined with a “priming affinity” model inspired by a thermodynamic model of oligonucleotide hybridization. The details of the framework are described in the package documentation (http://bit.ly/rlsim-doc).
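To make the two steps that rlsim focuses on more concrete, here is a toy Python sketch of GC-dependent PCR amplification followed by size selection. It is not rlsim's code or interface: every function name is hypothetical, the quadratic efficiency curve is an assumption chosen only for illustration (rlsim estimates efficiencies from real data), and fragmentation and priming are left out entirely.

```python
import random


def gc_content(seq):
    """Fraction of G/C bases in a sequence (0.0 for an empty sequence)."""
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq.upper()) / len(seq)


def toy_efficiency(gc, optimum=0.5, width=0.25):
    """Assumed per-cycle duplication probability, peaked at `optimum` GC.

    This quadratic curve is a placeholder, not an estimate from data.
    """
    return max(0.0, 1.0 - ((gc - optimum) / width) ** 2)


def amplify(fragments, cycles, rng):
    """Simulate PCR: in each cycle, every copy duplicates with a
    probability given by the fragment's GC-dependent efficiency."""
    counts = {}
    for frag in fragments:
        eff = toy_efficiency(gc_content(frag))
        copies = 1
        for _ in range(cycles):
            copies += sum(rng.random() < eff for _ in range(copies))
        counts[frag] = counts.get(frag, 0) + copies
    return counts


def size_select(counts, min_len, max_len):
    """Keep only fragments whose length falls inside the selection window."""
    return {f: c for f, c in counts.items() if min_len <= len(f) <= max_len}


rng = random.Random(0)  # fixed seed so the run is repeatable
fragments = ["ATGCATGC", "AAAATTTT", "GGGCCC"]
counts = amplify(fragments, cycles=3, rng=rng)
selected = size_select(counts, min_len=7, max_len=10)
print(selected)
```

Under this assumed curve, the balanced-GC fragment doubles every cycle while the extreme-GC fragments are not amplified at all, which is an exaggerated version of the GC bias that sample-specific efficiency estimates are meant to capture.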
Realistic simulations reveal extensive sample-specificity of RNA-seq biases
Botond Sipos, Greg Slodkowicz, Tim Massingham, Nick Goldman
(Submitted on 14 Aug 2013) [article]