Change point problems arise in many genomic analyses, such as the detection of copy number variations or of transcribed regions. The expanding Next Generation Sequencing technologies now make it possible to locate change points at nucleotide resolution.
Because its complexity is almost linear in the sequence length when the maximal number of segments is constant, and because its performance has been acknowledged for microarrays, researchers at AgroParisTech propose using the Pruned Dynamic Programming (PDP) algorithm on the outputs of Seq experiments. This requires adapting the algorithm to the negative binomial distribution with which they model the data. The researchers show that the PDP algorithm can be used if the dispersion of the signal is known, and they provide an estimator for this dispersion. They describe a compression framework that reduces the time complexity without modifying the accuracy of the segmentation, and they propose to estimate the number of segments via a penalized likelihood criterion. Finally, they illustrate the performance of the proposed methodology on RNA-Seq data.
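To make the segmentation problem concrete, here is a minimal sketch of the underlying optimization, assuming a single known dispersion `phi` shared by all segments and a segment-specific negative binomial probability `p` fitted by maximum likelihood (p̂ = phi / (phi + mean)). It is a plain dynamic program in O(K n²), not the PDP algorithm: PDP reaches the same exact optimum while pruning candidate change points, which is what gives its near-linear complexity. The function names, and the BIC-like penalty in `select_k`, are hypothetical illustrations and not the criterion or API of Segmentor3IsBack.

```python
import math


def nb_segment_cost(n, s, phi):
    """Negative log-likelihood, up to a term depending only on the data, of a
    segment of n counts summing to s under NB(phi, p) with p set to its MLE."""
    if s == 0:
        return 0.0  # p_hat = 1: an all-zero segment has likelihood 1
    p_hat = phi / (phi + s / n)
    return -(n * phi * math.log(p_hat) + s * math.log(1.0 - p_hat))


def segment_nb(y, k_max, phi):
    """Exact segmentation by plain dynamic programming, O(k_max * n^2).
    Returns {K: (cost, change_points)}, where change points are the indices
    at which a new segment starts."""
    n = len(y)
    cum = [0]
    for v in y:
        cum.append(cum[-1] + v)

    def cost(i, j):  # cost of the segment y[i:j], 0 <= i < j <= n
        return nb_segment_cost(j - i, cum[j] - cum[i], phi)

    inf = math.inf
    # best[k][t]: minimal cost of splitting y[:t] into k segments
    best = [[inf] * (n + 1) for _ in range(k_max + 1)]
    arg = [[0] * (n + 1) for _ in range(k_max + 1)]
    best[0][0] = 0.0
    for k in range(1, k_max + 1):
        for t in range(k, n + 1):
            for s in range(k - 1, t):  # s = start of the last segment
                c = best[k - 1][s] + cost(s, t)
                if c < best[k][t]:
                    best[k][t] = c
                    arg[k][t] = s

    results = {}
    for k in range(1, min(k_max, n) + 1):
        cps, t = [], n
        for kk in range(k, 0, -1):  # backtrack segment starts
            s = arg[kk][t]
            if kk > 1:
                cps.append(s)
            t = s
        results[k] = (best[k][n], sorted(cps))
    return results


def select_k(results, n, beta=1.0):
    """Hypothetical BIC-like choice of K (beta * K * log n); NOT the
    penalized likelihood criterion proposed by the authors."""
    return min(results, key=lambda k: results[k][0] + beta * k * math.log(n))


y = [0, 1, 0, 2, 1, 0, 20, 25, 18, 22, 30, 19]
res = segment_nb(y, k_max=3, phi=5.0)
print(res[2][1])  # change point between the low and high regimes
```

Fixing `phi` is what keeps the per-segment cost a function of the segment's length and sum only, so cumulative sums give each cost in O(1).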
Availability – Segmentor3IsBack is available as an R package on the CRAN repository at: http://cran.r-project.org/web/packages/Segmentor3IsBack/index.html