RNA sequencing (RNA-seq) is a powerful tool for genome-wide expression profiling of biological samples, with the advantages of high throughput and high resolution. Many algorithms exist for quantifying expression levels and detecting differential gene expression, but none of them take into account the misaligned reads that are mapped to non-exonic regions. Researchers at UTHSCSA developed a novel algorithm, XBSeq, whose statistical model is based on the assumption that observed signals are the convolution of true expression signals and sequencing noise. The reads mapped to non-exonic regions are treated as sequencing noise, which is assumed to follow a Poisson distribution. Given measurable observed and noise signals from RNA-seq data, the true expression signals, assumed to be governed by a negative binomial distribution, can be delineated, which in turn allows accurate detection of differentially expressed genes.
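To make the decomposition concrete, here is a minimal sketch of how the mean and variance of the true signal could be recovered from observed and noise counts. It is not the XBSeq estimator itself: it assumes the true signal and the noise are independent, so that their means and variances simply add, and the function name `estimate_true_signal_moments` and the example counts are hypothetical.

```python
from statistics import mean, variance


def estimate_true_signal_moments(observed, noise):
    """Method-of-moments sketch for one gene across replicates.

    Assumption (not taken from the XBSeq paper): observed = true + noise,
    with true signal and noise independent, so
        E[true]   = E[observed]   - E[noise]
        Var[true] = Var[observed] - Var[noise]
    Estimates are clipped at zero because sample moments can cross over.
    """
    if len(observed) != len(noise) or len(observed) < 2:
        raise ValueError("need at least two paired replicates")
    mu_true = max(mean(observed) - mean(noise), 0.0)
    var_true = max(variance(observed) - variance(noise), 0.0)
    return mu_true, var_true


# Hypothetical counts for one gene in four replicates.
observed_counts = [120, 135, 110, 150]   # reads in the exonic region
noise_counts = [10, 12, 9, 13]           # reads in the matched non-exonic region

mu_true, var_true = estimate_true_signal_moments(observed_counts, noise_counts)
print(f"estimated true mean = {mu_true:.2f}, variance = {var_true:.2f}")
```

The resulting mean and variance could then be used to parameterize a negative binomial distribution for the true signal; how XBSeq actually does this is described in the paper and the package documentation.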
The researchers implemented XBSeq and evaluated it on simulated expression datasets generated under different conditions, using a combination of negative binomial and Poisson distributions with parameters derived from real RNA-seq data. They compared its performance with that of other commonly used differential expression analysis algorithms, and examined how the true and false positive rates change with the number of biological replicates, the differential fold change, and the expression level in non-exonic regions. Finally, they tested the algorithm on a real RNA-seq dataset and reported where the detection results of the different algorithms agreed and where they differed.
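A simulation of this kind can be sketched as follows. This is an illustrative outline only, not the authors' simulation scripts: the parameter values are made up rather than derived from real data, the helper names are hypothetical, and the negative binomial draw uses the standard gamma-Poisson mixture with variance `mean + dispersion * mean**2`.

```python
import math
import random


def sample_poisson(rng, lam):
    """Draw one Poisson variate with Knuth's method (fine for moderate lam)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def sample_negative_binomial(rng, mean, dispersion):
    """Gamma-Poisson mixture: rate ~ Gamma(1/dispersion, mean*dispersion)."""
    rate = rng.gammavariate(1.0 / dispersion, mean * dispersion)
    return sample_poisson(rng, rate)


def simulate_gene(rng, true_mean, dispersion, noise_mean, n_replicates):
    """Return (true, noise, observed) count triples for one gene.

    Assumption for this sketch: observed = true + noise in every replicate.
    """
    replicates = []
    for _ in range(n_replicates):
        true_count = sample_negative_binomial(rng, true_mean, dispersion)
        noise_count = sample_poisson(rng, noise_mean)
        replicates.append((true_count, noise_count, true_count + noise_count))
    return replicates


rng = random.Random(0)  # fixed seed so the run is repeatable
# Illustrative values: condition 2 has a two-fold change in true expression.
condition_1 = simulate_gene(rng, true_mean=50.0, dispersion=0.1,
                            noise_mean=5.0, n_replicates=3)
condition_2 = simulate_gene(rng, true_mean=100.0, dispersion=0.1,
                            noise_mean=5.0, n_replicates=3)
for label, reps in (("condition 1", condition_1), ("condition 2", condition_2)):
    print(label, [observed for _, _, observed in reps])
```

Varying `n_replicates`, the ratio between the two `true_mean` values, and `noise_mean` corresponds to the three factors the evaluation looked at: replicate number, fold change, and non-exonic expression level.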
(A) Illustration of exonic and non-exonic reads. (B) Histogram of sequence read counts in RPKM. The histogram of observed signal (X) is plotted in blue and the histogram of non-exonic read counts (B) in pink.
When background noise is at baseline level, the performance of XBSeq and DESeq is mostly equivalent. However, XBSeq outperforms DESeq and the other algorithms as the number of non-exonic mapped reads increases. Only under very low read count conditions did XBSeq show a slightly higher false discovery rate, which might be improved by adjusting the background noise effect in that situation. Taken together, by accounting for non-exonic mapped reads, XBSeq can provide accurate expression measurements and thus detect differentially expressed genes even in noisy conditions.
Availability – The XBSeq R package, the shift-gene GTF files, and reproducible scripts for the simulation are available on GitHub at https://github.com/Liuy12/XBSeq.