Various biases affect high-throughput sequencing read counts. Contrary to the general assumption, researchers from Ludwig-Maximilians-University Munich show that bias does not always cancel out when fold changes are computed. They find that bias affects more than 20% of the genes called differentially regulated in RNA-seq experiments, with drastic effects on subsequent biological interpretation.
They propose a novel approach to estimating fold changes. Their method is based on a probabilistic model that directly incorporates count ratios instead of read counts. It provides a theoretical foundation for pseudo-counts and can be used to estimate fold change credible intervals as well as normalization factors that outperform currently used normalization methods. The researchers show that their method significantly improves fold change estimates, both by comparing RNA-seq-derived fold changes against qPCR data from the MAQC/SEQC project as a reference and by analyzing sequencing data with random barcodes.
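To make the link between pseudo-counts and credible intervals concrete, here is a minimal sketch of one possible count ratio model. It assumes a binomial likelihood for the share of reads coming from sample A and a Beta prior on that share. The summary above does not spell out the actual model, so the function name `fold_change_interval`, its parameters, and the Monte Carlo interval are all illustrative assumptions rather than the published implementation. Under this assumption the prior parameters act exactly like pseudo-counts added to the two read counts.

```python
import math
import random


def fold_change_interval(count_a, count_b, prior_a=1.0, prior_b=1.0,
                         level=0.9, draws=20000, seed=0):
    """Log2 fold change (A over B) with a Monte Carlo credible interval.

    Assumed model (not taken from the source): the share p of reads that
    come from sample A has a Beta(prior_a, prior_b) prior, so the posterior
    is Beta(count_a + prior_a, count_b + prior_b).
    """
    rng = random.Random(seed)
    post_a = count_a + prior_a  # prior parameters behave like pseudo-counts
    post_b = count_b + prior_b

    # Odds of the posterior mean of p: the familiar pseudo-count fold change.
    point = math.log2(post_a / post_b)

    # Sample the posterior and transform each draw to a log2 fold change.
    samples = []
    for _ in range(draws):
        p = rng.betavariate(post_a, post_b)
        p = min(max(p, 1e-12), 1.0 - 1e-12)  # guard against log of zero
        samples.append(math.log2(p / (1.0 - p)))
    samples.sort()

    lower = samples[int((1.0 - level) / 2.0 * draws)]
    upper = samples[int((1.0 + level) / 2.0 * draws) - 1]
    return point, lower, upper


if __name__ == "__main__":
    print(fold_change_interval(40, 10))
    print(fold_change_interval(400, 100))
```

With the same underlying ratio, the interval for 400 versus 100 reads is narrower than the one for 40 versus 10 reads, which is the practical benefit of reporting an interval instead of a single fold change.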
Workflows for differential NGS analysis. Differential analysis of NGS data starts with the aligned reads of two conditions, here exemplified as RNA-seq reads from samples A and B aligned to an mRNA. Existing models take one specific route through the necessary steps defined in the main text: (I) For each sample, reads are aggregated and an appropriate probabilistic model is used to control noise and estimate the sample-specific mRNA abundance. (II) These abundance estimates are then divided to give an estimate of the mRNA fold change. The authors' approach takes a different route: it first computes local ratios for all read sequences and then aggregates them, using an appropriate noise model for count ratios, to estimate the total mRNA fold change. With a basic noise model for the second step, both routes are equivalent. Extensions of that model, however, lead to more accurate fold change estimates by exploiting the fact that bias cancels out when taking the ratio of counts of individual sequences. Note that two important aspects of NGS (replicate experiments and normalization) are left out of this figure; they are analyzed and discussed in the paper.
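The two routes in the figure can be sketched in a few lines. The example below is an illustration under stated assumptions, not the authors' noise model: the per-sequence counts are invented, all function names are hypothetical, and the median is used only as a simple stand-in for a more elaborate aggregation of local ratios. It shows that weighting each local ratio by its count in sample B reproduces the ratio of totals exactly, while a different aggregation of the same local ratios can give a different answer when one sequence dominates the counts.

```python
import math
from statistics import median


def route_totals(counts_a, counts_b):
    """Route 1: aggregate reads per sample, then divide the totals."""
    return math.log2(sum(counts_a) / sum(counts_b))


def route_local_weighted(counts_a, counts_b):
    """Route 2 with count-proportional weights on the local ratios.

    sum_i (b_i / B) * (a_i / b_i) equals sum_i a_i / B, so this matches
    route_totals. Assumes every count in counts_b is positive.
    """
    total_b = sum(counts_b)
    ratio = sum((b / total_b) * (a / b) for a, b in zip(counts_a, counts_b))
    return math.log2(ratio)


def route_local_median(counts_a, counts_b):
    """Route 2 with a hypothetical robust aggregation (median of log ratios)."""
    return median(math.log2(a / b) for a, b in zip(counts_a, counts_b))


# Invented counts for five read sequences of one mRNA. The third sequence
# mimics a count that is inflated in sample A only.
counts_a = [20, 44, 900, 30, 16]
counts_b = [10, 20, 100, 15, 8]

if __name__ == "__main__":
    print(route_totals(counts_a, counts_b))
    print(route_local_weighted(counts_a, counts_b))
    print(route_local_median(counts_a, counts_b))
```

In this toy data four of the five local ratios are close to 2, yet the ratio of totals is pulled well above that by the single inflated sequence. Working on local ratios makes it possible to choose how much influence each sequence gets.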
Availability – The software implementation is freely available from the project website: http://www.bio.ifi.lmu.de/software/lfc