A new strategy for mapping RNA-Seq reads

Accurate estimation of expression levels from RNA-Seq data requires precise mapping of the sequence reads to a reference genome. Because the standard reference genome contains only one allele at any given locus, a read that overlaps a polymorphic locus and carries a non-reference allele is at least one mismatch away from the reference and is therefore less likely to be mapped. This bias in read mapping leads to inaccurate estimates of allele-specific expression (ASE).
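The bias described above can be illustrated with a minimal sketch. The toy reference sequence, the SNP position, and the mismatch threshold `MAX_MISMATCHES` are all hypothetical values chosen for illustration; real aligners use more elaborate scoring, so this only shows why a read carrying the alternative allele starts with a one-mismatch handicap.

```python
def count_mismatches(read: str, reference: str, start: int) -> int:
    """Count positions where the read differs from the reference window at `start`."""
    window = reference[start:start + len(read)]
    return sum(1 for r, g in zip(read, window) if r != g)


def with_error(read: str) -> str:
    """Introduce one simulated sequencing error by changing the first base."""
    return ("T" if read[0] != "T" else "A") + read[1:]


# Hypothetical toy reference with a SNP at index 10 (reference allele 'A').
reference = "ACGTACGTACATTGCAGGCT"
snp_pos, alt_allele = 10, "G"

# Haplotype that carries the alternative allele at the SNP.
alt_haplotype = reference[:snp_pos] + alt_allele + reference[snp_pos + 1:]

# Two reads covering the SNP, one drawn from each haplotype.
read_start, read_len = 5, 10
ref_read = reference[read_start:read_start + read_len]
alt_read = alt_haplotype[read_start:read_start + read_len]

ref_mismatches = count_mismatches(ref_read, reference, read_start)
alt_mismatches = count_mismatches(alt_read, reference, read_start)

# Assumed aligner setting: a read maps only if it has at most this many mismatches.
MAX_MISMATCHES = 1
ref_read_maps = count_mismatches(with_error(ref_read), reference, read_start) <= MAX_MISMATCHES
alt_read_maps = count_mismatches(with_error(alt_read), reference, read_start) <= MAX_MISMATCHES
```

With the same single sequencing error in both reads, the reference-allele read stays within the assumed threshold while the alternative-allele read exceeds it, which is the asymmetry that distorts ASE estimates.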

To address this read-mapping bias, researchers at the DoD Biotechnology Software Applications Institute proposed the construction of an enhanced reference genome that includes the alternative alleles at known polymorphic loci. They showed that mapping reads to this enhanced reference reduces read-mapping bias, leading to more reliable estimates of ASE.
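One way to picture an enhanced reference is sketched below. It is an illustration under stated assumptions, not the released executables: the function `build_alt_segments`, the segment naming scheme, and the choice of flanking each alternative allele with `read_length - 1` bases on either side are hypothetical, and the sketch ignores neighboring polymorphisms that fall inside the same window.

```python
from typing import Dict, List, Tuple


def build_alt_segments(
    chrom_name: str,
    sequence: str,
    snps: List[Tuple[int, str, str]],
    read_length: int,
) -> Dict[str, str]:
    """Return short sequences that carry the alternative allele at each SNP.

    `snps` holds (0-based position, reference allele, alternative allele).
    Each segment spans up to read_length - 1 bases on either side of the SNP,
    so any read of that length overlapping the SNP fits inside the segment.
    """
    flank = read_length - 1
    segments: Dict[str, str] = {}
    for pos, ref_allele, alt_allele in snps:
        if sequence[pos] != ref_allele:
            raise ValueError(f"reference allele mismatch at {chrom_name}:{pos}")
        left = max(0, pos - flank)
        right = min(len(sequence), pos + flank + 1)
        segment = sequence[left:pos] + alt_allele + sequence[pos + 1:right]
        segments[f"{chrom_name}_{pos}_{alt_allele}"] = segment
    return segments


# Hypothetical toy chromosome with one known SNP (A -> G at index 10).
sequence = "ACGTACGTACATTGCAGGCT"
segments = build_alt_segments("chrT", sequence, [(10, "A", "G")], read_length=5)

# The enhanced reference is the original sequence plus the extra segments.
enhanced = {"chrT": sequence, **segments}

# A read carrying the alternative allele has no exact match in the standard
# reference but matches the added segment exactly.
alt_read = "ACGTT"
exact_in_standard = alt_read in sequence
exact_in_enhanced = any(alt_read in seq for seq in enhanced.values())
```

In a full pipeline, reads that map to an added segment would still need to be translated back to genomic coordinates before allele counts are tallied; that step is omitted here.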

Experiments on simulated data show that the proposed strategy reduced the number of loci with mapping bias by ≥63% when compared with a previous approach that relies on masking the polymorphic loci and by ≥18% when compared with the standard approach that uses an unaltered reference. When this strategy was applied to actual RNA-Seq data, it mapped up to 15% more reads than the previous approaches and identified many seemingly incorrect inferences made by those approaches.
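The kind of comparison reported above can be sketched as follows. The criterion used here for a "biased" locus (an exact two-sided binomial test against an expected 50:50 allele ratio, at a threshold `alpha`) is an assumption made for illustration and may differ from the criterion used in the study; the allele counts are made-up toy values, not results.

```python
from math import comb
from typing import Dict, Tuple


def binomial_two_sided_p(k: int, n: int) -> float:
    """Exact two-sided p-value for k successes in n trials when p = 0.5."""
    if n == 0:
        return 1.0
    # Under p = 0.5 the distribution is symmetric, so outcomes at least as far
    # from n/2 as k are exactly the ones at least as improbable as k.
    extreme = abs(k - n / 2)
    total = sum(comb(n, i) for i in range(n + 1) if abs(i - n / 2) >= extreme)
    return total / 2 ** n


def count_biased_loci(counts: Dict[str, Tuple[int, int]], alpha: float = 0.05) -> int:
    """Count loci whose (reference, alternative) read counts deviate from 50:50."""
    return sum(
        1
        for ref_count, alt_count in counts.values()
        if binomial_two_sided_p(ref_count, ref_count + alt_count) < alpha
    )


def percent_reduction(before: int, after: int) -> float:
    """Percentage by which the number of biased loci dropped."""
    return 100 * (before - after) / before


# Toy allele counts from a simulation in which both alleles are equally expressed.
toy_counts = {"locus_a": (50, 50), "locus_b": (10, 0)}
n_biased = count_biased_loci(toy_counts)
```

Running `count_biased_loci` on the mapped reads from each mapping strategy and passing the two totals to `percent_reduction` gives a figure comparable in form to the reductions quoted above.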

The executables to construct the enhanced reference genome and the Perl scripts to analyze the mapped reads are available for download from http://www.bhsai.org/downloads/ase/

  • Vijaya Satya R, Zavaljevski N, Reifman J. (2012) A new strategy to reduce allelic bias in RNA-Seq readmapping. Nucleic Acids Res [Epub ahead of print].
