Biases during read mapping can be avoided by mapping reads to two alternative genomes

Genetic variation in cis-regulatory elements is an important cause of variation in gene expression. Cis-regulatory variation can be detected by using high-throughput RNA sequencing (RNA-seq) to identify differences in the expression of the two alleles of a gene. This requires that reads from the two alleles are equally likely to map to the reference genome or genomes, and that SNPs are accurately called, so that reads derived from the different alleles can be identified. Both of these prerequisites can be met by sequencing the genomes of the parents of the individual being studied, but this is often prohibitively costly.
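To make the idea of comparing the two alleles concrete, the sketch below computes the fraction of reads carrying one allele at a heterozygous SNP and tests it against the 50:50 ratio expected without cis-regulatory differences. The exact binomial test, the function names, and the read counts are illustrative assumptions; the paper summarized here may use a different statistical model.

```python
from math import comb


def allelic_ratio(ref_reads: int, alt_reads: int) -> float:
    """Fraction of allele-informative reads that carry the reference allele."""
    total = ref_reads + alt_reads
    if total == 0:
        raise ValueError("no allele-informative reads at this site")
    return ref_reads / total


def binomial_two_sided_p(k: int, n: int, p: float = 0.5) -> float:
    """Exact two-sided binomial p-value.

    Sums the probabilities of all outcomes that are no more likely than the
    observed count k. Intended for modest read counts; very large n would
    need a numerically safer implementation.
    """
    probs = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = probs[k]
    # Small relative tolerance so ties are counted despite float rounding.
    total = sum(q for q in probs if q <= observed * (1 + 1e-9))
    return min(1.0, total)


# Hypothetical counts: 30 reads from one allele, 10 from the other.
ratio = allelic_ratio(30, 10)
p_value = binomial_two_sided_p(30, 40)
```

A ratio far from 0.5 with a small p-value would point to allele-specific expression, but only if reads from both alleles map equally well, which is the problem the mapping strategy below addresses.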

Researchers from the University of Cambridge now demonstrate (in Drosophila) that biases during read mapping can be avoided by mapping reads to two alternative genomes that incorporate SNPs called from the RNA-seq data. The SNPs can be reliably called from the RNA-seq data itself, provided any variants not found in high-quality SNP databases are filtered out. Finally, they suggest a way of measuring allele-specific expression by crossing the line of interest to a reference line with a high-quality genome sequence. Combined with their bioinformatic methods, this approach minimizes mapping biases, allows poor-quality data to be identified and removed, and aids in the biological interpretation of the data, as the parent of origin of each allele is known. In conclusion, these results suggest that accurate estimates of allele-specific expression do not require the parental genomes of the individual being studied to be sequenced.
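The two bioinformatic steps described above, filtering called SNPs against a trusted database and substituting the retained SNPs into the reference to obtain an alternative genome, can be sketched as follows. This is a minimal illustration and not the authors' scripts: the function names, the tuple layout `(chrom, pos, ref, alt)`, and the toy sequence are all assumptions made for the example.

```python
from typing import Dict, List, Set, Tuple

Snp = Tuple[str, int, str, str]  # (chromosome, 1-based position, ref base, alt base)


def filter_known_snps(called: List[Snp], known: Set[Tuple[str, int, str]]) -> List[Snp]:
    """Keep only called SNPs whose (chrom, pos, alt) appears in the trusted set."""
    return [s for s in called if (s[0], s[1], s[3]) in known]


def build_alternative_genome(reference: Dict[str, str], snps: List[Snp]) -> Dict[str, str]:
    """Return a copy of the reference with each SNP's alt base substituted in."""
    seqs = {chrom: list(seq) for chrom, seq in reference.items()}
    for chrom, pos, ref, alt in snps:
        current = seqs[chrom][pos - 1]
        # Guard against coordinate or strand mix-ups before substituting.
        if current.upper() != ref.upper():
            raise ValueError(f"{chrom}:{pos} expected {ref}, found {current}")
        seqs[chrom][pos - 1] = alt
    return {chrom: "".join(bases) for chrom, bases in seqs.items()}


# Toy data (hypothetical): one chromosome, two called SNPs, one of them known.
reference = {"chr2L": "ACGTACGT"}
called = [("chr2L", 2, "C", "T"), ("chr2L", 5, "A", "G")]
known = {("chr2L", 2, "T")}

kept = filter_known_snps(called, known)
alternative = build_alternative_genome(reference, kept)
```

Reads would then be mapped to both `reference` and `alternative` with an aligner of choice, so that a read carrying either allele has a perfectly matching target.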

Availability: Scripts used to perform this analysis are available at


Quinn A, Juneja P, Jiggins FM. (2014) Estimates of allele-specific expression in Drosophila with a single genome sequence and RNA-seq data. Bioinformatics [Epub ahead of print].