Estimating Gene Expression Levels from RNA-Seq Data

Researchers compare two strategies for estimating gene expression levels from RNA-seq data.

Estimating transcript abundance, or gene expression level, is an important problem in research on transcriptional regulation and gene function.

Two strategies are commonly used, but they produce different results:

1. UI-based – calculating Reads Per Kilobase per Million reads (RPKM) on the union-intersection genes

2. Isoform-based – summing up inferred isoform abundances
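To make the difference between the two strategies concrete, here is a minimal Python sketch on a toy gene. Everything in it is an assumption made for illustration: the two-exon gene, the read counts, the function names, and the trick of reading isoform abundances directly off exon read densities, which stands in for the statistical isoform inference used in the paper. The single pooled region is likewise a simplification of a gene-level RPKM and is not the exact union-intersection gene definition.

```python
def rpkm(read_count, region_length_bp, total_mapped_reads):
    """Reads Per Kilobase per Million mapped reads for a single region."""
    return read_count * 1e9 / (region_length_bp * total_mapped_reads)


def gene_level_from_isoforms(isoform_abundances):
    """Isoform-based strategy: the gene's level is the sum of its isoform abundances."""
    return sum(isoform_abundances.values())


# Hypothetical toy gene (all numbers invented for illustration):
#   isoform A uses exon1 + exon2, isoform B uses exon1 only.
total_mapped_reads = 1_000_000
exon_length_bp = {"exon1": 1000, "exon2": 1000}
exon_reads = {"exon1": 40, "exon2": 10}

# Gene-level RPKM from one pooled region covering both exons
# (a simplification, not the exact union-intersection gene model).
pooled_rpkm = rpkm(
    sum(exon_reads.values()),
    sum(exon_length_bp.values()),
    total_mapped_reads,
)

# Toy isoform "inference": exon2 belongs only to isoform A, so its read
# density estimates A; exon1 is shared, so its density estimates A + B.
density = {
    exon: rpkm(exon_reads[exon], exon_length_bp[exon], total_mapped_reads)
    for exon in exon_reads
}
isoform_abundances = {
    "A": density["exon2"],
    "B": density["exon1"] - density["exon2"],
}
isoform_sum = gene_level_from_isoforms(isoform_abundances)

print(f"pooled-region RPKM:     {pooled_rpkm:.1f}")
print(f"sum of isoform levels:  {isoform_sum:.1f}")
```

In this toy case the pooled region dilutes the signal, because exon2 is absent from isoform B, so the pooled value comes out lower than the sum of the two isoform levels. Real data needs a proper statistical model to infer isoform abundances from reads that map ambiguously.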

Their results showed that the isoform-based strategy not only gives more accurate estimates but also carries less uncertainty than the UI-based strategy. Taking the non-uniformity of read distribution into account further reduces the estimation error of the isoform-based method. When they applied both strategies to real RNA-seq datasets of technical replicates, the isoform-based strategy again performed better. For a more accurate estimate of gene expression levels from RNA-seq data, it is therefore better to first infer the isoform abundances and then sum them to get the expression level of the gene as a whole, even when the isoform abundances themselves are not of interest.

Wang X, Wu Z, Zhang X. (2010) Isoform Abundance Inference Provides a More Accurate Estimation of Gene Expression Levels in RNA-Seq. J Bioinform Comput Biol 8(supp01), 177-92. [abstract]