contamDE – differential expression analysis of RNA-Seq data for contaminated tumor samples

Accurate detection of differentially expressed (DE) genes between tumor and normal samples is a primary approach to identifying cancer-related biomarkers. Because normal cells surrounding a tumor infiltrate it, expression data derived from tumor samples are inevitably contaminated with normal cells. Ignoring this cellular contamination reduces the power to detect DE genes and confounds the biological interpretation of the results. At present, the literature offers no differential expression analysis approach for RNA-seq data that properly accounts for the contamination of tumor samples.
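To see why contamination costs power, consider a toy calculation. The sketch below is not the contamDE model; it assumes, purely for illustration, that expression in a contaminated sample is a linear mix of tumor-cell and normal-cell expression, and the function name and parameters are hypothetical. Under that assumption, the tumor-vs-normal fold change that is actually observed shrinks toward 1 as the proportion of tumor cells falls.

```python
def observed_fold_change(true_fold_change: float, tumor_proportion: float) -> float:
    """Tumor-vs-normal fold change seen in a contaminated tumor sample.

    Illustrative assumption (not taken from the contamDE paper):
    expression mixes linearly, observed = p * tumor + (1 - p) * normal,
    with normal-cell expression scaled to 1.
    """
    if not 0.0 <= tumor_proportion <= 1.0:
        raise ValueError("tumor_proportion must be in [0, 1]")
    return tumor_proportion * true_fold_change + (1.0 - tumor_proportion)


# A gene with a true 4-fold change, at decreasing tumor proportions.
for p in (1.0, 0.7, 0.4):
    fc = observed_fold_change(4.0, p)
    print(f"tumor proportion {p:.1f}: observed fold change {fc:.2f}")
```

With half of the sample made up of normal cells, a true 4-fold change would appear as only a 2.5-fold change under this simplified mixing assumption, which makes a real difference harder to distinguish from noise.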

Without requiring any additional information, researchers from Fudan University have developed a new method, ‘contamDE’, based on a novel statistical model that associates RNA-seq expression levels with cell types. Simulation studies show that contamDE can be much more powerful than existing methods that ignore the contamination. In applications to two cancer studies, contamDE uniquely identified several potential therapeutic and prognostic biomarkers of prostate cancer and non-small cell lung cancer.


Estimated proportions for contaminated tumor samples in real data applications. (A) an NSCLC study with both DNA-seq and RNA-seq data; (B) a study of lung adenocarcinoma cell lines (pure samples); (C) a study of lung adenocarcinoma cell lines (experimental mixture samples); (D) a Drosophila melanogaster study.

Availability – An R package contamDE is freely available at

Shen Q, Hu J, Jiang N, Hu X, Luo Z, Zhang H. (2015) contamDE: Differential expression analysis of RNA-seq data for contaminated tumor samples. Bioinformatics [Epub ahead of print]. [abstract]
