TEtranscripts – A package for including transposable elements in differential expression analysis of RNA-Seq datasets

Most RNA-seq data analysis software packages are not designed to handle the complexities involved in properly apportioning short sequencing reads to highly repetitive regions of the genome. These regions are often occupied by transposable elements (TEs), which make up 20-80% of eukaryotic genomes. TEs can contribute a substantial portion of transcriptomic and genomic sequence reads, but most analyses ignore them.

Researchers from Cold Spring Harbor Laboratory have developed a method and software package for including both gene- and TE-associated ambiguously mapped reads in differential expression analysis. This method shows improved recovery of TE transcripts over other published expression analysis methods, in both synthetic data and qPCR/NanoString-validated published datasets.

TEtranscripts flow chart. Reads mapping to TEs are assigned in two different modes: uniq (reads mapping uniquely in the genome) and multi (reads mapping to multiple insertions of TEs). In the multi mode, an iterative algorithm is used to optimally distribute ambiguously mapped reads.
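To make the difference between the two modes concrete, here is a minimal Python sketch. It is not the TEtranscripts implementation: it assumes a generic expectation-maximization style redistribution for the multi mode, and the function names (`count_uniq`, `count_multi`), the input layout (one list of candidate TE names per read), and the fixed iteration count are all hypothetical choices made for illustration.

```python
from collections import defaultdict


def count_uniq(alignments):
    """uniq-style counting (illustrative): keep only reads with one candidate TE."""
    counts = defaultdict(float)
    for candidates in alignments:
        if len(candidates) == 1:
            counts[candidates[0]] += 1.0
    return dict(counts)


def count_multi(alignments, n_iter=50):
    """multi-style counting (illustrative): iteratively split each ambiguous
    read among its candidate TEs in proportion to current abundance estimates.

    This is a generic EM-style scheme assumed for the example, not the
    algorithm shipped in the package.
    """
    tes = sorted({te for candidates in alignments for te in candidates})
    # Start from equal abundances, so the first pass splits reads evenly.
    abundance = {te: 1.0 for te in tes}
    for _ in range(n_iter):
        updated = {te: 0.0 for te in tes}
        for candidates in alignments:
            total = sum(abundance[te] for te in candidates)
            for te in candidates:
                # Each read contributes one count in total, shared by weight.
                updated[te] += abundance[te] / total
        abundance = updated
    return abundance


# Hypothetical toy input: each inner list holds the TEs one read maps to.
reads = [["L1"], ["L1"], ["L1", "Alu"], ["Alu"]]
uniq_counts = count_uniq(reads)
multi_counts = count_multi(reads)
print(uniq_counts)
print(multi_counts)
```

In this toy input the uniq-style count drops the ambiguous read entirely, while the multi-style count keeps it and shifts most of it toward the TE with more supporting reads. The total across TEs equals the number of reads in the multi-style result.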

Availability – The source code, associated GTF files for TE annotation, and testing data are freely available at http://hammelllab.labsites.cshl.edu/software

Contact – mhammell@cshl.edu

Jin Y, Tam OH, Paniagua E, Hammell M. (2015) TEtranscripts: A package for including transposable elements in differential expression analysis of RNA-seq datasets. Bioinformatics [Epub ahead of print]. [abstract]
