Transposable elements (TEs) are DNA sequences that can move from one location in the genome to another, and they make up a large proportion (45%) of the human genome. TEs play functional roles in a variety of biological phenomena, such as cancer, neurodegenerative disease, and aging. Rapid development of RNA-sequencing technology has enabled researchers, for the first time, to study TE activity at the systems level. However, efficient tools for TE analysis have not yet been developed.
In this work, researchers at Baylor College of Medicine developed SalmonTE, a fast and reliable pipeline for quantifying TEs from RNA-seq data. They benchmarked SalmonTE against TEtranscripts, a widely used TE quantification method, and three other quantification methods, using several RNA-seq datasets from Drosophila melanogaster and human cell lines. SalmonTE ran 20 times faster without compromising accuracy. This pipeline will enable the biomedical research community to quantify and analyze TEs from large amounts of data, leading to novel TE-centric discoveries.
An illustration of the SalmonTE pipeline
Left panel: inputs, namely Repbase sequences for building the mapping index, raw FASTQ files, and covariates for statistical testing. Middle panel: the SalmonTE workflow has three parts, which are building the index from Repbase or user-supplied cDNA sequences of TEs, quantification from the FASTQ files, and statistical testing through a generalized linear model or differential expression analysis. Right panel: example output, including the statistical report and a box plot of estimated log2 fold changes.
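To make the "estimated log2 fold-change" in the output concrete, here is a minimal sketch of how a per-TE fold change can be computed from a table of quantified read counts. This is a simplified illustration, not SalmonTE's own statistical test (which, per the caption, uses a generalized linear model or differential expression analysis). The function name `log2_fold_changes`, the nested-dictionary count layout, the counts-per-million normalization, and the pseudocount of 1 are all assumptions chosen for the example.

```python
import math
from statistics import mean


def log2_fold_changes(counts, control, treated, pseudocount=1.0):
    """Hypothetical helper: per-TE log2(mean treated CPM / mean control CPM).

    counts: dict mapping sample name -> dict mapping TE name -> read count.
    Assumes every sample reports the same set of TEs.
    """
    # Normalize each sample to counts per million (CPM) so that
    # libraries sequenced to different depths are comparable.
    cpm = {}
    for sample, te_counts in counts.items():
        total = sum(te_counts.values())
        cpm[sample] = {te: 1e6 * c / total for te, c in te_counts.items()}

    te_names = next(iter(counts.values())).keys()
    result = {}
    for te in te_names:
        ctrl = mean(cpm[s][te] for s in control)
        trt = mean(cpm[s][te] for s in treated)
        # The pseudocount avoids division by zero for TEs absent in one group.
        result[te] = math.log2((trt + pseudocount) / (ctrl + pseudocount))
    return result


# Toy counts for two TE families in two control and two treated samples.
counts = {
    "ctrl1": {"L1": 100, "Alu": 900},
    "ctrl2": {"L1": 100, "Alu": 900},
    "trt1": {"L1": 300, "Alu": 700},
    "trt2": {"L1": 300, "Alu": 700},
}
lfc = log2_fold_changes(counts, ["ctrl1", "ctrl2"], ["trt1", "trt2"])
```

Here L1 triples in relative abundance, so its log2 fold change is close to log2(3), while Alu decreases and gets a negative value. A real analysis would also estimate variance across replicates and report significance, which is what the GLM or differential expression step in the pipeline is for.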
Availability: the entire source code and executable scripts are available at https://github.com/hyunhwaj/SalmonTE