TransPi – a comprehensive TRanscriptome ANalysiS PIpeline for de novo transcriptome assembly

The use of RNA-Seq data and the generation of de novo transcriptome assemblies have been pivotal for studies in ecology and evolution. This is especially true for non-model organisms, where no genome information is available; even so, studies of differential gene expression, DNA enrichment bait design, and phylogenetics can all be accomplished with data gathered at the transcriptomic level. Multiple tools are available for transcriptome assembly; however, no single tool provides the best assembly for all datasets. Therefore, a multi-assembler approach, followed by a reduction step, is often used to generate an improved representation of the transcriptome. To reduce errors in these complex analyses while achieving reproducibility and scalability, automated workflows have become essential in the analysis of RNA-Seq data. However, most of these workflows are designed for species where genome data is used as a reference for the assembly process, limiting their use in non-model organisms.
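To make the "multi-assembler approach followed by a reduction step" idea concrete, here is a minimal Python sketch. It is an illustration only, not TransPi's actual reduction method: it assumes a deliberately simple redundancy rule in which transcripts from all assemblers are pooled and a transcript is dropped if it, or its reverse complement, is an exact substring of a longer transcript already kept. The function names and the toy sequences are hypothetical.

```python
# Toy sketch of pooling multiple assemblies and removing redundant transcripts.
# Assumption: "redundant" means exactly contained (either strand) in a longer
# kept transcript. Real reduction tools use far more sophisticated criteria.

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def reduce_assemblies(assemblies: dict[str, list[str]]) -> list[str]:
    """Pool transcripts from several assemblers and drop contained duplicates."""
    pooled = [seq.upper() for seqs in assemblies.values() for seq in seqs]
    # Longest first, so each transcript is only checked against longer kept ones.
    pooled.sort(key=len, reverse=True)
    kept: list[str] = []
    for seq in pooled:
        rc = revcomp(seq)
        # Skip the transcript if it is contained in a kept one on either strand.
        if any(seq in k or rc in k for k in kept):
            continue
        kept.append(seq)
    return kept


if __name__ == "__main__":
    toy = {
        "assembler_A": ["ATGAAACCCGGG", "TTTT"],
        "assembler_B": ["AAACCC", "CCCGGGTTTCAT"],
    }
    print(reduce_assemblies(toy))  # ['ATGAAACCCGGG', 'TTTT']
```

In this toy input, "CCCGGGTTTCAT" is the reverse complement of "ATGAAACCCGGG" and "AAACCC" is contained in it, so both are discarded; the union of assemblies keeps its diversity while redundant copies are collapsed.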

Researchers from Ludwig Maximilian University of Munich have developed TransPi, a comprehensive pipeline for de novo transcriptome assembly that requires minimal user input without sacrificing the ability to perform a thorough analysis. The tool was assessed using a combination of different model organisms, k-mer sets, read lengths, and read quantities. In addition, a total of 49 non-model organisms spanning different phyla were analyzed. Compared to approaches using a single assembler only, TransPi produces higher BUSCO completeness percentages together with a significant reduction in duplication rates. TransPi is easy to configure and can be deployed seamlessly using Conda, Docker, and Singularity.
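The comparison above rests on two BUSCO metrics: completeness and duplication. BUSCO counts each conserved ortholog as complete single-copy (S), complete duplicated (D), fragmented (F), or missing (M), with complete C = S + D. A good reduction step should keep C high while lowering D. The short Python sketch below turns raw counts into BUSCO's familiar one-line summary. The helper name `busco_summary` is hypothetical, and the example counts are made-up illustrative numbers, not results from the TransPi study.

```python
# Sketch: format BUSCO category counts as the usual one-line summary string.
# The counts passed in below are hypothetical, chosen only to show the arithmetic.


def busco_summary(single: int, duplicated: int, fragmented: int, missing: int) -> str:
    """Return 'C:..%[S:..%,D:..%],F:..%,M:..%,n:..' from BUSCO category counts."""
    n = single + duplicated + fragmented + missing
    if n == 0:
        raise ValueError("at least one BUSCO must be searched")

    def pct(count: int) -> float:
        return 100.0 * count / n

    complete = single + duplicated  # complete = single-copy + duplicated
    return (
        f"C:{pct(complete):.1f}%[S:{pct(single):.1f}%,D:{pct(duplicated):.1f}%],"
        f"F:{pct(fragmented):.1f}%,M:{pct(missing):.1f}%,n:{n}"
    )


if __name__ == "__main__":
    print(busco_summary(900, 50, 20, 30))
    # C:95.0%[S:90.0%,D:5.0%],F:2.0%,M:3.0%,n:1000
```

Because D is counted inside C, two assemblies can share the same completeness while differing sharply in duplication, which is why both numbers are reported together.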

TransPi v1.0.0 flowchart showing the various steps and analyses it can perform


For simplicity, this diagram does not show all the connections between the processes. It also omits additional options, such as the BUSCO distribution and transcriptome filtering with psytrans (see Section 2.6). ORFs = Open Reading Frames; HTML = HyperText Markup Language.

Rivera-Vicens RE, Escudero CG, Conci N, Eitel M, Wörheide G. (2021) TransPi – a comprehensive TRanscriptome ANalysiS PIpeline for de novo transcriptome assembly. bioRxiv [online preprint].

