A practical implementation of large transcriptomic data analysis to resolve cryptic species diversity problems in microbial eukaryotes

Transcriptome sequencing has become a method of choice for evolutionary studies in microbial eukaryotes because of its low cost and minimal sample requirements. Transcriptome data have been used extensively in phylogenomic studies to infer ancient evolutionary histories, but their utility for studying cryptic species diversity is not well explored. An empirical investigation was conducted to test whether transcriptome data can resolve two major types of discordance at lower taxonomic levels: species that share the same morphology but differ genetically (cryptic species), and species that differ in morphology but are genetically the same. Spelman College researchers have built a species-comparison bioinformatic pipeline that takes into account the nature of transcriptome data, and applied it to amoeboid microbes that exemplify such discordances.

Their analyses of known or suspected cryptic species yielded consistent results regardless of the methods of culturing, RNA collection or sequencing. Over 95% of the single-copy genes analyzed, both in samples of the same species sequenced using different methods and in cryptic species, had intra- and interspecific divergences below 2%. Only a minority of groups (2.91-4.87%) had distances exceeding 2% in these taxa, which was likely caused by low data quality. This pattern was also observed in suspected genetically similar species with different morphologies. Transcriptome data consistently delineated all taxa above species level, including cryptically diverse species. Using this approach, the researchers were able to resolve cryptic species problems, uncover misidentifications and discover new species. They also identified several potential barcode markers with varying evolutionary rates that can be used in lineages with different evolutionary histories.
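
To make the 2% divergence criterion concrete, the following is a minimal sketch of how per-gene divergence between two samples could be tallied. It assumes an uncorrected pairwise distance (p-distance) on pre-aligned sequences; the summary above does not state which distance measure the published pipeline uses, and the function names `p_distance` and `summarize_divergence` and the toy sequences are hypothetical illustrations, not part of the authors' code.

```python
from typing import Dict, Tuple

# Characters treated as missing data (assumption: gaps and ambiguous bases
# are skipped rather than counted as differences).
MISSING = set("-?N")


def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of differing sites between two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    mismatches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in MISSING or b in MISSING:
            continue  # skip columns with missing data in either sequence
        compared += 1
        if a != b:
            mismatches += 1
    if compared == 0:
        raise ValueError("no comparable sites in alignment")
    return mismatches / compared


def summarize_divergence(
    alignments: Dict[str, Tuple[str, str]], threshold: float = 0.02
) -> Tuple[Dict[str, float], float]:
    """Return per-gene distances and the fraction of genes below threshold."""
    if not alignments:
        raise ValueError("no alignments supplied")
    distances = {gene: p_distance(a, b) for gene, (a, b) in alignments.items()}
    below = sum(1 for d in distances.values() if d < threshold)
    return distances, below / len(distances)


if __name__ == "__main__":
    # Toy alignments for two hypothetical single-copy genes.
    toy = {
        "geneA": ("ATGCATGCAT", "ATGCATGCAT"),
        "geneB": ("ATGCATGCAT", "ATGCATGCAA"),
    }
    per_gene, fraction_below = summarize_divergence(toy)
    print(per_gene, fraction_below)
```

Under this sketch, a pair of samples from the same or a cryptic species would be expected to show a high fraction of genes below the threshold, while genes above it would be candidates for closer inspection of data quality.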


The pipeline was used to find OSGs in the following groups: Cochliopodium (C. pentatrifurcatum ATCC 30935 vs. C. minus CCAP 1537/1A, with transcriptomes from two independent samples each, vs. C. minutoidum CCAP 1537/7), Endostelium (E. zonatum PRA-191 Tekle and Wood 2017 vs. E. zonatum PRA-191 Kang et al. vs. E. zonatum LINKS), and Thecamoebida (undescribed UK-YT1 vs. Thecamoebida RHP1–1 isolates). On average, the pipeline takes about four hours to run on a pair of species on a regular desktop computer with 32 GB of memory.

Tekle YI, Wood FC. (2018) A practical implementation of large transcriptomic data analysis to resolve cryptic species diversity problems in microbial eukaryotes. BMC Evol Biol 18(1):170. [article]
