A pipeline for rapid gene discovery and expression analysis of a plant host and its obligate parasite

Current and emerging plant diseases caused by obligate parasitic microbes such as rusts, downy mildews, and powdery mildews threaten worldwide crop production and food safety. These obligate parasites typically cannot be cultured in the laboratory, which makes them technically challenging to characterize at the genetic and genomic level.

Here, researchers from UMass Amherst have developed a data analysis pipeline that integrates several bioinformatics software programs. The pipeline enables rapid gene discovery and expression analysis of a plant host and its obligate parasite simultaneously, using next-generation sequencing of mixed host and pathogen RNA (i.e., metatranscriptomics). They applied it to metatranscriptomic sequencing data from sweet basil (Ocimum basilicum) and its obligate downy mildew parasite Peronospora belbahrii, neither of which has a sequenced genome. Even with a single data point, the researchers identified both candidate host defense genes and pathogen virulence genes that are highly expressed during infection. This shows that the pipeline can identify genes important in host-pathogen interactions without prior genomic information for either the plant host or the obligate biotrophic pathogen. Its simplicity makes it accessible to researchers with limited computational skills and applicable to metatranscriptomic data analysis across a wide range of plant-obligate-parasite systems.
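
Calling a gene "highly expressed" requires a measure that is comparable across transcripts of different lengths. A common choice is transcripts per million (TPM): divide each transcript's read count by its length in kilobases, then rescale so that all values in the sample sum to one million. The minimal Python sketch below assumes per-transcript read counts and lengths are already available from a quantification step; the function names and toy data are hypothetical and are not part of the published pipeline.

```python
from typing import Dict, List, Tuple


def tpm(counts: Dict[str, float], lengths_bp: Dict[str, float]) -> Dict[str, float]:
    """Transcripts per million (TPM) from raw read counts and transcript lengths.

    Simplification: uses the full transcript length (must be > 0), whereas
    quantifiers such as RSEM use an effective length that accounts for fragment size.
    """
    # Step 1: reads per kilobase (RPK) corrects for transcript length.
    rpk = {t: counts[t] / (lengths_bp[t] / 1000.0) for t in counts}
    # Step 2: rescale so that all values in the sample sum to one million.
    scale = sum(rpk.values()) / 1e6
    if scale == 0:
        return {t: 0.0 for t in rpk}
    return {t: v / scale for t, v in rpk.items()}


def top_expressed(counts: Dict[str, float], lengths_bp: Dict[str, float],
                  n: int = 10) -> List[Tuple[str, float]]:
    """Return the n transcripts with the highest TPM, highest first."""
    values = tpm(counts, lengths_bp)
    return sorted(values.items(), key=lambda kv: kv[1], reverse=True)[:n]


# Hypothetical toy data: read counts and lengths (bp) for three transcripts.
counts = {"tA": 100, "tB": 100, "tC": 50}
lengths = {"tA": 1000, "tB": 2000, "tC": 250}
for name, value in top_expressed(counts, lengths, n=3):
    print(f"{name}\t{value:.1f}")
```

In this toy example tC ranks first even though it has the fewest reads, because it is by far the shortest transcript; ranking on raw counts would instead favor long transcripts.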

Diagram summarizing the data analysis pipeline to analyze host–pathogen metatranscriptomes and key methodological steps


After quality filtering, RNA-seq reads are assembled de novo using Trinity. For pathogen transcript discovery, a “pooled reference” is assembled from the combined control and infected plant reads, and the resulting transcripts are divided into control-unique, infected-unique, and shared groups. For plant differential gene expression (DGE) analysis, the shared transcripts serve as the reference to which control and infected reads are mapped and quantified with RSEM. A: DGE analysis of pathogen transcripts depends on the availability of a reference sample.
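
The three-way split of the pooled assembly and the host expression comparison can be illustrated with a minimal Python sketch. It assumes that read counts per transcript, from control and infected samples mapped against the pooled assembly, are already available, and it uses a hypothetical minimum-read threshold (`min_reads`) to decide whether a condition supports a transcript. This read-count rule is an illustrative assumption, not necessarily how the published pipeline defines the groups, and the fold change is a naive log2 ratio of counts per million (CPM) with a pseudocount rather than the statistical testing a dedicated DGE tool performs. All function names are hypothetical.

```python
import math
from typing import Dict, List


def partition_transcripts(control_counts: Dict[str, int],
                          infected_counts: Dict[str, int],
                          min_reads: int = 5) -> Dict[str, List[str]]:
    """Split pooled-assembly transcripts by which condition's reads support them.

    min_reads is a hypothetical support threshold chosen for illustration.
    """
    groups: Dict[str, List[str]] = {"control_unique": [], "infected_unique": [], "shared": []}
    for t in sorted(set(control_counts) | set(infected_counts)):
        in_control = control_counts.get(t, 0) >= min_reads
        in_infected = infected_counts.get(t, 0) >= min_reads
        if in_control and in_infected:
            groups["shared"].append(t)           # host transcripts usable for plant DGE
        elif in_infected:
            groups["infected_unique"].append(t)  # pathogen or infection-induced host transcripts
        elif in_control:
            groups["control_unique"].append(t)
        # Transcripts below the threshold in both conditions are left out.
    return groups


def log2_fold_changes(control_counts: Dict[str, int],
                      infected_counts: Dict[str, int],
                      transcripts: List[str],
                      pseudocount: float = 1.0) -> Dict[str, float]:
    """Infected-vs-control log2 fold change of counts per million (CPM).

    CPM is computed over the given transcripts only, so pathogen reads in the
    infected library do not dilute host expression values.
    """
    control_total = sum(control_counts.get(t, 0) for t in transcripts)
    infected_total = sum(infected_counts.get(t, 0) for t in transcripts)
    if control_total == 0 or infected_total == 0:
        raise ValueError("each condition needs at least one read on the given transcripts")
    result: Dict[str, float] = {}
    for t in transcripts:
        control_cpm = control_counts.get(t, 0) / control_total * 1e6
        infected_cpm = infected_counts.get(t, 0) / infected_total * 1e6
        result[t] = math.log2((infected_cpm + pseudocount) / (control_cpm + pseudocount))
    return result


# Hypothetical toy counts: h1/h2 host genes, p1 pathogen gene, c1 control-only gene.
control = {"h1": 100, "h2": 50, "c1": 20}
infected = {"h1": 100, "h2": 200, "p1": 300, "c1": 1}
groups = partition_transcripts(control, infected)
print(groups)
print(log2_fold_changes(control, infected, groups["shared"]))
```

Normalizing only over the shared transcripts is the key design choice here: the infected library also contains pathogen reads, so dividing by its full read total would make host genes look artificially down-regulated. Without replicate samples, a ratio like this can rank candidate genes but cannot assess statistical significance.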

Availability – A complete protocol for performing metatranscriptomic data analysis using the pipeline is available here.

Guo L, Allen KS, Deiulio G, Zhang Y, Madeiras AM, Wick RL, Ma LJ. (2016) A De Novo-Assembly Based Data Analysis Pipeline for Plant Obligate Parasite Metatranscriptomic Studies. Front Plant Sci 7:925.
