Host-pathogen interactions are exceedingly complex because they involve multiple host tissues, often occur in the context of normal microflora, and can span diverse microenvironments. Although decades of gene expression studies have provided detailed insights into infection processes, technical challenges have restricted experiments to single pathogenic species or host tissues. RNA-sequencing (RNA-seq) has revolutionized the study of gene expression because, in addition to quantifying transcriptional output, it allows detection and characterization of all transcripts in a genome.
Here, researchers from Trinity College Dublin describe how refined approaches to RNA-seq are used to map the transcriptional networks that control host-pathogen interactions. These enhanced techniques include dRNA-seq and term-seq for the fine-scale mapping of transcriptional start and termination sites, and dual RNA-seq for simultaneous sequencing of host and bacterial pathogen transcriptomes. Dual RNA-seq experiments are currently limited to in vitro infection systems that do not fully reflect the complexities of the in vivo environment. The challenge is therefore to develop in vivo model systems and experimental approaches that address the biological heterogeneity of host environments, and then to integrate RNA-seq with other genome-scale datasets to identify the transcriptional networks that mediate host-pathogen interactions.
Sequencing technologies to uncover transcriptional architecture
(a) Precise mapping of transcription start sites (TSS) is achieved using dRNA-seq, TSS-EMOTE or Cappable-seq. Primary transcripts are enriched from total bacterial RNA either through depletion of processed RNA using TEX or XRN1 exonucleases (dRNA-seq and TSS-EMOTE, respectively) or by using a desthiobiotin cap and streptavidin beads to directly target and capture 5′ tri-phosphorylated transcript ends (Cappable-seq). TSS-EMOTE involves ligation of an RNA oligonucleotide (blue) to 5′ ends of transcripts after removal of a 5′-pyrophosphate via RppH. 5′ tri-phosphate enrichment is followed by cDNA library synthesis and RNA-seq. Sequenced reads are then mapped to reference genomes, and TSS are identified through comparison of enriched peaks with non-treated control samples (dRNA-seq, Cappable-seq) or through the identification of oligo-adjacent sequences (TSS-EMOTE). (b) Term-seq enables the mapping of 3′-ends of bacterial transcripts through the ligation of 3′-end adaptor molecules of known sequence to RNA 3′-ends. Subsequent cDNA synthesis and RNA-seq lead to the identification of transcript ends through the discrimination of bacterial transcript sequences from the linker sequences. (c) Dual RNA-seq enables the simultaneous sequencing of RNA from infected host cells and the intracellular pathogen, without prior physical separation. Total bacterial and eukaryotic RNA is isolated simultaneously and subjected to combined cDNA library preparation and RNA-seq. Separation of bacterial and eukaryotic sequencing reads occurs in silico and enables downstream gene expression analysis.
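The comparison step described in panel (a), in which 5′-end peaks from the enriched library are checked against a non-treated control, can be illustrated with a minimal Python sketch. This is not the method used by any published dRNA-seq or Cappable-seq pipeline. The function name `call_tss`, the input format (one `(strand, position)` tuple per mapped read 5′ end) and the thresholds `min_reads`, `min_ratio` and `pseudocount` are all assumptions chosen for illustration.

```python
from collections import Counter


def call_tss(enriched_starts, control_starts,
             min_reads=5, min_ratio=2.0, pseudocount=1.0):
    """Return candidate TSS as sorted (site, read_count, ratio) tuples.

    Hypothetical inputs: each argument is an iterable of (strand, position)
    tuples, one per mapped read, giving the coordinate of the read's 5' end.
    All thresholds are illustrative defaults, not published values.
    """
    enriched = Counter(enriched_starts)
    control = Counter(control_starts)

    # Scale control counts so both libraries have the same total depth.
    total_control = sum(control.values())
    scale = sum(enriched.values()) / total_control if total_control else 1.0

    calls = []
    for site, n in enriched.items():
        if n < min_reads:
            continue  # too few reads to count as a peak
        # The pseudocount avoids division by zero at sites absent from the control.
        ratio = (n + pseudocount) / (control[site] * scale + pseudocount)
        if ratio >= min_ratio:
            calls.append((site, n, round(ratio, 2)))
    return sorted(calls)


if __name__ == "__main__":
    # Toy data: position 100 is enriched after treatment, position 250 is not.
    enriched = [("+", 100)] * 20 + [("+", 250)] * 10 + [("-", 400)] * 3
    control = ([("+", 100)] * 2 + [("+", 250)] * 12
               + [("-", 400)] * 1 + [("+", 300)] * 18)
    print(call_tss(enriched, control))
```

In this toy example only position 100 on the plus strand passes both filters. Position 250 has more reads in the control than in the enriched library, which is the pattern expected for a processed rather than a primary 5′ end. Real analyses would also have to handle replicates, neighbouring positions and statistical testing, which this sketch leaves out.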