To comprehensively study extracellular small RNAs (sRNAs) by sequencing (sRNA-seq), researchers at Vanderbilt University Medical Center developed a novel pipeline, "Tools for Integrative Genome analysis of Extracellular sRNAs (TIGER)", to overcome current limitations in sRNA-seq analysis. To demonstrate the power of this tool, sRNA-seq was performed on mouse lipoproteins, bile, urine and livers. A key advance of the TIGER pipeline is its ability to analyse both host and non-host sRNAs at the genomic, parent RNA and individual fragment levels. TIGER identified approximately 60% of sRNAs on lipoproteins and >85% of sRNAs in liver, bile and urine, a significant advance over existing software. Moreover, TIGER facilitated comparison of lipoprotein sRNA signatures with those of disparate sample types at each level using hierarchical clustering, correlations, beta-dispersions, principal coordinate analysis and permutational multivariate analysis of variance (PERMANOVA). TIGER analysis was also used to quantify distinct features of extracellular RNAs (exRNAs), including 5′ miRNA variants, 3′ miRNA non-templated additions and parent RNA positional coverage. Results suggest that the majority of sRNAs on lipoproteins are non-host sRNAs derived from bacterial sources in the microbiome and environment, specifically rRNA-derived sRNAs from Proteobacteria. Collectively, TIGER facilitated novel discoveries of lipoprotein and biofluid sRNAs and has broad applicability for the field of extracellular RNA.
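The abstract mentions quantifying 5′ miRNA variants and 3′ miRNA non-templated additions. As a hedged illustration of what those two features mean, the sketch below classifies a single read against a precursor sequence: the 5′ shift is the read's start position minus the annotated mature start, and any 3′ bases that cannot be placed on the precursor are reported as a non-templated addition. This is not TIGER's implementation; the function name `classify_isomir`, the `max_tail` and `min_core` parameters, and the toy sequences are hypothetical choices made for illustration.

```python
from typing import Optional, TypedDict


class IsomiR(TypedDict):
    five_prime_shift: int  # read start minus annotated mature start (0 = canonical 5' end)
    templated_length: int  # number of read bases that match the precursor
    nta: str               # 3' non-templated addition ("" if none)


def classify_isomir(read: str, precursor: str, mature_start: int,
                    max_tail: int = 3, min_core: int = 15) -> Optional[IsomiR]:
    """Hypothetical isomiR classifier (illustrative sketch, not TIGER's method).

    Trims 0..max_tail bases from the 3' end until the remaining "core" is found
    exactly in the precursor. Trimmed bases are reported as a non-templated
    addition. Tail bases that happen to match the precursor are counted as
    templated, which is an inherent ambiguity of this simple approach.
    """
    # Treat RNA and DNA alphabets the same way.
    read = read.upper().replace("U", "T")
    precursor = precursor.upper().replace("U", "T")
    for tail_len in range(max_tail + 1):
        core_len = len(read) - tail_len
        if core_len < min_core:
            break  # too little sequence left to place the read reliably
        core = read[:core_len]
        pos = precursor.find(core)  # first exact occurrence, or -1
        if pos != -1:
            return IsomiR(five_prime_shift=pos - mature_start,
                          templated_length=core_len,
                          nta=read[core_len:])
    return None  # read could not be placed on this precursor


if __name__ == "__main__":
    # Toy sequences for illustration only.
    mature = "TGAGGTAGTAGGTTGTATAGTT"
    precursor = "GGAA" + mature + "CCGGTT"  # mature starts at index 4
    print(classify_isomir(mature + "AA", precursor, mature_start=4))
    # {'five_prime_shift': 0, 'templated_length': 22, 'nta': 'AA'}
```

Trimming from the 3′ end only reflects the assumption that non-templated additions occur at the 3′ terminus; a production tool would also handle reads that map to multiple loci and tolerate sequencing errors.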
Schematic of the TIGER sRNA-seq analysis workflow
Total reads from the sRNA-seq platform are filtered through pre-processing steps (green) to yield total quality reads. Filtered reads are then subjected to a class-independent analysis (red), which compares the most abundant reads of each sample/group regardless of mapping identity. Independently, filtered reads are aligned to the host genome (e.g. mouse; light blue) and categorized by sRNA type for analysis. Quality reads longer than 19 nt that fail to align to the host genome are then separately aligned to either bacterial and fungal genome databases (purple) or exogenous rRNA, RNA and miRNA databases (gold). Results of the host and non-host segments of the pipeline are summarized and plotted (navy). Lastly, reads that fail to map in both the host and non-host segments are sorted by abundance for comparison and submitted to BLASTn to identify putative origins (orange).
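The read-routing logic described in the caption can be sketched as follows. This is a minimal illustration under stated assumptions, not TIGER's code: the aligners are hypothetical callables that return a label (an sRNA class or taxon) or `None`, pre-processing is assumed to have already produced the quality reads, the two non-host branches are assumed to run independently on the same host-unmapped reads, and host-unmapped reads of 19 nt or shorter are simply counted and not routed further. The names `route_reads`, `RoutingSummary`, `min_non_host_len` and `top_n` are illustrative.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List, Optional, Tuple

# Hypothetical aligner interface: returns a label for a read sequence,
# or None if the read does not align.
Aligner = Callable[[str], Optional[str]]


@dataclass
class RoutingSummary:
    top_reads: List[Tuple[str, int]]                               # class-independent analysis
    host: Counter = field(default_factory=Counter)                 # host reads by sRNA type
    non_host: Dict[str, Counter] = field(default_factory=dict)     # counts per non-host branch
    unmapped: List[Tuple[str, int]] = field(default_factory=list)  # BLASTn candidates, by abundance
    short_unmapped: int = 0  # host-unmapped reads too short for the non-host step


def route_reads(quality_reads: Iterable[str], host_align: Aligner,
                non_host_branches: Dict[str, Aligner],
                min_non_host_len: int = 20, top_n: int = 10) -> RoutingSummary:
    counts = Counter(quality_reads)  # collapse identical sequences
    summary = RoutingSummary(top_reads=counts.most_common(top_n),
                             non_host={name: Counter() for name in non_host_branches})
    leftover: Counter = Counter()
    for seq, n in counts.items():
        label = host_align(seq)
        if label is not None:
            summary.host[label] += n
            continue
        if len(seq) < min_non_host_len:  # only reads >19 nt enter the non-host step
            summary.short_unmapped += n
            continue
        mapped_any = False
        for name, align in non_host_branches.items():  # branches are run independently
            hit = align(seq)
            if hit is not None:
                summary.non_host[name][hit] += n
                mapped_any = True
        if not mapped_any:
            leftover[seq] += n
    summary.unmapped = leftover.most_common()  # sorted by abundance for BLASTn
    return summary
```

For a quick check, plain dictionaries can stand in for aligners, since `dict.get` returns `None` for unknown sequences: for example, `route_reads(reads, {"TGAGGTAGTAGGTTGTATAGTT": "miRNA"}.get, {"genome": bacterial_hits.get, "library": {}.get})`, where `bacterial_hits` is a hypothetical sequence-to-label lookup.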