MicroRNAs (miRNAs) are noncoding short RNAs (sRNAs), 18-22 nt long, that suppress gene expression by targeting the 3′ untranslated region (3′ UTR) of target mRNAs. Targeting is mediated by the seed sequence at positions 2-7/8 of the miRNA guide strand once the strand is loaded into the RNA-induced silencing complex (RISC). G-rich 6mer seed sequences can kill cells by targeting C-rich 6mer seed matches located in genes that are critical for cell survival. This induces Death Induced by Survival gene Elimination (DISE) through a mechanism we have called 6mer seed toxicity. miRNAs are often quantified in cells by aligning the reads from small RNA sequencing (smRNA Seq) to the genome. However, analyzing any smRNA Seq data set for predicted 6mer seed toxicity requires an alternative workflow based solely on the exact sequence at positions 2-7 of any sRNA that can enter the RISC.
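To make the "positions 2-7" rule concrete, here is a minimal Python sketch of how a 6mer seed could be pulled out of a single read. It is an illustration only and not code from SPOROS: the function name `extract_6mer_seed`, the minimum-length check, and the example read are assumptions made for this sketch.

```python
def extract_6mer_seed(read: str) -> str:
    """Return the 6mer seed of a read: positions 2-7, counted from 1 at the 5' end.

    Hypothetical helper for illustration; not part of SPOROS.
    """
    read = read.strip().upper()
    # Assumption: a read must be at least 7 nt long to contain positions 2-7.
    if len(read) < 7:
        raise ValueError("read is too short to contain a 6mer seed")
    # Python slices are 0-based and end-exclusive, so [1:7] covers positions 2-7.
    return read[1:7]


# Illustrative 22-nt read, written 5' to 3'.
example_read = "UGAGGUAGUAGGUUGUAUAGUU"
example_seed = extract_6mer_seed(example_read)
print(example_seed)
```

Because only positions 2-7 are read, two sRNAs that differ everywhere else but share this 6mer are treated as identical for the purpose of seed toxicity prediction.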
Researchers at Northwestern University’s Feinberg School of Medicine have developed SPOROS, a semi-automated pipeline that produces multiple outputs for predicting and comparing the 6mer seed toxicity of cellular sRNAs, regardless of their nature, between different samples. The researchers provide two examples to illustrate the capabilities of SPOROS. Example one analyzes RISC-bound sRNAs in a cancer cell line (wild-type cells and two mutant lines unable to produce most miRNAs). Example two is based on a publicly available smRNA Seq data set from postmortem brains (from either normal individuals or patients with Alzheimer’s disease). These methods are designed for analyzing a variety of smRNA Seq data in various normal and disease settings.
SPOROS workflow developed to analyze seed toxicity of smRNA Seq data
From left to right: smRNA Seq data, either total or RISC-bound, are trimmed and cleaned and then compiled into a counts table. Rare reads (those with fewer counts than the number of samples) are removed, and the remaining reads are BLASTed against all mature miRNAs or RNA world data sets of all small RNAs (either human or mouse). Reads that hit artificial sequences in the RNA world data sets are also removed. The remaining raw counts table can be normalized to 1 million reads per sample (column) or used for differential expression analysis. RawCounts, normCounts, or differential tables are annotated with 6mer seed, 6mer seed viability, miRNA, and RNA world information to generate Output A tables. At this point the miRNA content (%) can be determined. Reads in the Output A file are collapsed according to 6mer seed and RNA type, resulting in Output B. At this point either all short RNAs (sRNA) or just the miRNA fraction can be analyzed. Output B is fed into four scripts that generate four output files: C: a Seed Tox graph that depicts all miRNAs as peaks according to their seed viability; D: the average predicted 6mer seed toxicity of all reads in a sample, depicted as a box and whisker plot; E: a Weblogo plot showing the average seed composition in positions 1–6 of the 6mer seed in each sample; F: the result of a multinomial mixed model odds ratio analysis, which allows comparison of different 6mer seeds as well as of differences at each position of different seeds. The hierarchy of folders and subfolders generated by SPOROS is shown in a grey box.
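The counts-table steps in this legend (removing rare reads, normalizing to 1 million reads per sample, and collapsing reads by 6mer seed) can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the SPOROS implementation: the function names and toy reads are hypothetical, "fewer counts than the number of samples" is interpreted here as the summed count across samples, and the BLAST, annotation, and RNA type steps are omitted.

```python
from collections import defaultdict

# Toy counts table (hypothetical data): read sequence -> counts per sample.
toy_counts = {
    "UGAGGUAGUAGGUUGUAUAGUU": [6, 2],
    "UGAGGUAGUAGGUUGUGUGGUU": [2, 2],
    "UAGCAGCACGUAAAUAUUGGCG": [2, 6],
    "ACGUACGUACGUACGUAC": [1, 0],
}


def remove_rare_reads(counts, n_samples):
    # Assumption: a read is "rare" if its summed count is below n_samples.
    return {r: c for r, c in counts.items() if sum(c) >= n_samples}


def normalize_per_million(counts, n_samples):
    # Scale each sample (column) so that its counts sum to 1 million.
    totals = [sum(c[i] for c in counts.values()) for i in range(n_samples)]
    return {
        r: [c[i] * 1_000_000 / totals[i] if totals[i] else 0.0
            for i in range(n_samples)]
        for r, c in counts.items()
    }


def collapse_by_seed(counts, n_samples):
    # Sum the counts of all reads sharing the same 6mer seed (positions 2-7).
    collapsed = defaultdict(lambda: [0.0] * n_samples)
    for read, c in counts.items():
        if len(read) < 7:
            continue  # too short to contain positions 2-7
        seed = read[1:7]
        for i in range(n_samples):
            collapsed[seed][i] += c[i]
    return dict(collapsed)


n = 2
filtered = remove_rare_reads(toy_counts, n)
normalized = normalize_per_million(filtered, n)
by_seed = collapse_by_seed(normalized, n)
for seed, values in sorted(by_seed.items()):
    print(seed, values)
```

In this toy table the first two reads share a seed and are merged into one row, which mirrors the idea behind Output B: downstream analyses operate on seeds rather than on individual read sequences.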