Biological measurement data often come as a ranked list of quantities, for example, differential expression (DE) of genes as inferred from microarrays or RNA-seq. Recent years have brought considerable progress in statistical tools for enrichment analysis in ranked lists. Several tools are now available that allow users to break the fixed-set paradigm in assessing statistical enrichment of sets of genes. Continuing with the example, these tools identify factors that may be associated with measured differential expression. A drawback of existing tools is their focus on identifying single factors associated with the observed or measured ranks, failing to address relationships between these factors. For example, genes targeted by multiple miRNAs may play a central role in the DE signal while the effect of each single miRNA is too subtle to be detected, as shown in the results.
A flowchart describing the MULSEA algorithm. The output consists of the green and grey lists. The grey lists represent the reversed members of the output. The white lists are not included in the output collection.
Researchers at the Israel Institute of Technology propose statistical and algorithmic approaches for selecting a sub-collection of factors that can be aggregated into one ranked list that is heuristically most associated with an input ranked list (the pivot). They examine performance on simulated data and apply their approach to cancer datasets. They find small sub-collections of miRNAs that are statistically associated with gene DE in several types of cancer, suggesting miRNA cooperativity in driving disease-related processes. Many of their findings are consistent with known roles of miRNAs in cancer, while others suggest previously unknown roles for certain miRNAs.
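To make the idea of selecting a sub-collection of ranked lists more concrete, here is a minimal Python sketch of a greedy forward selection. It is not the MULSEA implementation: the summary above does not specify the statistic or the search procedure, so this sketch assumes Spearman correlation as the association measure and a per-gene sum of ranks as the aggregation, and it allows a list to enter in reversed order, as the flowchart caption suggests. All function names, factor names (f1, f2, f3) and toy ranks are hypothetical.

```python
def ranks(values):
    """Return 0-based ranks of values; ties are broken by position (a simplification)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order):
        result[i] = rank
    return result


def spearman(a, b):
    """Spearman correlation, computed as the Pearson correlation of the ranks."""
    ra, rb = ranks(a), ranks(b)
    n = len(ra)
    ma, mb = sum(ra) / n, sum(rb) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    va = sum((x - ma) ** 2 for x in ra)
    vb = sum((y - mb) ** 2 for y in rb)
    return cov / (va * vb) ** 0.5


def aggregate(rank_vectors):
    """Assumed aggregation: sum the ranks of each gene across the chosen lists."""
    return [sum(column) for column in zip(*rank_vectors)]


def greedy_select(pivot, factors):
    """Greedily add the factor (as is, or reversed) that most improves association.

    pivot:   rank of each gene in the input ranked list
    factors: dict mapping factor name -> rank of each gene under that factor
    Returns (list of (name, is_reversed), best score).
    """
    n = len(pivot)
    chosen, chosen_vecs, best = [], [], float("-inf")
    remaining = dict(factors)
    while remaining:
        candidates = []
        for name, vec in remaining.items():
            for is_reversed in (False, True):
                v = [n - 1 - r for r in vec] if is_reversed else list(vec)
                score = spearman(aggregate(chosen_vecs + [v]), pivot)
                candidates.append((score, name, is_reversed, v))
        score, name, is_reversed, v = max(candidates, key=lambda c: c[0])
        if score <= best:  # no candidate improves the aggregate: stop
            break
        best = score
        chosen.append((name, is_reversed))
        chosen_vecs.append(v)
        del remaining[name]
    return chosen, best


# Toy data: six genes, pivot ranks 0..5, three hypothetical factors.
pivot = [0, 1, 2, 3, 4, 5]
factors = {
    "f1": [4, 5, 2, 3, 0, 1],  # informative only when reversed
    "f2": [0, 2, 1, 4, 3, 5],  # moderately associated on its own
    "f3": [5, 3, 0, 4, 1, 2],  # mostly noise
}
selected, score = greedy_select(pivot, factors)
print(selected, score)
```

In this toy example neither f1 nor f2 alone reproduces the pivot ordering, but their aggregate does, which mirrors the motivating scenario where each single factor has a subtle effect. A greedy search is only one possible heuristic and can miss the best sub-collection.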
Availability – Code and instructions for the MULSEA algorithmic framework are available at: https://github.com/YakhiniGroup/MULSEA