Methods for extracting knowledge from Next Generation Sequencing data are in high demand. In this work, researchers at the National Research Council, Italy focus on RNA-seq gene expression analysis, specifically on case-control studies that use rule-based supervised classification algorithms to build a model able to discriminate cases from controls. State-of-the-art algorithms compute a single classification model that contains few features (genes). In contrast, these researchers' goal is to elicit more knowledge by computing many classification models, and thereby to identify most of the genes related to the predicted class.
The researchers have developed CAMUR, a new method that extracts multiple, equivalent classification models. CAMUR iteratively computes a rule-based classification model, calculates the power set of the genes present in the rules, eliminates those gene combinations from the data set one at a time, and repeats the classification procedure until a stopping criterion is met. CAMUR includes an ad-hoc knowledge repository (database) and a querying tool. The researchers analyze three different types of RNA-seq data sets (Breast, Head and Neck, and Stomach Cancer) from The Cancer Genome Atlas (TCGA), and they also validate CAMUR and its models on non-TCGA data. Their experimental results show the efficacy of CAMUR: they obtain several reliable equivalent classification models, from which the most frequent genes, their relationships, and their relation to a particular cancer are deduced.
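The iterative procedure described above can be illustrated with a minimal Python sketch. It is not CAMUR's implementation: the function names (`extract_models`, `toy_learner`, `stump`), the gene names, the expression values, the accuracy threshold, and the breadth-first order in which gene combinations are removed are all assumptions made for illustration. The toy learner (a conjunction of the two best single-gene threshold rules) merely stands in for a real rule-based classifier.

```python
from collections import Counter, deque
from itertools import combinations


def power_set(genes):
    """All non-empty subsets of `genes`, smallest first."""
    ordered = sorted(genes)
    return [frozenset(c)
            for size in range(1, len(ordered) + 1)
            for c in combinations(ordered, size)]


def stump(values, labels):
    """Single-gene threshold rule; returns the predicted label per sample."""
    cases = [v for v, y in zip(values, labels) if y == 1]
    controls = [v for v, y in zip(values, labels) if y == 0]
    mean_case = sum(cases) / len(cases)
    mean_control = sum(controls) / len(controls)
    threshold = (mean_case + mean_control) / 2
    case_is_high = mean_case >= mean_control
    return [1 if (v > threshold) == case_is_high else 0 for v in values]


def accuracy(predicted, labels):
    return sum(p == y for p, y in zip(predicted, labels)) / len(labels)


def toy_learner(expression, labels, genes):
    """Hypothetical stand-in for a rule learner: AND of the two best stumps."""
    preds = {g: stump(expression[g], labels) for g in genes}
    ranked = sorted(genes, key=lambda g: (-accuracy(preds[g], labels), g))
    used = ranked[:2]
    combined = [int(all(preds[g][i] for g in used)) for i in range(len(labels))]
    return frozenset(used), accuracy(combined, labels)


def extract_models(expression, labels, learn, min_accuracy=0.9, max_iterations=50):
    """Repeatedly learn a model, then retry with subsets of its genes removed."""
    all_genes = sorted(expression)
    models = {}                      # gene set -> accuracy, in discovery order
    queue = deque([frozenset()])     # gene sets to hide from the learner
    seen = {frozenset()}
    iterations = 0
    while queue and iterations < max_iterations:   # stopping criterion (assumed)
        removed = queue.popleft()
        iterations += 1
        available = [g for g in all_genes if g not in removed]
        if not available:
            continue
        used, score = learn(expression, labels, available)
        if score < min_accuracy:     # keep only reliable models
            continue
        models.setdefault(used, score)
        for subset in power_set(used):
            candidate = removed | subset
            if candidate not in seen:
                seen.add(candidate)
                queue.append(candidate)
    return [(tuple(sorted(genes)), score) for genes, score in models.items()]


# Made-up expression values: three informative genes and one noisy gene.
expression = {
    "GENE_A": [9, 8, 9, 1, 2, 1],
    "GENE_B": [7, 9, 8, 2, 1, 3],
    "GENE_C": [1, 2, 1, 8, 9, 7],
    "GENE_D": [5, 1, 5, 1, 5, 1],
}
labels = [1, 1, 1, 0, 0, 0]          # 1 = case, 0 = control

models = extract_models(expression, labels, toy_learner)
gene_counts = Counter(g for genes, _ in models for g in genes)
for genes, score in models:
    print(genes, round(score, 3))
print(gene_counts.most_common())
```

A single run of the learner would report only one pair of genes. Hiding subsets of the genes already used forces the learner to find alternative models of comparable accuracy, and counting how often each gene appears across the retained models gives the kind of gene-frequency summary the paragraph above mentions.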
Component diagram of the MSE part of the CAMUR software package
The workflow of the software is as follows. The InputManager processes the user input data (data matrix) and the parameters (e.g., maximum number of iterations, execution mode); the input data are then taken by the CamurLauncher and managed through the DataElaborator. Next, the CamurLauncher performs the iterative classification, managing the feature eliminations and combinations through the FeaturesManager. Finally, the ResultsElaborator stores the classification models and results in the database with the aid of the DataAccessObject (DAO).