Modern sequencing technologies generate omics data at a tremendous rate, but analyzing these data can be tricky and often requires substantial expertise in bioinformatics. To address this concern, researchers at the Indian Institute of Technology have developed a user-friendly pipeline for analyzing (cancer) genomic data that takes raw sequencing data (FASTQ format) as input and outputs insightful statistics. Their iCOMIC toolkit comprises many independent workflows and is built on the popular Snakemake workflow management system. It can analyze whole-genome and transcriptome data, and its GUI minimizes the number of execution steps and eliminates the need for complex command-line arguments. Notably, the researchers have integrated algorithms developed in-house to predict pathogenicity among cancer-causing mutations and to differentiate between tumor suppressor genes and oncogenes from somatic mutation data. They benchmarked the tool against the Genome in a Bottle benchmark dataset (NA12878), where the BWA MEM-GATK HC DNA-Seq pipeline achieved the highest F1 scores: 0.971 for indels and 0.988 for SNPs. Similarly, they achieved a correlation coefficient of r = 0.85 using the HISAT2-StringTie-ballgown and STAR-StringTie-ballgown RNA-Seq pipelines on the human monocyte dataset (SRP082682). Overall, the tool makes omics datasets easier to analyze by substantially simplifying otherwise complex data analysis pipelines.
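For readers unfamiliar with the two benchmark metrics quoted above, the following minimal Python sketch shows how an F1 score and a correlation coefficient are typically computed. It is an illustration only, not iCOMIC's own benchmarking code: the function names `f1_score` and `pearson_r` are hypothetical, the counts and values in the usage lines are made up, and treating r as a Pearson coefficient is an assumption, since the summary does not say which correlation was used.

```python
from math import sqrt


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall from variant-call counts.

    tp: calls that match the truth set (true positives)
    fp: calls absent from the truth set (false positives)
    fn: truth-set variants that were missed (false negatives)
    """
    if tp == 0:
        return 0.0  # no correct calls, so precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two paired lists of values."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long lists with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Made-up numbers for illustration; these are not iCOMIC benchmark results.
example_f1 = f1_score(tp=90, fp=10, fn=10)
example_r = pearson_r([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
```

An F1 score close to 1 means the pipeline both finds most true variants (high recall) and reports few spurious ones (high precision), which is why it is a common single-number summary for variant-calling benchmarks.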
Schematic diagram of the RNA-Seq pipeline
This figure details the pipeline from the input through quality control, alignment to the reference genome, counting of the mapped reads, normalization, and differential expression analysis, ending with the TXT/PDF output.
Availability – The iCOMIC source code is available at https://github.com/RamanLab/iCOMIC. The iCOMIC user manual can be accessed at https://icomic-doc.readthedocs.io/en/latest/user-guide.html.