Next Generation Sequencing (NGS) experiments produce millions of short sequences that, once mapped to a reference genome, provide biological insights at the genomic, transcriptomic and epigenomic levels. Typically, the fraction of reads that map correctly to the reference genome ranges between 70% and 90%, leaving in some cases a considerable fraction of unmapped sequences. This ‘misalignment’ can be ascribed to low-quality bases or to sequence differences between the sample reads and the reference genome. Investigating the source of the unmapped reads is important both to better assess the quality of the whole experiment and to check for possible downstream or upstream ‘contamination’ from exogenous nucleic acids.
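As an illustration of the mapping rate discussed above, the following minimal sketch counts mapped and unmapped reads in SAM-formatted alignment records. It is not part of DecontaMiner: the function names `mapping_summary` and `mapped_fraction` and the toy records are hypothetical, introduced here only for illustration. The sketch relies on the SAM FLAG field, where bit 0x4 marks an unmapped read, and it skips secondary (0x100) and supplementary (0x800) records so that each read is counted once.

```python
from typing import Iterable, Tuple

# SAM FLAG bits (defined by the SAM format specification)
UNMAPPED_FLAG = 0x4         # the read is unmapped
SECONDARY_FLAG = 0x100      # secondary alignment of a read already counted
SUPPLEMENTARY_FLAG = 0x800  # supplementary (chimeric) alignment


def mapping_summary(sam_lines: Iterable[str]) -> Tuple[int, int]:
    """Return (mapped, unmapped) counts over primary SAM records.

    Hypothetical helper for illustration; not part of DecontaMiner.
    """
    mapped = 0
    unmapped = 0
    for line in sam_lines:
        # Skip blank lines and header lines, which start with '@'.
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])  # the second SAM column is the bitwise FLAG
        # Count each read once: ignore secondary/supplementary records.
        if flag & (SECONDARY_FLAG | SUPPLEMENTARY_FLAG):
            continue
        if flag & UNMAPPED_FLAG:
            unmapped += 1
        else:
            mapped += 1
    return mapped, unmapped


def mapped_fraction(mapped: int, unmapped: int) -> float:
    """Fraction of reads that mapped; 0.0 when there are no reads."""
    total = mapped + unmapped
    return mapped / total if total else 0.0


# Toy records (invented data): three mapped reads, one unmapped read,
# and one secondary alignment that is not counted.
toy_sam = [
    "@HD\tVN:1.6\tSO:unsorted",
    "r1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t16\tchr1\t200\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
    "r4\t0\tchr2\t50\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "r4\t256\tchr3\t70\t0\t4M\t*\t0\t0\tACGT\tIIII",
]

n_mapped, n_unmapped = mapping_summary(toy_sam)
print(f"mapped fraction: {mapped_fraction(n_mapped, n_unmapped):.0%}")  # prints "mapped fraction: 75%"
```

On real data the same function could be fed the lines of a SAM file; the reads counted as unmapped are the ones whose origin is worth investigating.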
The DecontaMiner pipeline
The tools are shown with their functions and their input and output file formats. The outputs are grouped into three main directories, ‘Low quality’, ‘Ambiguous’ and ‘Valid’, which collect the result files of each analyzed sample.
Availability – The software is freely available at http://www-labgtp.na.icar.cnr.it/decontamine