SOLiD DNA sequences are typically analyzed against a reference genome and are not recommended for de novo assembly of genomes or transcriptomes. This is mainly due to the difficulty of translating SOLiD color-space data into normal base-space sequences. Because each color encodes the transition between two adjacent bases, any misinterpreted color corrupts every base translated after it, producing completely wrong results. Here, researchers from the Università di Padova describe SATRAP, a computer program designed to efficiently translate de novo assembled color-space sequences into base-space format. The program was tested and validated using simulated and real transcriptomic data; its modularity allows easy integration into more complex pipelines, such as Oases for RNA-seq de novo assembly.
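To make the error-propagation problem concrete, here is a minimal Python sketch of the standard two-base color encoding, in which each color is the XOR of the 2-bit codes of two adjacent bases (A=0, C=1, G=2, T=3). The function names and the example sequence are illustrative assumptions; this is not code from SATRAP.

```python
# Illustrative sketch of SOLiD two-base (color-space) encoding; not SATRAP code.
BASES = "ACGT"
CODE = {base: i for i, base in enumerate(BASES)}  # A=0, C=1, G=2, T=3


def encode(bases):
    """Return the first base plus one color per pair of adjacent bases."""
    colors = [CODE[a] ^ CODE[b] for a, b in zip(bases, bases[1:])]
    return bases[0], colors


def decode(first_base, colors):
    """Translate colors back to bases, starting from a known first base."""
    out = [first_base]
    for color in colors:
        # Each base depends on the previous base, so errors carry forward.
        out.append(BASES[CODE[out[-1]] ^ color])
    return "".join(out)


seq = "ATGGCAATCG"  # hypothetical example sequence
first, colors = encode(seq)
roundtrip = decode(first, colors)

# Simulate a single miscalled color at index 3.
bad = list(colors)
bad[3] = (bad[3] + 1) % 4
wrong = decode(first, bad)
mismatches = [i for i, (a, b) in enumerate(zip(seq, wrong)) if a != b]
print(mismatches)
```

With a single wrong color at index 3, every base from index 4 to the end of the sequence is mistranslated, which is why a naive translation of an assembled color-space contig is unreliable.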
Flowchart of the color-translation process – Step 1: the first base (FTB) of each read can be translated from color-space with high accuracy; for each read the FTB is mapped onto the contig. Step 2: check color coherence with neighboring FTBs; three conditions can be detected: a) FTBs coherent with their neighboring FTBs on both sides (such as the ‘A’ at the centre of the figure); b) FTBs coherent on only one side (such as the ‘G’ that is coherent with the ‘A’, but not with the ‘C’); c) FTBs with no coherence on either side (such as the ‘A’ circled in red). The latter are removed from the assembly. Steps 3 and 4: find regions delimited by two reliable start sites and translate color-space into base-space. Any remaining regions will be incoherent in terms of color compatibility. To resolve these regions, the threshold for color reliability is calculated (Step 5) and the resulting value is used to establish the critical regions of the contig (Step 6).
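The coherence check in Step 2 can be sketched roughly as follows. This is a simplified illustration, not SATRAP's actual implementation: it assumes that two neighboring FTBs are coherent when translating the contig colors between them, starting from the left FTB, reproduces the right FTB. The function names, the data layout (a list of `(position, base)` pairs) and the handling of terminal FTBs are all assumptions.

```python
# Simplified sketch of an FTB coherence check; names and data layout are hypothetical.
BASES = "ACGT"
CODE = {base: i for i, base in enumerate(BASES)}  # A=0, C=1, G=2, T=3


def walk(base, colors):
    """Translate a run of colors starting from `base`; return the final base."""
    code = CODE[base]
    for color in colors:
        code ^= color
    return BASES[code]


def classify_ftbs(contig_colors, ftbs):
    """Count, for each FTB, how many neighboring FTBs (0, 1 or 2) it is coherent with.

    contig_colors[i] is the color linking contig base i to base i + 1.
    ftbs is a list of (position, base) pairs mapped onto the contig.
    """
    ftbs = sorted(ftbs)
    # links[i] is True when FTB i and FTB i + 1 agree through the contig colors.
    links = [
        walk(b1, contig_colors[p1:p2]) == b2
        for (p1, b1), (p2, b2) in zip(ftbs, ftbs[1:])
    ]
    scored = []
    for i, ftb in enumerate(ftbs):
        left = i > 0 and links[i - 1]
        right = i < len(links) and links[i]
        scored.append((ftb, int(left) + int(right)))
    return scored


# Colors of the hypothetical contig ACGTTAGC.
contig_colors = [1, 3, 1, 0, 3, 2, 3]
# The FTB at position 4 is deliberately wrong ('A' instead of 'T').
ftbs = [(0, "A"), (1, "C"), (2, "G"), (4, "A"), (6, "G"), (7, "C")]

scored = classify_ftbs(contig_colors, ftbs)
# Assumption: FTBs coherent with no neighbor are discarded.
kept = [ftb for ftb, sides in scored if sides > 0]
print(scored)
print(kept)
```

In this toy example the FTB at position 1 is coherent on both sides, the FTB at position 2 on one side only, and the erroneous FTB at position 4 on neither side, so it is the only one dropped.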
Availability – SATRAP is available at http://satrap.cribi.unipd.it, either as a multi-step pipeline incorporating several tools for RNA-seq assembly or as an individual module for use with the Oases package.