DNA sequencing technology is becoming more accessible to a variety of researchers as costs continue to decline. Most of the novel transcriptomes now being sequenced lack a reference genome, so they have to be assembled with de novo assemblers. Making comparisons across assemblies can be difficult: each program has its strengths and weaknesses, and no tool exists to comparatively evaluate these datasets.
Now, a team led by researchers at the University of Rhode Island has developed software in R, called Sequence Comparative Analysis using Networks (SCAN), to perform statistical comparisons between distinct assemblies. SCAN uses a reference dataset to identify the most accurate de novo assembly and the ‘good’ transcripts in the user’s data. The team tested SCAN on three publicly available transcriptomes, each assembled using three assembly programs. They also sequenced the transcriptome of the oomycete Achlya hypogyna and compared de novo assemblies from the Velvet, ABySS, and CLC Genomics Workbench assembly algorithms. Of the CLC transcripts, 1,128 were statistically similar to the reference, compared to 937 of the ABySS transcripts and 49 of the Velvet transcripts. SCAN’s strength is providing statistical support for transcript assemblies in a biological context. However, because SCAN is designed to compare distinct node sets in networks, it can also easily be extended to perform statistical comparisons on any network graph, regardless of what the nodes represent.
Availability – Two versions of SCAN were developed, “SCAN” and “SCAN stringent,” both of which can run on either single- or multiprocessor nodes. They are available from http://evol-net.fr.
- Misner I, Bicep C, Lopez P, Halary S, Bapteste E, Lane CE. (2013) Sequence Comparative Analysis using Networks (SCAN): software for evaluating de novo transcript assembly from next generation sequencing. Mol Biol Evol [Epub ahead of print].