Numerous methods have been developed to analyse RNA sequencing (RNA-seq) data, but most rely on the availability of a reference genome, making them unsuitable for non-model organisms. Here, researchers from the Royal Children’s Hospital, Melbourne, present superTranscripts, a substitute for a reference genome in which each gene with multiple transcripts is represented by a single sequence. The Lace software is provided to construct superTranscripts from any set of transcripts, including de novo assemblies. The researchers demonstrate how superTranscripts enable visualisation, variant detection and differential isoform detection in non-model organisms. They further use Lace to combine reference and assembled transcriptomes for chicken and recover hundreds of gaps in the reference genome.
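When a genome and exon annotation do exist, the idea of "one sequence per gene" can be illustrated by concatenating, in genomic order, every base that falls in any exon of the gene, so that exons shared by several transcripts appear only once. The Python sketch below is an illustration of that concept only, not Lace's method (Lace builds superTranscripts from transcript sequences alone). It assumes a single forward-strand chromosome string and a hypothetical input format in which each transcript is a list of half-open `(start, end)` exon intervals, and it splits the result at every exon start and end as a simple stand-in for superTranscript blocks.

```python
def genome_supertranscript(genome, transcripts):
    """Concatenate a gene's exonic bases in genomic order, each base once.

    genome: chromosome sequence (str), 0-based coordinates, forward strand.
    transcripts: hypothetical input format, one list of half-open
        (start, end) exon intervals per transcript.
    Returns (super_seq, blocks); each block is the (start, end) span in
    super_seq of a segment lying between consecutive exon boundaries.
    """
    exons = [exon for transcript in transcripts for exon in transcript]
    # Every exon start and end is a potential block boundary.
    cuts = sorted({pos for exon in exons for pos in exon})
    parts, blocks, offset = [], [], 0
    for left, right in zip(cuts, cuts[1:]):
        # A segment between consecutive cuts is either wholly inside or wholly
        # outside each exon, so keep it only if some exon covers it (not intronic).
        if any(start <= left and right <= end for start, end in exons):
            parts.append(genome[left:right])
            blocks.append((offset, offset + right - left))
            offset += right - left
    return "".join(parts), blocks


if __name__ == "__main__":
    genome = "NNACGTNNTTTTNNGGCC"
    transcript_1 = [(2, 6), (14, 18)]           # ACGT + GGCC
    transcript_2 = [(2, 6), (8, 12), (14, 16)]  # ACGT + TTTT + GG
    seq, blocks = genome_supertranscript(genome, [transcript_1, transcript_2])
    print(seq)     # → ACGTTTTTGGCC
    print(blocks)  # → [(0, 4), (4, 8), (8, 10), (10, 12)]
```

Because the shared exon is written once, the 12-base superTranscript in this toy example is shorter than the two transcripts laid end to end (8 + 10 bases), yet every base of each transcript is present.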
a A gene in the genome (top) and its corresponding transcripts (middle) compared to the superTranscript for the same gene (bottom). Colours indicate superTranscript blocks. b A schematic diagram showing the steps in Lace’s algorithm. A superTranscript (at the bottom) is built from transcripts A and B (blue and red, respectively). For each transcript, Lace builds a directed graph with a node for each base. Transcripts are aligned against one another using blat and the nodes of shared bases are merged. Lace then simplifies the graph by compacting unforked edges. The graph is topologically sorted and the resulting superTranscript annotated with transcripts and blocks. c The general workflow we propose for RNA-seq analysis in non-model organisms. Reads are de novo assembled, transcripts clustered into genes, superTranscripts assembled using Lace and reads aligned back. Here we use Trinity, Corset and STAR as our assembler, clustering program and aligner, respectively, but equivalent tools could also be used.
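The merge-and-sort steps of panel b can be sketched in Python. This is a simplified illustration, not Lace's code. It assumes the blat alignments have already been parsed into a hypothetical list of matched base positions `(i, pos_i, j, pos_j)`, merges the per-base nodes of matched bases with union-find, and replaces Lace's explicit compaction of unforked edges with a depth-first topological sort, which in the usual case keeps each unforked run of bases contiguous in the output. Cycles, which real alignments of repeats can produce, are simply rejected here.

```python
from collections import defaultdict


class UnionFind:
    """Disjoint sets over base identifiers (transcript_index, position)."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a


def build_supertranscript(transcripts, aligned_pairs):
    """Merge aligned per-base nodes and return one superTranscript sequence.

    transcripts: list of transcript sequences (str) from one gene/cluster.
    aligned_pairs: hypothetical pre-parsed alignments, tuples
        (i, pos_i, j, pos_j) meaning base pos_i of transcript i aligns to
        base pos_j of transcript j.
    """
    uf = UnionFind()
    for i, pos_i, j, pos_j in aligned_pairs:
        uf.union((i, pos_i), (j, pos_j))  # merge nodes of shared bases

    # One directed edge per pair of consecutive bases in each transcript.
    base_of, succ, order = {}, defaultdict(list), []
    for t, seq in enumerate(transcripts):
        prev = None
        for pos, base in enumerate(seq):
            node = uf.find((t, pos))
            if node not in base_of:
                order.append(node)  # first appearance, used as DFS roots
            base_of[node] = base    # merged bases are assumed identical
            if prev is not None and node not in succ[prev]:
                succ[prev].append(node)
            prev = node

    # Iterative depth-first topological sort (reverse postorder).
    WHITE, GREY, BLACK = 0, 1, 2
    state = dict.fromkeys(order, WHITE)
    postorder = []
    for root in order:
        if state[root] != WHITE:
            continue
        state[root] = GREY
        stack = [(root, iter(succ[root]))]
        while stack:
            node, children = stack[-1]
            for child in children:
                if state[child] == GREY:
                    raise ValueError("cycle in base graph; not handled in this sketch")
                if state[child] == WHITE:
                    state[child] = GREY
                    stack.append((child, iter(succ[child])))
                    break
            else:  # all successors are finished
                state[node] = BLACK
                postorder.append(node)
                stack.pop()
    return "".join(base_of[node] for node in reversed(postorder))


if __name__ == "__main__":
    # Two isoforms sharing their first and last three bases.
    pairs = [(0, p, 1, p) for p in (0, 1, 2, 6, 7, 8)]
    print(build_supertranscript(["AAACCCGGG", "AAATTTGGG"], pairs))
    # → AAATTTCCCGGG
```

Because every edge points forward in a topological order, the bases of each input transcript appear in the output in their original order, although not necessarily contiguously; which alternative segment comes first depends only on traversal order.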
Availability – The Lace software is available from https://github.com/Oshlack/Lace/wiki