Visualisation of the transcriptome relative to a reference genome is hampered by sparsity. This is because RNA sequencing (RNA-Seq) reads map predominantly to exons, which account for just under 3% of the human genome. Recently, researchers at the Peter MacCallum Cancer Centre have used exon-only references, superTranscripts, to improve visualisation of aligned RNA-Seq data by omitting supposedly unexpressed regions such as introns. However, variation within these regions can lead to novel splicing events that may drive a pathogenic phenotype. In these cases, retaining only annotated exons discards information and presents a significant drawback.
Here the researchers present Slinker, a bioinformatics pipeline written in Python and Bpipe that uses a data-driven approach to assemble sample-specific superTranscripts. At its core, Slinker uses StringTie2 to assemble transcripts with any sequence across any gene. This assembly is merged with reference transcripts and converted to a superTranscript, from which rich visualisations are produced with Plotly, together with associated annotation and coverage information. Slinker was validated on five novel splicing events in rare disease samples from a cohort of primary muscular disorders. In addition, Slinker was shown to be effective in visualising deletion events within transcriptomes of tumour samples in the important leukemia gene, IKZF1. Slinker offers a succinct visualisation of RNA-Seq alignments across typically sparse regions.
A schematic of the Slinker pipeline for a single user-input gene
Reference transcripts are first filtered by Transcript Support Level (TSL) as indicated in the reference annotation. The TSL is a tag included within Ensembl annotation that indicates the size and quality of the evidence supporting the existence of the associated transcript. These transcripts, along with the input BAM files, are used to perform genome-guided assembly, which outputs the predicted transcripts for each control and for the case sample. The merged result of these assemblies is then flattened, its exons concatenated, and the sequence oriented in the transcriptionally forward direction to create the genome-guided superTranscripts. The reads previously mapped to the user-supplied gene are extracted from each input BAM and then re-aligned to this new reference. Finally, the coverage from each alignment, the splice junctions, and the transcript annotation are combined into both interactive and static visualisations upon completion of the pipeline.
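The flattening and orientation step can be illustrated with a minimal sketch. This is not Slinker's code: the function names, the half-open interval convention, and the toy coordinates are all assumptions made for illustration. It merges overlapping exon intervals from several transcripts into disjoint blocks, then maps a genomic position into the coordinate system of the concatenated, transcriptionally forward-oriented blocks.

```python
from typing import List, Optional, Tuple

# Hypothetical representation: half-open [start, end) genomic intervals.
Interval = Tuple[int, int]


def flatten_exons(exons: List[Interval]) -> List[Interval]:
    """Merge overlapping or adjacent exon intervals into disjoint blocks."""
    merged: List[Interval] = []
    for start, end in sorted(exons):
        if merged and start <= merged[-1][1]:
            # Overlaps (or touches) the previous block: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def supertranscript_length(blocks: List[Interval]) -> int:
    """Length of the sequence obtained by concatenating all blocks."""
    return sum(end - start for start, end in blocks)


def genome_to_supertranscript(
    pos: int, blocks: List[Interval], strand: str
) -> Optional[int]:
    """Map a 0-based genomic position to a 0-based superTranscript position.

    Returns None when the position lies outside every block. For a gene on
    the '-' strand the coordinate is mirrored so that position 0 is the
    transcriptional start.
    """
    total = supertranscript_length(blocks)
    offset = 0  # bases contributed by the blocks already passed
    for start, end in blocks:
        if start <= pos < end:
            forward = offset + (pos - start)
            return forward if strand == "+" else total - 1 - forward
        offset += end - start
    return None


# Toy exons pooled from two hypothetical transcripts of one gene.
exons = [(100, 200), (150, 250), (400, 500)]
blocks = flatten_exons(exons)
```

In this toy case the two overlapping exons collapse into a single block, so the 400 bp genomic span becomes a 250 bp superTranscript with no gap between the blocks. A real implementation would also need to extract and, for minus-strand genes, reverse-complement the sequence itself.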
Availability – Slinker is freely available on GitHub: https://github.com/Oshlack/Slinker