Analysis of RNA-seq data often detects numerous ‘non-co-linear’ (NCL) transcripts, which comprise sequence segments that are topologically inconsistent with their corresponding DNA sequences in the reference genome. However, detection of NCL transcripts involves two major challenges: removal of false positives arising from alignment artifacts and discrimination between different types of NCL transcripts (trans-spliced, circular or fusion transcripts).
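To make ‘topologically inconsistent’ concrete, the following minimal Python sketch tests whether two aligned pieces of one read appear in an order that the reference genome cannot explain. It is an illustration only, not the NCLscan algorithm: the `Segment` class, its field names and the `is_non_co_linear` function are hypothetical, and overlapping segments are simply treated as non-co-linear.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """One aligned piece of a read (hypothetical structure for illustration)."""
    chrom: str
    strand: str        # "+" or "-"
    genome_start: int  # 0-based leftmost genomic coordinate
    genome_end: int    # exclusive right genomic coordinate


def is_non_co_linear(first: Segment, second: Segment) -> bool:
    """Return True if `first` followed by `second` (in read order, 5'->3')
    cannot be explained by a co-linear transcript on the reference genome."""
    # Different chromosome or strand: an intergenic-style NCL candidate.
    if first.chrom != second.chrom or first.strand != second.strand:
        return True
    if first.strand == "+":
        # On the plus strand, the later piece must lie downstream.
        return second.genome_start < first.genome_end
    # On the minus strand, read order runs toward smaller coordinates.
    return second.genome_end > first.genome_start


# Ordinary splicing: upstream exon followed by a downstream exon.
linear = is_non_co_linear(Segment("chr1", "+", 100, 200),
                          Segment("chr1", "+", 500, 600))
# Back-splice-like order: downstream exon followed by an upstream exon.
backsplice = is_non_co_linear(Segment("chr1", "+", 500, 600),
                              Segment("chr1", "+", 100, 200))
print(linear, backsplice)
```

A back-splice-like order of this kind is what circular and some trans-spliced transcripts look like to an aligner, which is why the order test alone cannot tell the NCL types apart.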
Here, researchers from the Genomics Research Center, Taipei have developed a new NCL-transcript-detecting method (‘NCLscan’), which uses a stepwise alignment strategy to almost completely eliminate false calls (>98% precision) without sacrificing true positives. As a result, NCLscan outperforms 18 other publicly available tools (including fusion- and circular-RNA-detecting tools) in terms of sensitivity and precision, regardless of the generation strategy of the simulated dataset, the type of intragenic or intergenic NCL event, read depth of coverage, read length or expression level of the NCL transcript.
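For readers less familiar with the two benchmark metrics, the short Python sketch below shows how precision and sensitivity are conventionally computed from the counts of true positives, false positives and false negatives on a simulated dataset. The function names and the example counts are made up for illustration and are not results from the study.

```python
def precision(true_pos: int, false_pos: int) -> float:
    """Fraction of reported NCL events that are real: TP / (TP + FP)."""
    called = true_pos + false_pos
    return true_pos / called if called else 0.0


def sensitivity(true_pos: int, false_neg: int) -> float:
    """Fraction of real NCL events that were reported: TP / (TP + FN)."""
    real = true_pos + false_neg
    return true_pos / real if real else 0.0


# Hypothetical counts, chosen only to illustrate the arithmetic.
tp, fp, fn = 98, 2, 10
print(f"precision={precision(tp, fp):.3f}, sensitivity={sensitivity(tp, fn):.3f}")
```

A tool can trade one metric for the other, for example by filtering aggressively to raise precision at the cost of sensitivity, which is why the comparison reports both.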
Identification of NCL transcripts. (A) Flowchart depicting the NCLscan pipeline. (B and C) Schematic illustrations of possible ‘putative NCL references’ with putative NCL junction sites (based on BLAT alignment output and GENCODE annotation): (B) intragenic case and (C) intergenic case. The characters ‘e’ and ‘X’ represent ‘exon’ and the putative NCL reference that is not considered, respectively. (D) Schematic illustration of a retained putative NCL reference, which satisfies all of the criteria listed in (A).
Given its high accuracy, NCLscan was applied to distinguish between trans-spliced, circular and fusion transcripts on the basis of poly(A)- and nonpoly(A)-selected RNA-seq data. The researchers showed that circular RNAs were expressed more ubiquitously, more abundantly and less cell type-specifically than trans-spliced and fusion transcripts. This study thus describes a robust pipeline for the discovery of NCL transcripts and sheds light on the fundamental biology of these non-canonical RNA events in the human transcriptome.
Availability – The NCLscan program, documentation and test dataset are publicly accessible from GitHub at https://github.com/TreesLab/NCLscan or the authors' FTP site at ftp://treeslab1.genomics.sinica.edu.tw/NCLscan.