Circular RNA (circRNA) is an emerging class of RNA molecules that has attracted attention for its potential as a diagnostic or prognostic marker, or as a therapeutic target, in cancer, cardiovascular, and autoimmune diseases. Current methods for detecting circRNA from RNA sequencing (RNA-seq) data focus mostly on improving the mapping quality of reads that support the back-splicing junction (BSJ) of a circRNA, with the aim of eliminating false positives (FPs). Researchers at the Karolinska Institutet show that mapping information alone often cannot predict whether a BSJ-supporting read is derived from a true circRNA, which increases the rate of FP circRNAs.
The researchers have developed Circall, a novel method for detecting circRNA from RNA-seq data. Circall controls FPs using a robust multidimensional local false discovery rate method based on the length and expression of circRNAs. It achieves high computational efficiency by using a quasi-mapping algorithm for fast and accurate alignment of RNA reads. The researchers applied Circall to two simulated datasets and three experimental datasets of human cell lines. The results show that Circall achieves high sensitivity and precision on the simulated data. On the experimental datasets it performs well against current leading methods. Circall is also substantially faster than the other methods, particularly for large datasets.
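To make the idea of a two-dimensional local false discovery rate more concrete, here is a minimal sketch in Python. It is not Circall's implementation: the function name `local_fdr_2d`, the fixed-width binning, and the default `pi0 = 1.0` are all assumptions chosen for illustration. Each candidate is described by two statistics (for example, a length-based and an expression-based value), and the local fdr in a bin is estimated as the null density (from permuted data) divided by the observed density.

```python
from collections import Counter


def local_fdr_2d(observed, null, bin_width=(1.0, 1.0), pi0=1.0):
    """Estimate a 2D local fdr for every observed point (illustrative only).

    observed  -- list of (x, y) statistics for the real candidates
    null      -- list of (x, y) statistics from permuted data (hypothetical input)
    bin_width -- width of the histogram bins along x and y (assumed fixed)
    pi0       -- assumed proportion of true nulls; 1.0 is the conservative choice
    """
    if not observed or not null:
        raise ValueError("observed and null must both be non-empty")

    def to_bin(point):
        # Map a point to the integer index of its 2D histogram bin.
        return (int(point[0] // bin_width[0]), int(point[1] // bin_width[1]))

    obs_counts = Counter(to_bin(p) for p in observed)
    null_counts = Counter(to_bin(p) for p in null)

    fdr = []
    for p in observed:
        b = to_bin(p)
        f = obs_counts[b] / len(observed)   # observed density in this bin (> 0)
        f0 = null_counts[b] / len(null)     # null density in this bin
        fdr.append(min(1.0, pi0 * f0 / f))  # local fdr, capped at 1
    return fdr
```

Candidates that fall in regions where permuted data are rare receive a low local fdr, while candidates in regions dominated by the null receive a value near 1. A real analysis would typically smooth the two densities rather than rely on raw bin counts.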
Overview of Circall for the discovery of circRNA from RNA-seq
(i) CircRNA candidate detection to discover a list of circRNA candidates. This includes extraction of unmapped reads, collection of BSJ reads, generation of pseudo-sequences of circRNAs and potential tandem RNAs, and filtering to get the list of candidates; (ii) Statistical assessment to rank the candidates. The contour map is an example from the HeLa dataset, showing the statistics from the permutation step of the 2D local false discovery rate (2dfdr) method. The red dots and blue triangles indicate the depleted and non-depleted circRNAs, respectively. The circRNAs with fdr2d are marked by grey squares. Details are described in the main text.
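The pseudo-sequence step in (i) can be illustrated with a short Python sketch. This is a common way to build a junction reference, not necessarily how Circall does it: the function name `bsj_pseudo_sequence` and the choice of `read_length - 1` flanking bases are assumptions. The idea is to join the end of the candidate circRNA sequence to its start, so that a read spanning the back-splicing junction aligns contiguously to the pseudo-sequence even though it cannot align to the linear sequence.

```python
def bsj_pseudo_sequence(circ_seq, read_length):
    """Build a junction-spanning pseudo-sequence (illustrative only).

    circ_seq    -- linearised sequence of the circRNA candidate, 5' to 3'
    read_length -- sequencing read length; must be at least 2
    """
    if read_length < 2 or not circ_seq:
        raise ValueError("need a non-empty sequence and read_length >= 2")
    # Take up to read_length - 1 bases on each side so that every read in the
    # result covers at least one base on both sides of the junction.
    flank = min(read_length - 1, len(circ_seq))
    return circ_seq[-flank:] + circ_seq[:flank]


# Usage: the read "CAAC" crosses the junction between the 3' end ("...GCA")
# and the 5' end ("ACG...") of the candidate.
pseudo = bsj_pseudo_sequence("ACGTTGCA", read_length=4)
print(pseudo)
```

In this toy case the junction read `CAAC` occurs in the pseudo-sequence but not in the linear sequence `ACGTTGCA`, which is the property that lets BSJ reads be collected by alignment.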
Availability – Circall is implemented in C++ and R, and available for use at https://www.meb.ki.se/sites/biostatwiki/circall and https://github.com/datngu/Circall.