CircAST – full-length assembly and quantification of alternatively spliced isoforms in circular RNAs

Circular RNAs (circRNAs), covalently closed continuous RNA loops, are generated from cognate linear RNAs through back-splicing events, and alternative splicing can generate different circRNA isoforms from the same locus. However, reconstructing and quantifying alternatively spliced full-length circRNAs remains an unresolved challenge.

Based on the internal structural characteristics of circRNAs, researchers from the Nanjing University of Aeronautics and Astronautics developed CircAST, a tool that uses multiple splice graphs to assemble alternatively spliced circRNA transcripts and estimate their expression. In simulation studies, CircAST correctly assembled full-length circRNA sequences with a sensitivity of 85.63%-94.32% and a precision of 81.96%-87.55%. By assigning reads to specific isoforms, CircAST quantified circRNA isoform expression, with correlation coefficients of 0.85-0.99 between theoretical and estimated values. The researchers evaluated CircAST on an in-house mouse testis RNA-seq dataset generated after RNase R treatment to enrich circRNAs, and identified 380 circRNAs whose full-length sequences differ from those of their cognate linear RNAs. RT-PCR and Sanger sequencing validated 32 of 37 randomly selected isoforms, further supporting the good performance of CircAST, especially for low-abundance isoforms. Applying CircAST to published experimental data, they also observed substantial diversity in circular transcripts across samples, suggesting that circRNA expression is highly regulated.
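The sensitivity and precision figures above compare assembled isoforms against the simulated ground truth. The sketch below shows one way such metrics can be computed. It assumes that an assembled isoform counts as correct only when its exon chain exactly matches a simulated isoform (the paper's actual matching criterion may differ), and it uses Pearson correlation for agreement between theoretical and estimated abundances (the summary does not say which correlation coefficient was used). The names `sensitivity_precision`, `pearson` and `Isoform` are hypothetical and are not part of CircAST.

```python
from math import sqrt
from typing import Sequence, Set, Tuple

# Hypothetical representation: an isoform is its ordered exon chain,
# e.g. ((start1, end1), (start2, end2), ...). Exact matching is an assumption.
Isoform = Tuple[Tuple[int, int], ...]


def sensitivity_precision(truth: Set[Isoform], assembled: Set[Isoform]) -> Tuple[float, float]:
    """Return (sensitivity, precision) of assembled isoforms vs. the ground truth."""
    true_pos = len(truth & assembled)  # assembled isoforms that exactly match a true one
    sensitivity = true_pos / len(truth) if truth else 0.0
    precision = true_pos / len(assembled) if assembled else 0.0
    return sensitivity, precision


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between theoretical (x) and estimated (y) abundances."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


if __name__ == "__main__":
    truth = {((100, 200), (300, 400)), ((100, 200), (250, 280), (300, 400)), ((500, 600),)}
    assembled = {((100, 200), (300, 400)), ((500, 600),), ((100, 200), (350, 400))}
    print(sensitivity_precision(truth, assembled))
    print(pearson([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
```

In this toy example, three isoforms are simulated and three are assembled, and two of the assembled ones match exactly, so sensitivity and precision both come out at 2/3.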

Schematics of CircAST for circular transcript assembly and quantification


A. Flow diagram of CircAST. B. Visualization of the CircAST workflow. CircAST begins with a set of paired-end RNA-seq reads that have been mapped to the genome. It then constructs multiple splice graphs corresponding to the different BSJs in a gene locus and assembles the full-length sequences of circular transcripts with the EMPC algorithm. Next, CircAST estimates the abundance of each assembled circular isoform using an EM algorithm. Finally, all circular transcripts are output together with their full-length sequences and estimated abundances. SAM, sequence alignment/map; BSJ, back splice junction; EMPC, extended minimum path cover; EM, expectation maximization
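The EM quantification step can be illustrated with a generic read-assignment model of the kind used by many isoform quantifiers. This is a minimal sketch and not CircAST's implementation. It assumes that each read has already been reduced to the set of assembled circular isoforms it is compatible with, and that reads are sampled uniformly along an isoform of an assumed effective length. The function `em_abundance` and its parameters are hypothetical.

```python
from typing import Dict, Sequence, Set


def em_abundance(
    compat: Sequence[Set[str]],
    lengths: Dict[str, float],
    n_iter: int = 200,
    tol: float = 1e-8,
) -> Dict[str, float]:
    """Estimate the fraction of reads originating from each isoform (hypothetical helper).

    compat[r] is the set of isoform IDs that read r is compatible with;
    lengths[i] is an assumed effective length for isoform i.
    """
    reads = [s for s in compat if s]  # drop reads compatible with no isoform
    if not reads:
        raise ValueError("no reads compatible with any isoform")
    isoforms = sorted(lengths)
    theta = {i: 1.0 / len(isoforms) for i in isoforms}  # uniform starting read fractions
    for _ in range(n_iter):
        counts = {i: 0.0 for i in isoforms}
        # E-step: split each read among its compatible isoforms in proportion
        # to theta_i / length_i, the probability of drawing that read from isoform i.
        for read_set in reads:
            weights = {i: theta[i] / lengths[i] for i in read_set}
            total = sum(weights.values())
            if total == 0:
                continue
            for i, w in weights.items():
                counts[i] += w / total
        # M-step: the new read fraction of each isoform is its expected read count / N.
        new_theta = {i: counts[i] / len(reads) for i in isoforms}
        delta = max(abs(new_theta[i] - theta[i]) for i in isoforms)
        theta = new_theta
        if delta < tol:  # stop once the estimates no longer change
            break
    return theta


if __name__ == "__main__":
    # Two equal-length isoforms: 6 reads unique to A, 2 unique to B, 4 shared.
    compat = [{"A"}] * 6 + [{"B"}] * 2 + [{"A", "B"}] * 4
    print(em_abundance(compat, {"A": 100.0, "B": 100.0}))
```

In this example the estimates converge to about 0.75 for A and 0.25 for B, so the shared reads are split in the same 3:1 ratio as the unique evidence. Resolving ambiguous reads from this kind of evidence is what makes EM a natural fit for circular isoforms that share a BSJ.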

Availability – CircAST can be accessed freely at https://github.com/xiaofengsong/CircAST.

Wu J, Li Y, Wang C, Cui Y, Xu T, Wang C, Wang X, Sha J, Jiang B, Wang K, Hu Z, Guo X, Song X. (2020) CircAST: Full-length Assembly and Quantification of Alternatively Spliced Isoforms in Circular RNAs. Genomics Proteomics Bioinformatics [Epub ahead of print].
