The advent of high-throughput sequencing technologies has made it possible to obtain large volumes of genetic information quickly and inexpensively. As a result, considerable effort is devoted to uncovering the biological roles of genomic elements, and distinguishing protein-coding RNAs from long non-coding RNAs is one of the most important of these tasks.
RNAsamba is a tool that predicts the coding potential of RNA molecules from sequence information alone. It uses a neural network that models both the whole sequence and the ORF to identify patterns that distinguish coding from non-coding transcripts. RNAsamba’s classification performance was evaluated on transcripts from humans and several other model organisms, where it consistently outperformed other state-of-the-art methods.
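To make the "whole sequence and ORF" inputs concrete, the following is a minimal sketch of how a longest ORF might be extracted from a transcript. It is an illustration only: the function name `longest_orf`, the restriction to the forward strand, and the requirement of an ATG start followed by an in-frame stop codon are assumptions made here, not a description of how RNAsamba itself locates ORFs.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
START_CODON = "ATG"


def longest_orf(transcript: str) -> str:
    """Return the longest ATG-to-stop ORF on the forward strand, or "" if none.

    Hypothetical helper for illustration; complete ORFs only.
    """
    # Accept RNA or DNA alphabets by normalising U to T.
    seq = transcript.upper().replace("U", "T")
    best = ""
    for frame in range(3):
        start = None  # index of the first ATG seen since the last stop codon
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == START_CODON:
                start = i
            elif start is not None and codon in STOP_CODONS:
                orf = seq[start:i + 3]  # include the stop codon
                if len(orf) > len(best):
                    best = orf
                start = None
    return best


if __name__ == "__main__":
    print(longest_orf("GGAUGAAACCCUAGUU"))
```

In this sketch, the full transcript and the returned ORF would correspond to the two views of the sequence that a classifier could model. A transcript with no complete ORF yields an empty string, so partial-length ORFs would need separate handling.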
The results show that RNAsamba can identify coding signals in partial-length ORFs and UTR sequences, indicating that its algorithm does not depend on complete transcript sequences. Furthermore, RNAsamba can predict small ORFs, which are traditionally identified through ribosome profiling experiments.
RNAsamba will enable faster and more accurate biological findings from genomic data of species that are being sequenced for the first time. A user-friendly web interface, documentation with instructions for local installation and usage, and the source code of RNAsamba are available at https://rnasamba.lge.ibi.unicamp.br/.
Camargo, A.P., Sourkov, V., Pereira, G.A.G. and Carazzolle, M.F. (2020) RNAsamba: neural network-based assessment of the protein-coding potential of RNA sequences. NAR Genomics and Bioinformatics, 2, lqz024.