Aptardi predicts polyadenylation sites in sample-specific transcriptomes using high-throughput RNA sequencing and DNA sequence

Annotation of polyadenylation sites from short-read RNA sequencing alone is a challenging computational task. Other algorithms rooted in DNA sequence predict potential polyadenylation sites; however, in vivo expression of a particular site varies based on a myriad of conditions. Researchers from the University of Colorado Anschutz Medical Campus introduce aptardi (alternative polyadenylation transcriptome analysis from RNA-Seq data and DNA sequence information), which leverages both DNA sequence and RNA sequencing in a machine learning paradigm to predict expressed polyadenylation sites. Specifically, as input aptardi takes DNA nucleotide sequence, genome-aligned RNA-Seq data, and an initial transcriptome. The program evaluates these initial transcripts to identify expressed polyadenylation sites in the biological sample and refines transcript 3′-ends accordingly. The average precision of the aptardi model is twice that of a standard transcriptome assembler. In particular, the recall of the aptardi model (the proportion of true polyadenylation sites detected by the algorithm) is improved by over three-fold. Also, the model—trained using the Human Brain Reference RNA commercial standard—performs well when applied to RNA-sequencing samples from different tissues and different mammalian species. Finally, aptardi’s input is simple to compile and its output is easily amenable to downstream analyses such as quantitation and differential expression.

Overview for using aptardi

Fig. 1

Aptardi requires three files as input: (1) FASTA file of DNA sequence with headers by chromosome, (2) sorted Binary Alignment Map (BAM) file of reads aligned to the genome, and (3) General Feature Format (GTF) file of transcript structures. Blue boxes represent software. Yellow writing/boxes indicate aptardi incorporation. Note transcript structures can be derived from a reference transcriptome (i.e., Ensembl annotation) in lieu of the original transcriptome generated from a transcriptome assembler.

Availability – the aptardi prediction model is available at: https://github.com/luskry/aptardi

Lusk R, Stene E, Banaei-Kashani F, Tabakoff B, Kechris K, Saba LM. (2021) Aptardi predicts polyadenylation sites in sample-specific transcriptomes using high-throughput RNA sequencing and DNA sequence. Nat Commun 12(1):1652. [article]

Leave a Reply

Your email address will not be published. Required fields are marked *

*

Time limit is exhausted. Please reload CAPTCHA.