Alternative polyadenylation (APA) is increasingly recognized as a crucial mechanism contributing to transcriptome diversity and the regulation of gene expression. As RNA-seq has become a routine protocol for transcriptome analysis, there is great interest in new computational methods that leverage the unprecedented collection of RNA-seq data to extract and quantify APA dynamics in these transcriptomes. However, progress in this area has been relatively limited. Conventional methods rely either on transcript assembly to determine transcript 3’ ends or on annotated poly(A) sites. Moreover, they can neither identify more than two poly(A) sites in a gene nor detect dynamic APA site usage across more than two poly(A) sites.
Researchers at Xiamen University developed an approach called APAtrap, based on a mean squared error model, to identify and quantify APA sites from RNA-seq data. APAtrap can identify novel 3’ UTRs and 3’ UTR extensions, which helps locate potential poly(A) sites in previously overlooked regions and improve genome annotations. APAtrap also aims to tally all potential poly(A) sites and to detect genes with differential APA site usage between conditions. Extensive comparisons of APAtrap with two other recent methods, ChangePoint and DaPars, using RNA-seq datasets from simulation studies, human, and Arabidopsis demonstrate the efficacy and flexibility of APAtrap for any organism with an annotated genome.
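To give a feel for what a mean squared error model can do with read coverage, here is a minimal sketch, not APAtrap's actual implementation. It assumes a 3’ UTR is represented as a list of per-base read depths ordered 5’ to 3’, and that a single candidate poly(A) site is the split point where fitting one mean to each side leaves the smallest mean squared error. The function name `best_split_by_mse` and the toy coverage values are hypothetical.

```python
def best_split_by_mse(coverage):
    """Return (index, mse) for the split that minimizes two-segment MSE.

    Illustrative assumption: coverage is a list of per-base read depths,
    and each segment is modeled by its own mean.
    """
    n = len(coverage)
    if n < 2:
        raise ValueError("need at least two positions to split")

    best_idx, best_mse = None, float("inf")
    for i in range(1, n):
        left, right = coverage[:i], coverage[i:]
        mean_left = sum(left) / len(left)
        mean_right = sum(right) / len(right)
        # Sum of squared residuals around each segment's mean.
        sse = sum((x - mean_left) ** 2 for x in left)
        sse += sum((x - mean_right) ** 2 for x in right)
        mse = sse / n
        if mse < best_mse:
            best_idx, best_mse = i, mse
    return best_idx, best_mse


# Toy coverage: depth drops sharply after the fourth position.
coverage = [30, 31, 29, 30, 10, 11, 9, 10]
split, mse = best_split_by_mse(coverage)
print(split, mse)
```

On this toy input the lowest error falls at the position where coverage drops, which is the intuition behind treating a sharp coverage change as a candidate poly(A) site. Handling several sites per gene, as the article says APAtrap does, would need more than this single-split sketch.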
Evaluation of APAtrap using human RNA-seq data
(a) Overlap of genes with differential APA site usage detected by APAtrap, by DaPars, or using real polyA-seq data. (b) Dynamic APA events between the Brain and UHR samples detected by APAtrap. (c) As in b, except that DaPars was used. (d) Comparison of APAtrap and DaPars in recovering true poly(A) sites in genes with differential APA site usage. (e) Overlap of proximal poly(A) sites in genes with differential APA site usage identified by APAtrap and DaPars. (f) Difference in read coverage between the 100 bp upstream and the 100 bp downstream of proximal poly(A) sites identified by APAtrap and DaPars. For each poly(A) site, the ratio of read coverage in the downstream 100 bp to that in the upstream 100 bp was calculated.
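The downstream-to-upstream ratio described in panel (f) can be illustrated with a short sketch. It assumes per-base coverage stored as a list oriented 5’ to 3’, a poly(A) site given as an index into that list, and mean depth as the summary of each window. The function name `coverage_ratio` and the `pseudocount` parameter (used here to avoid division by zero) are hypothetical and not taken from the study.

```python
def coverage_ratio(coverage, site, window=100, pseudocount=1.0):
    """Ratio of mean downstream coverage to mean upstream coverage.

    Illustrative assumptions: `coverage` is oriented 5' to 3', `site` is
    the index of the first downstream base, and `pseudocount` is a
    hypothetical smoothing term.
    """
    upstream = coverage[max(0, site - window):site]
    downstream = coverage[site:site + window]
    if not upstream or not downstream:
        raise ValueError("site must have coverage on both sides")

    mean_up = sum(upstream) / len(upstream)
    mean_down = sum(downstream) / len(downstream)
    return (mean_down + pseudocount) / (mean_up + pseudocount)


# Toy example: depth 20 upstream of the site, depth 4 downstream.
toy_coverage = [20] * 100 + [4] * 100
ratio = coverage_ratio(toy_coverage, site=100, pseudocount=0.0)
print(ratio)
```

A ratio well below 1 indicates that coverage falls after the site, which is what one would expect at a genuinely used proximal poly(A) site.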
Availability – Freely available for download at https://apatrap.sourceforge.io.