RNA-seq has become a popular technology for studying genetic variation of pre-mRNA alternative splicing. Commonly used RNA-seq aligners rely on the consensus splice site dinucleotide motifs to map reads across splice junctions. Consequently, genomic variants that create novel splice site dinucleotides may produce splice junction RNA-seq reads that cannot be mapped to the reference genome.
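The effect of a splice site SNP on motif-based alignment can be illustrated with a toy sketch. This is not rPGA code: the function names, the toy sequence, and the set of accepted motifs (GT-AG plus the less common GC-AG and AT-AC) are assumptions chosen for illustration, and only the plus strand is considered.

```python
# Illustrative only: plus-strand intron motifs an aligner is assumed to accept.
ACCEPTED_MOTIFS = {("GT", "AG"), ("GC", "AG"), ("AT", "AC")}

def apply_snps(reference, snps):
    """Return a personal sequence: the reference with each SNP substituted.

    snps maps a 0-based position to the alternate base (hypothetical format).
    """
    seq = list(reference)
    for pos, alt in snps.items():
        seq[pos] = alt
    return "".join(seq)

def intron_motif(genome, intron_start, intron_end):
    """Return (donor, acceptor) dinucleotides of genome[intron_start:intron_end]."""
    return genome[intron_start:intron_start + 2], genome[intron_end - 2:intron_end]

def has_accepted_motif(genome, intron_start, intron_end):
    """True if the intron boundaries carry one of the accepted motifs."""
    return intron_motif(genome, intron_start, intron_end) in ACCEPTED_MOTIFS

# Toy locus: 6-base exon, 10-base candidate intron, 4-base exon.
reference = "CCAAGG" + "GATTTTTTAG" + "TTCC"
intron_start, intron_end = 6, 16

# A SNP at position 7 (A>T) turns the reference "GA" into a "GT" donor site.
personal = apply_snps(reference, {7: "T"})

print(intron_motif(reference, intron_start, intron_end))
print(intron_motif(personal, intron_start, intron_end))
```

In this toy case the candidate intron has a GA-AG boundary in the reference and a GT-AG boundary in the personal sequence, so a junction read spanning it would only be accepted by a motif-dependent aligner when mapped against the personal genome.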
Researchers at UCLA have developed and evaluated an approach to identify ‘hidden’ splicing variations in personal transcriptomes by mapping personal RNA-seq data to personal genomes. Computational analysis and experimental validation indicate that this approach identifies personal-specific splice junctions at a low false-positive rate. Applying this approach to an RNA-seq data set of 75 individuals, they identified 506 personal-specific splice junctions, of which 437 were novel splice junctions not documented in current human transcript annotations. Ninety-four splice junctions had splice site SNPs associated with GWAS signals of human traits and diseases. These involve genes whose splicing variations have previously been implicated in diseases (such as OAS1), as well as novel associations between alternative splicing and diseases (such as ICA1).
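The two categories counted above (personal-specific junctions, and the subset that is also unannotated) can be pictured as simple set operations. The sketch below is a simplified illustration, not the rPGA implementation: the function name, the `(chromosome, start, end)` junction representation, and the toy coordinates are all assumptions.

```python
def classify_junctions(reference_junctions, personal_junctions, annotated_junctions):
    """Split junctions into personal-specific and novel sets.

    Each junction is a hypothetical (chromosome, intron_start, intron_end) tuple.
    personal-specific: detected against the personal genome but not the reference.
    novel: personal-specific and absent from the transcript annotation.
    """
    personal_specific = set(personal_junctions) - set(reference_junctions)
    novel = personal_specific - set(annotated_junctions)
    return personal_specific, novel

# Toy data (made up for illustration, not taken from the study).
reference_hits = {("chr1", 100, 200), ("chr1", 300, 400)}
personal_hits = {
    ("chr1", 100, 200),
    ("chr1", 300, 400),
    ("chr1", 310, 400),  # only found against the personal genome, unannotated
    ("chr2", 50, 90),    # only found against the personal genome, annotated
}
annotation = {("chr1", 100, 200), ("chr1", 300, 400), ("chr2", 50, 90)}

specific, novel = classify_junctions(reference_hits, personal_hits, annotation)
print(sorted(specific))
print(sorted(novel))
```

A real analysis would also need read-support thresholds and haplotype handling, which this sketch leaves out.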
Identifying hidden splice junctions by aligning personal RNA-seq reads to personal genomes. (A) RNA-seq splice junction reads originating from SNPs creating personal splice site dinucleotide motifs (shown in red) do not align to the reference genome due to non-canonical splice site motifs in the reference genome. The RNA-seq splice junction reads do, however, align to the personal genome. (B) Flowchart of the rPGA pipeline.
Collectively, this work demonstrates that the personal genome approach to RNA-seq read alignment enables the discovery of a large but previously unknown catalog of splicing variations in human populations.
Availability – The rPGA source code and user documentation are freely available for download at https://github.com/Xinglab/rPGA.