FLAIR2 – detecting haplotype-specific transcript variation in long reads

RNA sequencing (RNA-seq) has transformed our ability to study RNA, the molecule that plays a crucial role in translating genetic information from DNA into proteins. This technology has unveiled numerous abnormalities in RNA processing, which are linked to various diseases. Specifically, changes in RNA splicing (how RNA is cut and rejoined) and single nucleotide variants (SNVs, which are single-base changes in the RNA sequence) can impact the stability, location, and function of RNA transcripts.

One key player in RNA editing is an enzyme called ADAR, which converts adenosine in RNA to inosine. High levels of ADAR have been associated with increased aggressiveness in lung cancer cells and with changes in RNA splicing. Despite the importance of studying these RNA variations, traditional RNA-seq methods that use short reads (sequenced fragments much shorter than a full transcript) have struggled to analyze splicing and SNVs together, because a single short read rarely spans both a variant and the splice junctions that define an isoform.

Advancing with Long-Read Sequencing

To address this challenge, researchers at the University of California, Santa Cruz have turned to long-read sequencing technology. This method provides full-length RNA sequences, offering a more comprehensive view of how RNA variants affect splicing. The team developed FLAIR2, a computational tool that extends FLAIR, a program used to model different RNA isoforms (versions of RNA molecules) from long-read data. FLAIR2 integrates information about RNA variants with the corresponding RNA isoforms.

Variant-aware transcript detection by FLAIR2 identifies haplotype-specific transcript isoform bias

Fig. 1

a Full FLAIR2 computational workflow for identifying haplotype-specific transcripts in long reads. For annotated transcript discovery, long reads are aligned to annotated transcript sequences and inspected for their overall match and read support at annotated splice junctions and transcript ends. The genomic alignments for reads that are not assigned to an annotated transcript are corrected and collapsed for unannotated isoform discovery. User-provided unphased/phased RNA variant calls can be associated with reads using FLAIR2; finally, FLAIR2 counts the number of variant sets comprised by the reads assigned to each transcript model to determine variant-aware transcripts. Red ticks indicate mismatches; purple stars indicate RNA variants. b FLAIR transcript models for Mcm5 with the highest expression are plotted using different colors for each transcript’s exons. The highlighted portion shows alternative splicing, and the smaller blocks within exons indicate variants. c Stacked bar chart showing the proportion of transcript expression of the transcripts from b, matched by color, for each of the replicates sequenced.
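The last step of the workflow in panel a, counting the variant sets carried by the reads assigned to each transcript model, can be illustrated with a small Python sketch. This is not FLAIR2's code or interface: the function names, the input dictionaries, the variant tuples, and the `min_reads` threshold are all hypothetical, chosen only to show the idea.

```python
from collections import Counter, defaultdict


def count_variant_sets(read_to_transcript, read_to_variants):
    """Tally, per transcript, how many reads carry each distinct set of variants.

    Both inputs are hypothetical: read ID -> transcript name, and
    read ID -> list of variants observed on that read.
    """
    counts = defaultdict(Counter)
    for read_id, transcript in read_to_transcript.items():
        # A read with no recorded variants contributes the empty set.
        variant_set = frozenset(read_to_variants.get(read_id, ()))
        counts[transcript][variant_set] += 1
    return counts


def variant_aware_transcripts(counts, min_reads=2):
    """Keep (transcript, variant set) pairs supported by at least min_reads reads."""
    return {
        (transcript, variant_set): n
        for transcript, sets in counts.items()
        for variant_set, n in sets.items()
        if n >= min_reads
    }


# Toy data: five reads assigned to two transcript models.
read_to_transcript = {"r1": "tx1", "r2": "tx1", "r3": "tx1", "r4": "tx2", "r5": "tx2"}
read_to_variants = {
    "r1": [("chr1", 100, "A", "G")],
    "r2": [("chr1", 100, "A", "G")],
    "r4": [("chr1", 100, "A", "G")],
}

counts = count_variant_sets(read_to_transcript, read_to_variants)
supported = variant_aware_transcripts(counts, min_reads=2)
```

In this toy example only tx1 combined with the chr1:100 A>G variant reaches two supporting reads, so it is the only pair retained. Keying on a frozenset of variants treats two reads as equivalent only when they carry exactly the same variants, which is one simple way to keep haplotype-like combinations separate.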

Using nanopore sequencing, a type of long-read technology, researchers studied RNA from H1975 lung adenocarcinoma cells. They compared cells with normal ADAR levels to those with reduced ADAR activity (achieved through a technique called knockdown). This allowed them to identify specific RNA isoforms associated with inosine, helping to clarify ADAR’s role in cancer.
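To make the logic of the control versus knockdown comparison concrete, here is a minimal Python sketch. It rests on a well-known property of sequencing: inosine is read as guanosine, so ADAR editing appears as A-to-G mismatches at positions where the reference has an A. Sites where the G fraction falls after ADAR knockdown are candidate editing sites, while a fraction that stays put looks more like a genomic variant. The function name, input layout, and thresholds are assumptions for illustration and do not come from the study.

```python
def candidate_editing_sites(control, knockdown, min_coverage=10, min_drop=0.1):
    """Return sites whose A-to-G mismatch fraction drops after knockdown.

    control / knockdown: hypothetical dicts mapping a site (chrom, pos) at a
    reference A to a tuple (reads showing G, total reads covering the site).
    """
    candidates = []
    for site, (g_ctrl, n_ctrl) in control.items():
        g_kd, n_kd = knockdown.get(site, (0, 0))
        # Skip sites without enough coverage in either condition.
        if n_ctrl < min_coverage or n_kd < min_coverage:
            continue
        frac_ctrl = g_ctrl / n_ctrl
        frac_kd = g_kd / n_kd
        if frac_ctrl - frac_kd >= min_drop:
            candidates.append((site, frac_ctrl, frac_kd))
    return candidates


# Toy counts for three sites.
control = {("chr1", 100): (12, 40), ("chr1", 200): (20, 40), ("chr2", 50): (3, 5)}
knockdown = {("chr1", 100): (2, 40), ("chr1", 200): (19, 40), ("chr2", 50): (0, 5)}

sites = candidate_editing_sites(control, knockdown)
```

Here only chr1:100 is reported: its G fraction falls from 0.30 to 0.05. The chr1:200 site barely changes, and chr2:50 is skipped for low coverage. A real analysis would add statistical testing across replicates rather than rely on a fixed threshold.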

Key Findings

  1. Full-Length RNA Sequences: Long-read sequencing allowed researchers to obtain complete RNA sequences, essential for understanding how RNA variants influence splicing.
  2. Enhanced Analysis with FLAIR2: By upgrading the FLAIR tool, the team could link RNA variant data directly to specific RNA isoforms, providing a clearer picture of RNA diversity.
  3. Role of ADAR: The study identified key RNA isoforms linked to inosine, highlighting ADAR’s significant role in lung cancer progression.

Conclusions

The study demonstrates that long-read sequencing is a powerful tool for investigating the complex relationship between RNA variants and splicing patterns. This technology provides a detailed view of RNA molecules, offering valuable insights into the molecular mechanisms underlying diseases like lung cancer. Such knowledge can lead to new diagnostic tools and treatments targeting RNA processing abnormalities.

In summary, the adoption of long-read sequencing technology marks a significant advancement in RNA research, offering new perspectives on the intricate world of RNA and its implications for health and disease. This approach holds great promise for improving our understanding of cancer and potentially other diseases linked to RNA processing errors.

Tang AD, Felton C, Hrabeta-Robinson E, Volden R, Vollmers C, Brooks AN. (2024) Detecting haplotype-specific transcript variation in long reads with FLAIR2. Genome Biol 25(1):173.
