Unraveling the Transcriptome

by Richard A. Stein, M.D., Ph.D. at Genetic Engineering News

While developments in DNA sequencing have provided increasingly high-quality, high-confidence genetic and genomic data, it has become clear that characterizing biological systems requires additional layers of inquiry.

This has fueled interest in characterizing and surveying the transcriptome of various cell types under specific conditions.

Insights into the transcriptome have been made possible as a result of several waves of technological advances and, among these, RNA-seq is assuming a central and expanding role.

“RNA-seq provides a great tool to start with and, in all certainty, people will develop more precise methods down the road,” says Chuan He, Ph.D., professor of chemistry at the University of Chicago. Dr. He and his colleagues are using RNA-seq, more specifically m6A-seq or MeRIP-seq, to characterize methylation changes on RNA transcripts.

While RNA methylation was observed and reported decades ago, its importance in shaping gene expression has only more recently come into the spotlight, and understanding the removal of methylation marks has been even more elusive.

Investigators in Dr. He’s group were the first to reveal that, just like DNA and histone methylation, RNA methylation is reversible. They found that the fat mass and obesity-associated protein, FTO, which is involved in human obesity and energy homeostasis, is an oxidative demethylase of RNA N6-methyladenosine. Subsequently, Dr. He and colleagues identified a second RNA demethylase, ALKBH5, which affects mammalian mRNA export and metabolism. While both proteins demethylate N6-methyladenosine RNA residues, they participate in distinct biological pathways and show different tissue expression patterns.

“Once we open this door, there are so many possibilities that emerge because, if we consider all the pathways and networks, RNA modifications can shape and, in some cases, dominate gene regulation,” Dr. He says.

Additional efforts in his lab revealed that RNA demethylation is functionally significant and performs a regulatory role. “We justified two critical points, change in gene expression and reversibility but, based on the more stringent definition for epigenetic modifications, we also need to ask whether these changes are heritable, and this aspect needs significantly more work,” he adds.

According to Jia Meng, Ph.D., associate researcher and bioinformatics core facility supervisor at MIT, “not much research has been done on RNA methylation in the past, but recent approaches are enabling us to study the RNA epigenome at an enhanced resolution and at the genome-wide scale.”

Dr. Meng and his colleagues recently developed FRIP-seq (fragmented RNA immunoprecipitation sequencing), a new tool that combines ChIP-seq with RNA-seq. Because of the nature of the RNA, software and algorithms developed for DNA methylation analysis are not informative, and new tools are required.

“This motivated us to develop this new algorithm that will help us, in the long run, to analyze the function of RNA methylation,” says Yufei Huang, Ph.D., professor of electrical and computer engineering at the University of Texas at San Antonio and senior author of the study describing FRIP-seq.

Dr. Huang and his colleagues are currently applying this technology to examine epigenetic changes in the RNA, particularly mRNA, from cancer cell lines.

Based on FRIP-seq, and with the help of computational strategies, a new MATLAB-based package called exomePeak was developed and is freely available for researchers interested in characterizing transcriptome-wide post-transcriptional RNA modifications.

“Over the next few months we will release a new version based on R, and it will be more powerful and user-friendly,” says Dr. Meng.

The existence of RNA methylation in species ranging from humans to bacteria reveals the importance of this process in biology. RNA methylation profiling is marked by several challenges, some of which are shared with those encountered in the case of DNA, while others are specific to RNA. For example, the presence of both 5-methylcytosine and N6-methyladenosine in RNA makes this modification technologically more demanding to study than it is in the case of DNA.

“In addition, RNA can be very unstable, and this makes it even more challenging to understand how methylation is introduced into and removed from RNA,” Dr. Huang says.

While the correct alignment of RNA reads to the original genomic sequences is one of the major goals in RNA sequencing, this process may be challenging for multiple reasons. One of them is that the length of RNA reads significantly shapes the effectiveness of reconstructing the transcriptome of the original cell, and shorter reads, though less costly to generate, present a higher risk for misalignment.

“The longer the reads, the higher the likelihood to assign them to the correct location,” says Steven L. Salzberg, Ph.D., professor of medicine, biostatistics, and computer sciences at Johns Hopkins University School of Medicine.
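The effect of read length on mapping ambiguity can be illustrated with a small sketch. It is a toy model only, not how any production aligner works: the "genome" is an invented string containing one repeated segment, reads are assumed to be error-free, and the helper name `ambiguous_fraction` is hypothetical.

```python
from collections import Counter


def ambiguous_fraction(genome: str, read_len: int) -> float:
    """Fraction of error-free reads of length read_len that occur at
    more than one position in the genome (i.e., cannot be placed uniquely)."""
    reads = [genome[i:i + read_len] for i in range(len(genome) - read_len + 1)]
    counts = Counter(reads)
    ambiguous = sum(1 for r in reads if counts[r] > 1)
    return ambiguous / len(reads)


# Invented toy genome: the same 8-base segment appears twice.
repeat = "ACGTTGCA"
toy_genome = "GATTACA" + repeat + "CCGGAATT" + repeat + "TTAGGC"

for k in (4, 8, 12):
    print(k, round(ambiguous_fraction(toy_genome, k), 2))
```

In this toy case, reads shorter than or equal to the repeat can land in either copy, while reads long enough to extend into the unique flanking sequence can be placed unambiguously.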

An additional, somewhat related challenge lies in the fact that the human genome contains at least 14,000 pseudogenes. Pseudogenes have highly similar sequences to transcribed genes but, as opposed to them, lack one or several introns, or contain premature stop codons and, as a result, do not encode functional proteins. Nevertheless, intron-spanning RNA reads may align to pseudogenes. A new spliced aligner that Dr. Salzberg and colleagues designed, TopHat2, addressed this and several other concerns.

In a two-step process, TopHat2 first identifies potential intron splice sites, similar to its previous version, TopHat1, and in a second step, it aligns reads that contain multiple exons. Novel algorithms incorporated into TopHat2 allow it to process more diverse sequencing datasets and to align reads of various lengths.

“Overall, TopHat2 aligns more reads, and it does so more accurately than the earlier versions of this algorithm,” Dr. Salzberg says.
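The pseudogene issue described above can be made concrete with a minimal sketch. This is not TopHat2's algorithm; it is an illustrative exact-match toy in which all sequences, function names, and parameters (`min_anchor`, `min_intron`) are assumptions. A read that spans an exon-exon junction matches a processed pseudogene (which lacks the intron) contiguously, whereas placing it on the parent gene requires splitting the read across an intron, here one with the common GT...AG boundaries.

```python
def find_contiguous(reference: str, read: str) -> int:
    """Position of an exact, unspliced match, or -1 if there is none."""
    return reference.find(read)


def find_spliced(reference: str, read: str, min_anchor: int = 4, min_intron: int = 6):
    """Try every split of the read into two anchors and look for a gap
    between them that starts with GT and ends with AG.
    Returns (left_start, intron_start, intron_end) or None."""
    for split in range(min_anchor, len(read) - min_anchor + 1):
        left, right = read[:split], read[split:]
        left_pos = reference.find(left)
        while left_pos != -1:
            intron_start = left_pos + len(left)
            right_pos = reference.find(right, intron_start + min_intron)
            while right_pos != -1:
                intron = reference[intron_start:right_pos]
                if intron.startswith("GT") and intron.endswith("AG"):
                    return (left_pos, intron_start, right_pos)
                right_pos = reference.find(right, right_pos + 1)
            left_pos = reference.find(left, left_pos + 1)
    return None


# Invented toy sequences.
exon1 = "ATGGCCTTAC"
intron = "GTAAGTCCTTTTCAG"
exon2 = "GACTTGAAGC"
gene = "TTTT" + exon1 + intron + exon2 + "TTTT"
pseudogene = "CCCC" + exon1 + exon2 + "CCCC"  # processed copy, no intron

read = exon1[-6:] + exon2[:6]  # spans the exon1/exon2 junction

print("contiguous hit in gene:      ", find_contiguous(gene, read))
print("contiguous hit in pseudogene:", find_contiguous(pseudogene, read))
print("spliced hit in gene:         ", find_spliced(gene, read))
```

An aligner that only considers contiguous matches would assign this read to the pseudogene; considering spliced placements as well is what allows the parent gene to be recovered as a candidate.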

The Pseudogene Problem

“We are interested in several aspects related to RNA-seq, as this approach allows us to find coding and noncoding transcripts, examine splicing, and perform quantification,” says Mark B. Gerstein, Ph.D., professor of biomedical informatics at Yale University.

Investigators in Dr. Gerstein’s lab recently performed a complete annotation of pseudogenes from the GENCODE Project data.

While pseudogenes were historically viewed as genomic loci that might not have any roles, some of them were recently proposed to have an active cellular role. By using locus-specific gene expression analyses combined with high-throughput RNA-seq, Dr. Gerstein and colleagues revealed that pseudogene transcription occurs in a tissue-dependent manner and is associated with active promoter regions and open chromatin states. The analysis revealed that even though many pseudogenes are inactive, some of them potentially may assume regulatory functions that are reminiscent of noncoding RNA molecules.

“RNA-seq is something that we will see being increasingly rolled out for multiple applications, including personal transcriptome profiling in cancer and other diseases,” Dr. Gerstein says.

Among the remaining challenges are the need to standardize gene expression measurements, to incorporate the degree to which specific genes are turned on or off, and to advance insights into noncoding RNA—a topic that has seen intense transformation over the past few years.
