mRIN – direct assessment of genome-wide and gene-specific mRNA integrity from large-scale RNA-sequencing data

The volume of RNA-Seq data sets in public repositories has been expanding exponentially, providing unprecedented opportunities to study gene expression regulation. Because degraded RNA samples, such as those collected from post-mortem tissues, can result in distinct expression profiles with potential biases, a particularly important step in mining these data is quality control.

Researchers from Columbia University have developed a method named mRIN to directly assess mRNA integrity from RNA-Seq data at both the sample level and the individual gene level. They systematically analysed large-scale RNA-Seq data sets of the human brain transcriptome generated by different consortia. Their analysis demonstrates that 3′ bias resulting from partial RNA fragmentation in post-mortem tissues has a marked impact on global expression profiles, and that mRIN effectively identifies samples with different levels of mRNA degradation. Unexpectedly, this degradation has a reproducible and gene-specific component, and transcripts with different stabilities are associated with distinct functions and structural features reminiscent of mRNA decay in living cells.


(a) Schematic illustration of the algorithm to estimate mRIN. After estimation of the 3′ bias of each gene and sample using a KS statistic from the read coverage profile, an mRIN is calculated for each sample. A normal distribution of the mRINs of non-degraded samples is estimated using a mixture model to assess the statistical significance. (b) Global under-representation of gene expression of the BrainSpan samples as measured by RNA-Seq is associated with low mRINs. Samples in the mRIN bar plot and the heat map are in the same order. (c) Validation of mRIN as a measure of mRNA integrity by a direct comparison of the RNA-Seq and exon array data. This analysis included 479 samples whose gene expression was quantified by both RNA-Seq and exon arrays. For each sample, the correlation of gene expression estimated from RNA-Seq and that estimated from exon arrays (denoted seq–array correlation or SAC) is calculated. SAC is plotted against the mRIN of each sample (Pearson correlation R=0.58, P<2.2 × 10^−16, F-test). (d) mRIN was used to separate 124 samples with the most severe RNA degradation (mRIN<−0.033, P<0.1, Methods) from the remaining 355 samples. For each group, the heat maps of gene expression as measured by RNA-Seq and exon arrays are shown, with genes and samples in the same order as determined by hierarchical clustering of the array data.
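To make the idea in panel (a) more concrete, here is a minimal Python sketch of a KS-type measure of 3′ bias. It compares the cumulative read coverage along a transcript with the straight line expected under uniform coverage. This is an illustration only, not the code in the mRIN package: the function names `signed_ks_vs_uniform` and `sample_bias_score`, the sign convention (negative means coverage is skewed toward the 3′ end) and the plain averaging across genes are all assumptions made for this example. The published method's gene filtering, normalisation and mixture-model significance test are not reproduced here.

```python
from itertools import accumulate


def signed_ks_vs_uniform(coverage):
    """Signed KS-type deviation between a coverage profile and uniform coverage.

    `coverage` lists read depth per position, ordered 5' to 3'.
    Assumed convention for this sketch: a negative value means the reads
    pile up toward the 3' end; a positive value means a 5' skew.
    """
    total = sum(coverage)
    if total <= 0:
        raise ValueError("coverage profile has no reads")
    n = len(coverage)
    best = 0.0
    for i, cum in enumerate(accumulate(coverage), start=1):
        # Empirical cumulative coverage minus the uniform expectation.
        diff = cum / total - i / n
        if abs(diff) > abs(best):
            best = diff
    return best


def sample_bias_score(profiles):
    """Average the per-gene signed deviations for one sample (illustrative only)."""
    scores = [signed_ks_vs_uniform(p) for p in profiles]
    return sum(scores) / len(scores)
```

With this convention, a flat profile such as `[5, 5, 5, 5]` gives a deviation of zero, while a profile with all reads in the last position gives a strongly negative value. A sample whose genes mostly show negative deviations would be a candidate for partial degradation.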

Availability – A set of Perl and R scripts was implemented to calculate the exonic read coverage, cumulative distribution and KS statistics, mRINs and their statistical significance. The software package and the documentation are available through http://zhanglab.c2b2.columbia.edu/index.php/mRIN. Hierarchical clustering was performed using Cluster (http://bonsai.hgc.jp/~mdehoon/software/cluster/) and Java TreeView (http://jtreeview.sourceforge.net).

Feng H, Zhang X, Zhang C. (2015) mRIN for direct assessment of genome-wide and gene-specific mRNA integrity from large-scale RNA-sequencing data. Nat Commun 6:7816. [article]
