Researchers from Columbia University have developed a method named mRIN to directly assess mRNA integrity from RNA-Seq data at the sample and individual gene level. They systematically analysed large-scale RNA-Seq data sets of the human brain transcriptome generated by different consortia. Their analysis demonstrates that 3′ bias resulting from partial RNA fragmentation in post-mortem tissues has a marked impact on global expression profiles, and that mRIN effectively identifies samples with different levels of mRNA degradation. Unexpectedly, the degradation process has a reproducible and gene-specific component, and transcripts with different stabilities are associated with distinct functions and structural features reminiscent of mRNA decay in living cells.
(a) Schematic illustration of the algorithm to estimate mRIN. After estimation of the 3′ bias of each gene and sample using a KS statistic from the read coverage profile, an mRIN is calculated for each sample. A normal distribution of the mRINs of non-degraded samples is estimated using a mixture model to assess the statistical significance. (b) Global under-representation of gene expression of the BrainSpan samples as measured by RNA-Seq is associated with low mRINs. Samples in the mRIN bar plot and the heat map are in the same order. (c) Validation of mRIN as a measure of mRNA integrity by a direct comparison of the RNA-Seq and exon array data. This analysis included 479 samples whose gene expression was quantified by both RNA-Seq and exon arrays. For each sample, the correlation of gene expression estimated from RNA-Seq and that estimated from exon arrays (denoted seq–array correlation or SAC) is calculated. SAC is plotted against the mRIN of each sample (Pearson correlation R=0.58, P<2.2 × 10⁻¹⁶, F-test). (d) mRIN was used to separate 124 samples with the most severe RNA degradation (mRIN<−0.033, P<0.1, Methods) from the remaining 355 samples. For each group, the heat maps of gene expression as measured by RNA-Seq and exon arrays are shown, with genes and samples in the same order as determined by hierarchical clustering of the array data.
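To make the first step in panel (a) more concrete, here is a minimal Python sketch of a KS-style 3′ bias statistic for a single gene. It is an illustration only, not the authors' Perl/R implementation: the function name `signed_ks_3prime_bias`, the uniform-coverage reference, and the sign convention (positive means coverage is shifted toward the 3′ end) are all assumptions made for this example. The later steps (combining gene-level statistics into a per-sample mRIN and fitting the mixture model) are not reproduced here.

```python
import math
from itertools import accumulate


def signed_ks_3prime_bias(coverage):
    """Signed KS-style statistic for one gene (hypothetical helper).

    coverage: per-position read depth ordered from the 5' end to the 3' end.
    Returns the largest signed gap between a uniform cumulative profile and
    the observed cumulative coverage. Under the convention assumed here, a
    positive value means reads are concentrated toward the 3' end.
    """
    n = len(coverage)
    total = sum(coverage)
    if n == 0 or total == 0:
        return math.nan  # no reads: the statistic is undefined

    best = 0.0
    for i, running in enumerate(accumulate(coverage), start=1):
        # Uniform reference CDF minus observed cumulative coverage fraction.
        diff = i / n - running / total
        if abs(diff) > abs(best):
            best = diff
    return best


# Toy coverage profiles (made-up numbers, not real data).
uniform_gene = [5, 5, 5, 5]
degraded_gene = [0, 0, 0, 4]   # all reads at the 3' end
five_prime_gene = [4, 0, 0, 0]  # all reads at the 5' end

print(signed_ks_3prime_bias(uniform_gene))
print(signed_ks_3prime_bias(degraded_gene))
print(signed_ks_3prime_bias(five_prime_gene))
```

In this toy setting, evenly covered genes score near zero, while genes whose reads pile up at the 3′ end score strongly positive, which is the kind of per-gene signal the caption describes as the input to the sample-level mRIN.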
Availability – A set of Perl and R scripts was implemented to calculate the exonic read coverage, the cumulative distribution and KS statistics, the mRINs and their statistical significance. The software package and the documentation are available through http://zhanglab.c2b2.columbia.edu/index.php/mRIN. Hierarchical clustering was performed using Cluster (http://bonsai.hgc.jp/~mdehoon/software/cluster/) and Java TreeView (http://jtreeview.sourceforge.net).