by Kelly Rae Chi – Biotechniques

Five RNA-seq library preparation methods go head-to-head in terms of performance in low-quality and low-quantity RNA samples. Which method came out on top?

In a side-by-side comparison of five different transcriptome sequencing (RNA-seq) library preparation methods, the RNase H technique outperformed others for analysis of low-quality RNA samples and was among the least expensive; SMART and NuGEN methods worked well for small amounts of RNA. The findings are published online May 19, 2013 in Nature Methods (1).

“We’d been looking for a way to deal with the issue of degraded RNA for a while. We tried different methods, and finally we found this RNase H method that works well and gives consistent results,” said co-author Xian Adiconis, senior research associate at the Broad Institute of Harvard and the Massachusetts Institute of Technology (MIT) in Cambridge, MA.

A strategy for analyzing the transcriptome, RNA-seq uses high-throughput sequencing to sequence and quantify RNA. But results suffer when researchers use samples that are degraded or present in small amounts. Numerous methods and commercial kits are available for the analysis of low-quality and low-quantity samples, but until now, the choice relied somewhat on guesswork, said Adiconis. Read more

The surprising observation that virtually the entire human genome is transcribed means we know little about the function of many emerging classes of RNAs beyond their astounding diversity. Traditional RNA function prediction methods rely on sequence or alignment information, which limits their ability to classify the various collections of non-coding RNAs (ncRNAs). To address this, researchers from the University of Pennsylvania developed Classification of RNAs by Analysis of Length (CoRAL), a machine learning-based approach for classification of RNA molecules. CoRAL uses biologically interpretable features including fragment length and cleavage specificity to distinguish between different ncRNA populations. They evaluated CoRAL using genome-wide small RNA sequencing data sets from four human tissue types and were able to classify six different types of RNAs with ∼80% cross-validation accuracy. Analysis by CoRAL revealed that microRNAs, small nucleolar and transposon-derived RNAs are highly discernible and consistent across all human tissue types assessed, whereas long intergenic ncRNAs, small cytoplasmic RNAs and small nuclear RNAs show less consistent patterns. The ability to reliably annotate loci across tissue types demonstrates the potential of CoRAL to characterize ncRNAs using small RNA sequencing data in less well-characterized organisms.
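To make the idea of a fragment-length feature concrete, here is a minimal Python sketch. It is not CoRAL's implementation: the function name `length_features` and the 15 to 35 nt window are assumptions chosen for illustration, and the classifier itself is left out.

```python
from collections import Counter

def length_features(read_lengths, min_len=15, max_len=35):
    """Fraction of a locus's reads at each length in [min_len, max_len].

    Hypothetical helper: the window bounds are illustrative defaults,
    not values taken from the CoRAL paper.
    """
    counts = Counter(n for n in read_lengths if min_len <= n <= max_len)
    total = sum(counts.values())
    if total == 0:
        # No usable reads at this locus: return an all-zero profile.
        return [0.0] * (max_len - min_len + 1)
    return [counts[n] / total for n in range(min_len, max_len + 1)]

# A locus whose reads cluster tightly around 22 nt gives a sharply
# peaked profile; a locus with scattered lengths gives a flat one.
profile = length_features([22, 22, 21, 23], min_len=20, max_len=24)
print(profile)
```

A vector like this, one per locus, is the kind of interpretable input a classifier could be trained on.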

RNA-Seq

Availability – The CoRAL source code, required genome annotation files, and prediction results are available at http://wanglab.pcbi.upenn.edu/coral.

  • Leung YY, Ryvkin P, Ungar LH, Gregory BD, Wang LS. (2013) CoRAL: predicting non-coding RNAs from small RNA-sequencing data. Nucleic Acids Res [Epub ahead of print]. [article]

It has been shown in small RNA sequencing-based studies that some small RNA fragments are specifically processed from known structural non-coding RNAs, either through Dicer-dependent or Dicer-independent pathways. Although these small RNAs are often less abundant than microRNAs in normal mammalian tissues, they are always present in all sequenced libraries. In this paper, researchers from the Institut Curie, France use the ncPRO-seq pipeline to describe different profiles of these small RNA fragments and to discuss their potential processing pathways and functions. To assess whether more small RNA fragments can be detected in small RNA sequencing datasets, they decided to focus on small nuclear RNAs, abbreviated as snRNAs, which are associated with Sm ribonucleoproteins to form functional RNA-protein complexes. Here, they describe a group of small RNA fragments derived from snRNAs, which are typically highly enriched in regions bound by Sm proteins. Based on this, they propose the existence of a potential novel small RNA family associated with Sm proteins.

  • Chen CJ, Heard E. (2013) Small RNAs derived from structural non-coding RNAs. Methods [Epub ahead of print]. [abstract]

Recent molecular studies have shown that, even when derived from a seemingly homogenous population, individual cells can exhibit substantial differences in gene expression, protein levels and phenotypic output, with important functional consequences. Existing studies of cellular heterogeneity, however, have typically measured only a few pre-selected RNAs or proteins simultaneously, because genomic profiling methods could not be applied to single cells until very recently.

Here, a team led by researchers at Harvard University used single-cell RNA sequencing to investigate heterogeneity in the response of mouse bone-marrow-derived dendritic cells (BMDCs) to lipopolysaccharide. They found extensive, and previously unobserved, bimodal variation in messenger RNA abundance and splicing patterns, which they validated by RNA-fluorescence in situ hybridization for select transcripts. In particular, hundreds of key immune genes are bimodally expressed across cells, surprisingly even for genes that are very highly expressed at the population average. Moreover, splicing patterns demonstrate previously unobserved levels of heterogeneity between cells. Some of the observed bimodality can be attributed to closely related, yet distinct, known maturity states of BMDCs; other portions reflect differences in the usage of key regulatory circuits. For example, they identified a module of 137 highly variable, yet co-regulated, antiviral response genes. Using cells from knockout mice, the researchers show that variability in this module may be propagated through an interferon feedback circuit, involving the transcriptional regulators Stat2 and Irf7. This study demonstrates the power and promise of single-cell genomics in uncovering functional diversity between cells and in deciphering cell states and circuits.

  • Shalek AK, Satija R, Adiconis X, Gertner RS, Gaublomme JT, Raychowdhury R, Schwartz S, Yosef N, Malboeuf C, Lu D, Trombetta JT, Gennert D, Gnirke A, Goren A, Hacohen N, Levin JZ, Park H, Regev A. (2013) Single-cell transcriptomics reveals bimodality in expression and splicing in immune cells. Nature [Epub ahead of print]. [abstract]

Graphite web is a novel web tool for pathway analyses and network visualization for gene expression data of both microarray and RNA-seq experiments. Several pathway analyses have been proposed either in the univariate or in the global and multivariate context to tackle the complexity and the interpretation of expression results. These methods can be further divided into ‘topological’ and ‘non-topological’ methods according to their ability to gain power from pathway topology. Biological pathways are, in fact, not only gene lists but can be represented through a network where genes and connections are, respectively, nodes and edges. To this day, the most used approaches are non-topological and univariate, although they miss the relationships among genes. In contrast, topological and multivariate approaches are more powerful, but difficult for researchers without bioinformatics skills to use.

Here, researchers from the University of Padova, Italy present Graphite web, the first public web server for pathway analysis on gene expression data that combines topological and multivariate pathway analyses with an efficient system of interactive network visualizations for easy results interpretation. Specifically, Graphite web implements five different gene set analyses on three model organisms and two pathway databases.

Availability – Graphite Web is freely available at http://graphiteweb.bio.unipd.it/.

  • Sales G, Calura E, Martini P, Romualdi C. (2013) Graphite Web: web tool for gene set analysis exploiting pathway topology. Nucleic Acids Res [Epub ahead of print]. [article]

As RNA-seq is replacing gene expression microarrays to assess genome-wide transcription abundance, gene expression Quantitative Trait Locus (eQTL) studies using RNA-seq have emerged. RNA-seq delivers two novel features that are important for eQTL studies. First, it provides information on allele-specific expression (ASE), which is not available from gene expression microarrays. Second, it generates unprecedentedly rich data to study RNA-isoform expression. In this paper, the authors review current methods for eQTL mapping using ASE and discuss some future directions. They also review existing works that use RNA-seq data to study RNA-isoform expression and discuss the gaps between these works and isoform-specific eQTL mapping.
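As a rough illustration of what ASE information looks like in practice, the sketch below tests whether reads covering a heterozygous site are split evenly between the two alleles. This is only the simplest possible check, an exact binomial test against a 50:50 expectation; it is an assumption for illustration and not one of the specific models covered in the review. The function name `ase_binomial_pvalue` is hypothetical.

```python
from math import comb

def ase_binomial_pvalue(ref_reads, alt_reads):
    """Two-sided exact binomial p-value for allelic balance (p = 0.5).

    Under p = 0.5 every outcome k has probability C(n, k) / 2**n, so the
    p-value sums the outcomes that are no more likely than the observed one.
    """
    n = ref_reads + alt_reads
    if n == 0:
        return 1.0  # no reads, no evidence of imbalance
    observed = comb(n, ref_reads)
    tail = sum(comb(n, k) for k in range(n + 1) if comb(n, k) <= observed)
    return tail / 2 ** n

# Ten reads, all from one allele: strong evidence of imbalance.
print(ase_binomial_pvalue(10, 0))
# Ten reads split evenly: no evidence of imbalance.
print(ase_binomial_pvalue(5, 5))
```

Real ASE analyses typically also need to account for mapping bias and overdispersion, which this sketch ignores.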

  • Sun W, Hu Y. (2013) eQTL Mapping Using RNA-seq Data. Stat Biosci 5(1), 198-219. [article]

Voom: variance modelling at the observation-level

In the past few years, RNA-seq has emerged as a revolutionary new technology for expression profiling. RNA-seq expression data consists of read counts, and many recent publications have argued therefore that RNA-seq data should be analysed by statistical methods designed specifically for counts. Yet all the statistical methods developed for RNA-seq counts rely on approximations of various kinds.

This article revisits the idea of applying normal-based microarray-like statistical methods to RNA-seq read counts, with the idea that it is more important to model the mean-variance relationship correctly than it is to specify the exact probabilistic distribution of the counts. Log-counts per million are used as expression values. The voom method estimates the mean-variance relationship robustly and generates a precision weight for each individual normalized observation. The normalized log-counts per million and associated precision weights are then entered into the limma analysis pipeline, or indeed into any statistical pipeline for microarray data that is precision weight aware. This opens access for RNA-seq analysts to a large body of methodology developed for microarrays, allowing RNA-seq and microarray data to be analysed in closely comparable ways. The performance of voom and related limma-based pipelines is compared to that of edgeR, DESeq, baySeq, TSPM, PoissonSeq, and DSS. Simulation studies show that voom out-performs previous RNA-seq methods even when the data is generated according to the assumptions of the earlier methods. This is especially true when the sequence depths vary between RNA samples. Several data sets are also analysed to demonstrate how voom can handle heterogeneous data and complex experiments as well as facilitating pathway analysis and gene set testing methods.
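The two ingredients named above, log-counts per million and a precision weight per observation, can be sketched in a few lines of Python. This is not the limma code. The offsets of 0.5 and 1 are the ones I understand voom to use to keep zero counts finite, and the variance predictions fed to `precision_weights` are assumed to come from a separately fitted mean-variance trend that is not shown here; both function names are hypothetical.

```python
from math import log2

def log_cpm(counts, lib_sizes):
    """log2 counts per million for a genes x samples table of read counts.

    The +0.5 and +1 offsets (an assumption about voom's transformation)
    avoid taking the log of zero.
    """
    return [
        [log2((c + 0.5) * 1e6 / (lib_sizes[j] + 1.0)) for j, c in enumerate(row)]
        for row in counts
    ]

def precision_weights(predicted_variances):
    """Inverse-variance weights, one per observation.

    `predicted_variances` stands in for the output of a fitted
    mean-variance trend, which this sketch does not estimate.
    """
    return [[1.0 / v for v in row] for row in predicted_variances]

counts = [[0, 10], [250, 300]]    # two genes, two samples
lib_sizes = [999_999, 1_999_999]  # total reads per sample
print(log_cpm(counts, lib_sizes))
```

The values and weights would then be passed together to a weighted linear model, which is the role limma plays in the article.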

(read more…)

Hybridization is a prominent process among natural plant populations that can result in phenotypic novelty, heterosis, and changes in gene expression. The effects of intraspecific hybridization on F1 hybrid gene expression were investigated using parents from divergent, natural populations of Cirsium arvense, an invasive Compositae weed.

Using an RNA-seq approach, the expression of 68,746 unigenes was quantified in parents and hybrids. The expression levels of 51% of transcripts differed between parents, a majority of which had fold changes below 1.25x. More unigenes had higher expression in the invasive parent (P1) than the non-invasive parent (P2). Of those that were divergently expressed between parents, 10% showed additive and 81% showed non-additive (transgressive or dominant) modes of gene action in the hybrids. A majority of the dominant cases had P2-like expression patterns in the hybrids. Comparisons of allele-specific expression also enabled a survey of cis- and trans-regulatory effects. Cis- and trans-regulatory divergence was found at 70% and 68% of 62,281 informative SNP sites, respectively. Of the 17% of sites exhibiting both cis- and trans- effects, a majority (70%) had antagonistic regulatory interactions (cis x trans); trans-divergence tended to drive higher expression of the P1 allele whereas cis-divergence tended to increase P2 transcript abundance. Trans-effects correlated more highly than cis- with parental expression divergence and accounted for a greater proportion of the regulatory divergence at sites with additive compared to non-additive inheritance patterns. This study explores the nature of, and types of mechanisms underlying, expression changes that occur upon intraspecific hybridization in natural populations.
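The cis and trans quantities discussed above are commonly derived from two log ratios: the allelic ratio inside the hybrid (where both alleles share one trans environment) and the expression ratio between the parents. The sketch below follows that common convention as an assumption; it is not taken from the paper's methods, it omits the statistical tests, and it assumes normalized, non-zero expression values. The function names are hypothetical.

```python
from math import log2

def cis_trans(p1_parent, p2_parent, p1_hybrid, p2_hybrid):
    """Split parental expression divergence into cis and trans parts.

    cis   = log2 ratio of the P1 and P2 alleles within the hybrid
    trans = parental log2 ratio minus the cis component
    """
    parental = log2(p1_parent / p2_parent)
    cis = log2(p1_hybrid / p2_hybrid)
    return cis, parental - cis

def is_antagonistic(cis, trans):
    """True when cis and trans effects push expression in opposite directions."""
    return cis * trans < 0

# P1 is 4x higher than P2 between parents, but the P1 allele is 2x lower
# inside the hybrid: trans favors P1 while cis favors P2.
cis, trans = cis_trans(400, 100, 50, 100)
print(cis, trans, is_antagonistic(cis, trans))
```

The example mirrors the antagonistic pattern reported above, with trans-divergence raising P1 expression and cis-divergence raising P2.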

  • Bell GD, Kane NC, Rieseberg LH, Adams KL. (2013) RNA-seq analysis of allele-specific expression, hybrid effects, and regulatory divergence in hybrids compared with their parents from natural populations. Genome Biol Evol [Epub ahead of print]. [abstract]

With the availability of gene expression data by RNA-seq, powerful statistical approaches for grouping similar gene expression profiles across different environments have become increasingly important. A team led by researchers at Penn State University describe and assess a computational model for clustering genes into distinct groups based on the pattern of gene expression in response to changing environment. The model capitalizes on the Poisson distribution to capture the count property of RNA-seq data. A two-stage hierarchical expectation-maximization (EM) algorithm is implemented to estimate an optimal number of groups and mean expression amounts of each group across two environments. A procedure is formulated to test whether and how a given group shows a plastic response to environmental changes. The impact of gene-environment interactions on the phenotypic plasticity of the organism can also be visualized and characterized. The model was used to analyse an RNA-seq dataset measured from two cell lines of breast cancer that respond differently to an anti-cancer drug, from which genes associated with the resistance and sensitivity of the cell lines are identified. They performed simulation studies to validate the statistical behaviour of the model. The model provides a useful tool for clustering gene expression data by RNA-seq, facilitating understanding of gene functions and networks.
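To show how a Poisson mixture over two environments can be fitted, here is a plain EM sketch in Python. It is a simplification and not the authors' method: the number of groups is fixed by the caller rather than estimated, there is no two-stage hierarchy and no plasticity test, and the function name `em_bipoisson` and its initial means are assumptions for illustration.

```python
from math import exp, log

def em_bipoisson(data, init_means, n_iter=50):
    """Cluster genes from (count_env1, count_env2) pairs with a Poisson mixture.

    Each group k has its own pair of Poisson means, one per environment.
    """
    k = len(init_means)
    means = [list(m) for m in init_means]
    weights = [1.0 / k] * k
    resp = []
    for _ in range(n_iter):
        # E-step: posterior probability of each group for each gene.
        # The log(x!) term is the same for every group, so it is dropped.
        resp = []
        for x1, x2 in data:
            logp = [
                log(weights[j])
                + x1 * log(means[j][0]) - means[j][0]
                + x2 * log(means[j][1]) - means[j][1]
                for j in range(k)
            ]
            top = max(logp)
            unnorm = [exp(v - top) for v in logp]
            total = sum(unnorm)
            resp.append([u / total for u in unnorm])
        # M-step: update mixing weights and per-environment means.
        for j in range(k):
            mass = sum(r[j] for r in resp)
            weights[j] = max(mass / len(data), 1e-12)  # keep log() defined
            if mass > 0:
                for e in (0, 1):
                    avg = sum(r[j] * x[e] for r, x in zip(resp, data)) / mass
                    means[j][e] = max(avg, 1e-9)
    labels = [max(range(k), key=lambda j: r[j]) for r in resp]
    return weights, means, labels

genes = [(10, 50), (12, 48), (9, 52), (50, 10), (48, 11), (52, 9)]
weights, means, labels = em_bipoisson(genes, init_means=[(5, 40), (40, 5)])
print(labels)
```

Groups whose two fitted means differ strongly are the ones a plasticity test would flag as responding to the environment.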

  • Wang N, Wang Y, Hao H, Wang L, Wang Z, Wang J, Wu R. (2013) A bi-Poisson model for clustering gene expression profiles by RNA-seq. Brief Bioinform [Epub ahead of print]. [abstract]

RNA-seq has shown huge potential for phylogenomic inferences in non-model organisms. However, error, incompleteness, and redundant assembled transcripts for each gene in de novo assembly of short reads cause noise in analyses and a large amount of missing data in the aligned matrix. To address these problems, the authors compared de novo assemblies of paired-end 90 bp RNA-seq reads using Oases, Trinity, Trans-ABySS and SOAPdenovo-Trans to transcripts from genome annotation of the model plant Ricinus communis. By doing so they evaluated strategies for optimizing total gene coverage and minimizing assembly chimeras and redundancy.

Researchers at the University of Michigan found that the frequency and structure of chimeras vary dramatically among different software packages. The differences were largely due to the number of trans-self chimeras that contain repeats in the opposite direction. More than half of the total chimeras in Oases and Trinity were trans-self chimeras. Within each package, they found a trade-off between maximizing reference coverage and minimizing redundancy and chimera rate.

In order to reduce redundancy, they investigated three methods: Read more

Maarten Leerkes speaks to Izzy Scott Moncrieff in the run-up to the RNA-Seq 2013 Summit, 18th–20th June 2013, Boston, MA

Maarten Leerkes provides bioinformatics support for various projects at NIAID as an employee of Medical Science and Computing, Inc. His areas of research include the use of bioinformatics to interpret sequencing data and to find patterns that can be extrapolated into diagnostic tools for improving treatment options for patients.

What initially attracted you to RNA-Seq?

During my doctoral training, I worked with open reading frame ESTs and SAGE platforms that aimed towards similar end products as RNA-Seq. The difference being that RNA-Seq has more sequencing depth and its information content is much higher.

A prevailing theme during my Ph.D. was a need for innovative technologies to process extra volumes of data and to answer certain types of research questions that were limited with existing technologies. With the development of RNA-Seq, many challenges to answering some of those questions were overcome. My interest, for example, was to answer questions related to alternative splicing and translocations that lead to fusions between gene products. The increased resolution of RNA-Seq really opened up possibilities for me as I pursued my research.

Download a PDF of the entire interview at – http://rna-seqsummit.com/library

Many species have evolved into diverse strains with phenotypic and genotypic variations that facilitate adaptation to different ecological niches and, in the case of pathogens, to different hosts. Whereas comparison of genome sequences reveals differences and similarities among strains, the consequences of genomic variations can be tracked by studying the functional output from the genome. RNA sequencing has been revolutionizing transcriptome analyses of both pro- and eukaryotes. However, the bioinformatics-based analysis is still lagging behind, and transcriptome features are often manually annotated, which is laborious and time-consuming. The problem is compounded when multiple strains are analysed.

Here, a team led by researchers at the University of Würzburg and the University of Tübingen, Germany compared the primary transcriptomes of four isolates of Campylobacter jejuni, the leading cause of bacterial gastroenteritis in humans, and provided genome-wide transcriptional start site (TSS) maps using a novel automated annotation method. Their comparative RNA-seq showed that most TSS are conserved in multiple strains, but they also observed SNP-dependent promoter usage. Furthermore, the researchers identified a novel minimal RNA-based CRISPR immune system as well as strain-specific small RNA repertoires. This automated, comparative TSS annotation will facilitate and improve transcriptome annotation for a wider range of organisms and provides insights into the contribution of transcriptome differences to phenotypic variation among closely related species.

  • Dugar G, Herbig A, Förstner KU, Heidrich N, Reinhardt R, et al. (2013) High-Resolution Transcriptome Maps Reveal Strain-Specific Regulatory Features of Multiple Campylobacter jejuni Isolates. PLoS Genet 9(5), e1003495. [article]

The estimation of isoform abundances from RNA-Seq data requires a time-intensive step of mapping reads to either an assembled, or previously annotated transcriptome, followed by an optimization procedure for deconvolution of multi-mapping reads. These procedures are essential for downstream analysis such as differential expression. In cases where it is desirable to adjust the underlying annotation, for example upon the discovery of novel isoforms or errors in existing annotations, current pipelines must be rerun from scratch. This makes it difficult to update abundance estimates after re-annotation, or to explore the effect of changes in the transcriptome on analyses.

Researchers at UC Berkeley have developed a novel efficient algorithm for updating abundance estimates from RNA-Seq experiments upon re-annotation that does not require re-analysis of the entire dataset. Their approach is based on a fast partitioning algorithm for identifying transcripts whose abundances may depend on the added or deleted isoforms, and on a fast follow-up approach to re-estimating abundances for all transcripts. They demonstrate the effectiveness of their method by showing how to synchronize RNA-Seq abundance estimates with the daily RefSeq incremental updates. Thus, they provide a practical approach to maintaining relevant databases of RNA-Seq derived abundance estimates even as annotations are being constantly revised.
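One plausible way to picture the partitioning step is as a graph problem: link two transcripts whenever some read maps to both, then collect everything connected to a changed isoform. The Python sketch below works under that assumption only. It is not the ReXpress algorithm, whose details are not given here, and the function name `affected_transcripts` and the input layout are hypothetical.

```python
from collections import defaultdict, deque

def affected_transcripts(read_alignments, changed):
    """Transcripts linked to a changed isoform through shared reads.

    read_alignments: one collection of transcript ids per read, listing
                     every transcript that read maps to.
    changed:         ids of isoforms that were added or deleted.
    """
    # Two transcripts are neighbors if at least one read maps to both.
    neighbors = defaultdict(set)
    for targets in read_alignments:
        targets = list(targets)
        for t in targets:
            neighbors[t].update(targets)
    # Breadth-first search outward from the changed isoforms.
    seen = set(changed)
    queue = deque(changed)
    while queue:
        current = queue.popleft()
        for other in neighbors[current]:
            if other not in seen:
                seen.add(other)
                queue.append(other)
    return seen

reads = [{"A", "B"}, {"B", "C"}, {"D"}, {"D", "E"}]
print(sorted(affected_transcripts(reads, ["A"])))
```

Only the transcripts in the returned set would need their abundances re-estimated; in this toy example D and E share no reads with A and are left alone.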

ReXpress

Availability – ReXpress is freely available, together with source code, at http://bio.math.berkeley.edu/ReXpress/

Contact: lpachter@math.berkeley.edu

  • Roberts A, Schaeffer L, Pachter L. (2013) Updating RNA-Seq analyses after re-annotation. Bioinformatics [Epub ahead of print]. [abstract]
