With the availability of gene expression data from RNA-seq, powerful statistical approaches for grouping similar gene expression profiles across different environments have become increasingly important. A team led by researchers at Penn State University describes and assesses a computational model for clustering genes into distinct groups based on their pattern of expression in response to a changing environment. The model capitalizes on the Poisson distribution to capture the count property of RNA-seq data. A two-stage hierarchical expectation-maximization (EM) algorithm is implemented to estimate the optimal number of groups and the mean expression level of each group across two environments. A procedure is formulated to test whether and how a given group shows a plastic response to environmental changes. The impact of gene-environment interactions on the phenotypic plasticity of the organism can also be visualized and characterized. The model was used to analyse an RNA-seq dataset from two breast cancer cell lines that respond differently to an anti-cancer drug, from which genes associated with the resistance and sensitivity of the cell lines were identified. The team also performed simulation studies to validate the statistical behaviour of the model. The model provides a useful tool for clustering RNA-seq gene expression data, facilitating understanding of gene functions and networks.
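
To make the idea concrete, here is a minimal sketch of EM clustering with a Poisson mixture over two environments. It is not the authors' implementation: the number of clusters is fixed in advance rather than estimated by a two-stage hierarchical procedure, the two environments are assumed independent within a cluster, and the function name `fit_poisson_mixture` and the log-ratio initialisation are choices made here for illustration only.

```python
import numpy as np

def fit_poisson_mixture(counts, n_clusters, n_iter=200):
    """EM for a mixture of independent Poissons over two environments.

    counts: (n_genes, 2) array of non-negative read counts.
    Returns (weights, means, responsibilities).
    """
    counts = np.asarray(counts, dtype=float)
    n_genes = counts.shape[0]
    # Hypothetical initialisation: order genes by log-ratio between the
    # two environments and split them into equally sized chunks.
    log_ratio = np.log((counts[:, 0] + 1.0) / (counts[:, 1] + 1.0))
    chunks = np.array_split(np.argsort(log_ratio), n_clusters)
    means = np.array([counts[idx].mean(axis=0) for idx in chunks]) + 0.5
    weights = np.full(n_clusters, 1.0 / n_clusters)
    for _ in range(n_iter):
        # E-step: log(weight * Poisson likelihood) without the log(x!)
        # term, which is identical for every cluster and cancels out.
        log_post = (np.log(weights)
                    + counts @ np.log(means).T
                    - means.sum(axis=1))
        log_post -= log_post.max(axis=1, keepdims=True)
        resp = np.exp(log_post)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: mixing proportions and responsibility-weighted means.
        totals = resp.sum(axis=0) + 1e-12
        weights = totals / n_genes
        means = (resp.T @ counts) / totals[:, None] + 1e-9
    return weights, means, resp
```

A cluster whose two fitted means differ strongly would be a candidate for a plastic response; a formal test of that difference, as described in the paper, is not part of this sketch.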

  • Wang N, Wang Y, Hao H, Wang L, Wang Z, Wang J, Wu R. (2013) A bi-Poisson model for clustering gene expression profiles by RNA-seq. Brief Bioinform [Epub ahead of print]. [abstract]

RNA-seq has shown huge potential for phylogenomic inferences in non-model organisms. However, errors, incompleteness, and redundant assembled transcripts for each gene in de novo assembly of short reads cause noise in analyses and a large amount of missing data in the aligned matrix. To address these problems, the study compares de novo assemblies of paired-end 90 bp RNA-seq reads using Oases, Trinity, Trans-ABySS and SOAPdenovo-Trans to transcripts from the genome annotation of the model plant Ricinus communis. In doing so, it evaluates strategies for optimizing total gene coverage and minimizing assembly chimeras and redundancy.

Researchers at the University of Michigan found that the frequency and structure of chimeras vary dramatically among different software packages. The differences were largely due to the number of trans-self chimeras that contain repeats in the opposite direction. More than half of the total chimeras in Oases and Trinity were trans-self chimeras. Within each package, they found a trade-off between maximizing reference coverage and minimizing redundancy and chimera rate.

In order to reduce redundancy, they investigated three methods: Read more

Maarten Leerkes speaks to Izzy Scott Moncrieff in the run-up to the RNA-Seq 2013 Summit, 18th–20th June 2013, Boston, MA

Maarten Leerkes provides bioinformatics support for various projects at NIAID as an employee of Medical Science and Computing, Inc. His areas of research include the use of bioinformatics to interpret sequencing data and to find patterns that can be extrapolated into diagnostic tools for improving treatment options for patients.

What initially attracted you to RNA-Seq?

During my doctoral training, I worked with open reading frame ESTs and SAGE platforms that aimed towards similar end products as RNA-Seq. The difference being that RNA-Seq has more sequencing depth and its information content is much higher.

A prevailing theme during my Ph.D. was a need for innovative technologies to process extra volumes of data and to answer certain types of research questions that were limited with existing technologies. With the development of RNA-Seq, many challenges to answering some of those questions were overcome. My interest, for example, was to answer questions related to alternative splicing and translocations that lead to fusions between gene products. The increased resolution of RNA-Seq really opened up possibilities for me as I pursued my research.

Download a PDF of the entire interview at – http://rna-seqsummit.com/library

Many species have evolved into diverse strains with phenotypic and genotypic variations that facilitate adaptation to different ecological niches and, in the case of pathogens, to different hosts. Whereas comparison of genome sequences reveals differences and similarities among strains, the consequences of genomic variations can be tracked by studying the functional output from the genome. RNA sequencing has been revolutionizing transcriptome analyses of both pro- and eukaryotes. However, the bioinformatics-based analysis is still lagging behind, and transcriptome features are often manually annotated, which is laborious and time-consuming. This is even more compounded for the analyses of multiple strains.

Here, a team led by researchers at the University of Würzburg and the University of Tübingen, Germany, compared the primary transcriptomes of four isolates of Campylobacter jejuni, the leading cause of bacterial gastroenteritis in humans, and provided genome-wide transcriptional start site (TSS) maps using a novel automated annotation method. Their comparative RNA-seq showed that most TSS are conserved in multiple strains, but they also observed SNP-dependent promoter usage. Furthermore, the researchers identified a novel minimal RNA-based CRISPR immune system as well as strain-specific small RNA repertoires. This automated, comparative TSS annotation will facilitate and improve transcriptome annotation for a wider range of organisms and provides insights into the contribution of transcriptome differences to phenotypic variation among closely related species.

  • Dugar G, Herbig A, Förstner KU, Heidrich N, Reinhardt R, et al. (2013) High-Resolution Transcriptome Maps Reveal Strain-Specific Regulatory Features of Multiple Campylobacter jejuni Isolates. PLoS Genet 9(5), e1003495. [article]

The estimation of isoform abundances from RNA-Seq data requires a time-intensive step of mapping reads to either an assembled, or previously annotated transcriptome, followed by an optimization procedure for deconvolution of multi-mapping reads. These procedures are essential for downstream analysis such as differential expression. In cases where it is desirable to adjust the underlying annotation, for example upon the discovery of novel isoforms or errors in existing annotations, current pipelines must be rerun from scratch. This makes it difficult to update abundance estimates after re-annotation, or to explore the effect of changes in the transcriptome on analyses.

Researchers at UC Berkeley have developed a novel, efficient algorithm for updating abundance estimates from RNA-Seq experiments upon re-annotation that does not require re-analysis of the entire dataset. Their approach is based on a fast partitioning algorithm for identifying transcripts whose abundances may depend on the added or deleted isoforms, and on a fast follow-up approach to re-estimating abundances for all transcripts. They demonstrate the effectiveness of their methods by showing how to synchronize RNA-Seq abundance estimates with the daily RefSeq incremental updates. Thus, they provide a practical approach to maintaining relevant databases of RNA-Seq derived abundance estimates even as annotations are being constantly revised.
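
One way to picture the partitioning step is as a connected-components problem: transcripts that share at least one multi-mapping read are linked, and only the components touched by an added or deleted isoform need re-estimation. The sketch below illustrates that idea with a small union-find structure. It is an assumption about how such a partition could be computed, not the ReXpress implementation, and `affected_transcripts` is a hypothetical name.

```python
def affected_transcripts(read_alignments, changed):
    """Return transcripts whose abundances may need re-estimation.

    read_alignments: iterable of sets, one per read, holding the IDs of
        the transcripts that the read maps to.
    changed: IDs of transcripts added to or removed from the annotation.
    """
    parent = {}

    def find(t):
        # Register unseen transcripts as their own root, then walk up.
        parent.setdefault(t, t)
        while parent[t] != t:
            parent[t] = parent[parent[t]]  # path halving
            t = parent[t]
        return t

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb

    # Transcripts hit by the same read end up in the same component.
    for targets in read_alignments:
        targets = sorted(targets)
        for t in targets:
            union(targets[0], t)

    roots = {find(t) for t in changed}
    return {t for t in list(parent) if find(t) in roots}


# Toy data: A-B-C form one component, D-E another, F stands alone.
reads = [{"A", "B"}, {"B", "C"}, {"D"}, {"D", "E"}, {"F"}]
print(sorted(affected_transcripts(reads, {"C"})))
```

Transcripts outside the returned set share no reads, directly or indirectly, with the changed isoforms, so in this simplified picture their estimates could be left untouched.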

Availability – ReXpress is freely available, together with source code, at http://bio.math.berkeley.edu/ReXpress/

Contact: lpachter@math.berkeley.edu

  • Roberts A, Schaeffer L, Pachter L. (2013) Updating RNA-Seq analyses after re-annotation. Bioinformatics [Epub ahead of print]. [abstract]

Exosomes, endosome-derived membrane microvesicles, contain specific RNA transcripts that are thought to be involved in cell-cell communication. These RNA transcripts have great potential as disease biomarkers. To characterize exosomal RNA profiles systemically, a team led by researchers at the Medical College of Wisconsin performed RNA sequencing analysis using three human plasma samples and evaluated the efficacies of small RNA library preparation protocols from three manufacturers. In all they evaluated 14 libraries (7 replicates).

From the 14 size-selected sequencing libraries, the researchers obtained a total of 101.8 million raw single-end reads, an average of about 7.27 million reads per library. Sequence analysis showed that there was a diverse collection of the exosomal RNA species among which microRNAs (miRNAs) were the most abundant, making up over 42.32% of all raw reads and 76.20% of all mappable reads. At the current read depth, 593 miRNAs were detectable. The five most common miRNAs (miR-99a-5p, miR-128, miR-124-3p, miR-22-3p, and miR-99b-5p) collectively accounted for 48.99% of all mappable miRNA sequences. MiRNA target gene enrichment analysis suggested that the highly abundant miRNAs may play an important role in biological functions such as protein phosphorylation, RNA splicing, chromosomal abnormality, and angiogenesis. From the unknown RNA sequences, they predicted 185 potential miRNA candidates. Furthermore, they detected significant fractions of other RNA species including ribosomal RNA (9.16% of all mappable counts), long non-coding RNA (3.36%), piwi-interacting RNA (1.31%), transfer RNA (1.24%), small nuclear RNA (0.18%), and small nucleolar RNA (0.01%); fragments of coding sequence (1.36%), 5’ untranslated region (0.21%), and 3’ untranslated region (0.54%) were also present. In addition to the RNA composition of the libraries, they found that the three tested commercial kits generated a sufficient number of DNA fragments for sequencing but each had significant bias toward capturing specific RNAs. Read more

DNA sequencing technology is becoming more accessible to a variety of researchers as costs continue to decline. As researchers begin to sequence novel transcriptomes, most of these datasets lack a reference genome and will have to rely on de novo assemblers. Making comparisons across assemblies can be difficult: each program has its strengths and weaknesses and no tool exists to comparatively evaluate these datasets.

Now, a team led by researchers at the University of Rhode Island has developed software in R, called Sequence Comparative Analysis using Networks (SCAN), to perform statistical comparisons between distinct assemblies. SCAN uses a reference dataset to identify the most accurate de novo assembly and the ‘good’ transcripts in the user’s data. They tested SCAN on 3 publicly available transcriptomes, each assembled using 3 assembly programs. Moreover, they sequenced the transcriptome of the oomycete Achlya hypogyna and compared de novo assemblies from Velvet, ABySS, and the CLC Genomics Workbench assembly algorithms. Of the CLC transcripts, 1,128 were statistically similar to the reference, compared with 49 of the Velvet transcripts and 937 of the ABySS transcripts. SCAN’s strength is providing statistical support for transcript assemblies in a biological context. However, because SCAN is designed to compare distinct node sets in networks, it can also easily be extended to perform statistical comparisons on any network graph regardless of what the nodes represent.

Availability – Two versions of SCAN were developed, “SCAN” and “SCAN stringent”, which can run on either single or multiprocessor nodes and are available from http://evol-net.fr.

  • Misner I, Bicep C, Lopez P, Halary S, Bapteste E, Lane CE. (2013) Sequence Comparative Analysis using Networks (SCAN): software for evaluating de novo transcript assembly from next generation sequencing. Mol Biol Evol [Epub ahead of print]. [abstract]

The granulosa cells in the mammalian ovarian follicle respond to gonadotropin signalling and are involved in the processes of folliculogenesis and oocyte maturation. Studies on gene expression and regulation in human granulosa cells are of interest due to their potential for estimating the oocyte viability and IVF success. However, the post-transcriptional gene expression studies on microRNA (miRNA) level in the human ovary have been scarce.

The current study determined the miRNA profile by deep sequencing of the two intrafollicular somatic cell types: mural and cumulus granulosa cells (MGC and CGC, respectively) isolated from women undergoing controlled ovarian stimulation and in vitro fertilization. Altogether 936 annotated and nine novel miRNAs were identified. Ninety of the annotated miRNAs were differentially expressed between MGC and CGC. Bioinformatic prediction revealed that TGFβ, ErbB signalling and heparan sulphate biosynthesis were targeted by miRNAs in both granulosa cell populations, while extracellular matrix remodelling, Wnt and neurotrophin signalling pathways were enriched among miRNA targets in MGC. Two of the novel miRNAs found were of intronic origin: one from the aromatase and the other from the FSH receptor gene. The latter miRNA was predicted to target the activin signalling pathway.

In addition to revealing the genome-wide miRNA signature in human granulosa cells, these results suggest that post-transcriptional regulation of gene expression by miRNAs could play an important role in the modification of gonadotropin signalling. miRNA expression studies could therefore lead to new prognostic markers in assisted reproductive technologies.

  • Velthut-Meikas A, Simm J, Tuuri T, Tapanainen JS, Metsis M, Salumets A. (2013) Research Resource: Small RNA-seq of human granulosa cells reveals miRNAs in FSHR and aromatase genes. Mol Endocrinol [Epub ahead of print]. [abstract]

New sequencing technologies allow unprecedented views into changes occurring in virus-infected cells, including comprehensive and largely unbiased measurements of different types of RNA. In this study, researchers from the University of Washington used RNA-Seq to profile dynamic changes in cellular microRNAs occurring in HIV-infected cells. The sensitivity afforded by sequencing allowed them to detect changes in microRNA expression early in infection, before the onset of viral replication. A phased pattern of expression was evident among these microRNAs, and many that were initially suppressed were later overexpressed at the height of infection, providing unique signatures of infection. By integrating additional mRNA data with the microRNA data, they identified a role for microRNAs in transcriptional regulation during infection and specifically a network of microRNAs involved in the expression of a known HIV cofactor. Finally, as a distinct benefit of sequencing, they identified candidate nonannotated microRNAs, including one whose downregulation may allow HIV-1 replication to proceed fully.

  • Chang ST, Thomas MJ, Sova P, Green RR, Palermo RE, Katze MG. (2013) Next-generation sequencing of small RNAs from HIV-infected cells identifies phased microrna expression patterns and candidate novel microRNAs differentially expressed upon infection. MBio 4(1), e00549-12. [article]

Alternative mRNA splicing (AS) is a major mechanism for increasing regulatory complexity. A key concept in AS is the distinction between alternatively and constitutively spliced exons (ASEs and CSEs, respectively). ASEs and CSEs have been reported to be differentially regulated, and to have distinct biological properties. However, the recent flood of RNA-sequencing data has obscured the boundary between ASEs and CSEs. Researchers are beginning to question whether ‘authentic CSEs’ exist at all, and whether the ASE/CSE distinction is biologically valid.

Here, Feng-Chi Chen with the National Health Research Institutes of Taiwan examines the influences of increasing transcriptome data on the human ASE/CSE classification and our past understanding of the properties of these two types of exons. Interestingly, although the percentage of human ASEs has increased dramatically in recent years, the overall distinction between ASEs and CSEs remains valid. For example, CSEs are longer, evolve more slowly, and less frequently correspond to intrinsically disordered protein regions than ASEs. In addition, only a relatively small number of human genes have their transcripts composed entirely of ASEs despite the large amount of high-throughput transcriptome information. Therefore, the ‘backbone’ concept of AS, in which CSEs constitute the invariant part and ASEs the flexible part of the transcript, appears to be generally true despite the increasing percentage of ASEs in the human exome.
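
The ASE/CSE distinction, and the way new transcript evidence erodes it, can be illustrated with a short sketch. It uses the simple working definition that an exon is constitutive if it appears in every known transcript of a gene and alternative otherwise. The function name `classify_exons` and the toy exon and transcript IDs are hypothetical, and this is not the classification pipeline used in the paper.

```python
def classify_exons(transcripts):
    """Split a gene's exons into constitutive and alternative sets.

    transcripts: dict mapping transcript ID -> iterable of exon IDs.
    Returns (constitutive, alternative) as two sets of exon IDs.
    """
    exon_sets = [set(exons) for exons in transcripts.values()]
    if not exon_sets:
        return set(), set()
    # Constitutive exons are shared by every transcript of the gene.
    constitutive = set.intersection(*exon_sets)
    # Everything else that occurs in at least one transcript is alternative.
    alternative = set.union(*exon_sets) - constitutive
    return constitutive, alternative


gene = {
    "tx1": ["e1", "e2", "e3"],
    "tx2": ["e1", "e3"],
    "tx3": ["e1", "e3", "e4"],
}
print(classify_exons(gene))

# A newly observed isoform that skips e3 reclassifies it as alternative.
gene["tx4"] = ["e1", "e4"]
print(classify_exons(gene))
```

The second call shows why the share of ASEs can only grow as more transcripts are sequenced: each new isoform can move exons out of the constitutive set but never back in.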

  • Chen FC. (2013) Are all of the human exons alternatively spliced? Brief Bioinform [Epub ahead of print]. [abstract]

from flybase.org

FlyBase is extending its initial gene-level analyses of high-throughput RNA-seq data from modENCODE and others. The algorithm for RPKM (reads per kilobase per million mapped reads) has been refined, additional datasets have been analyzed, and these data are now available for bulk download.

In order to summarize this type of data at the gene level, it is necessary first to determine a single value for the expression level of each gene for each RNA-seq sample. RNA-seq coverage data are intersected with FlyBase exons, based on the gene model annotations of the current release, to calculate a single value reflecting average coverage per kb per gene. Each gene data point is then classified into one of eight expression level bins, and the graphical and text summaries are produced from the binned values. A more detailed explanation may be found at FBrf0221009.
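
For readers unfamiliar with the unit, the snippet below shows the textbook RPKM calculation for a single gene. It is a minimal sketch of the standard definition only. FlyBase's refined, coverage-based algorithm and its eight expression bins are described in FBrf0221009 and are not reproduced here, and the function and parameter names are illustrative.

```python
def rpkm(read_count, exon_length_bp, total_mapped_reads):
    """Reads per kilobase of exon per million mapped reads."""
    if exon_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and total mapped reads must be positive")
    # Dividing by (length / 1e3) and by (total / 1e6) gives the 1e9 factor.
    return read_count * 1e9 / (exon_length_bp * total_mapped_reads)


# 500 reads on a gene with 2,000 bp of exons, in a sample with
# 10 million mapped reads.
print(rpkm(500, 2000, 10_000_000))  # 25.0
```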

Bulk data files can be accessed from the Precomputed Data Files page (menu: Files → Current Release). Look in the Genes section; the item line is ‘RNA-Seq RPKM values’. You can download the file directly by clicking here.

Simple and combinatorial queries of RPKM expression data can be conducted using the ‘RNA-Seq Search’ option found under the ‘Expression’ tab in the Quick Search tool.

(read more…)

Isoform reconstruction is a key step in RNA-Seq analysis. Tools such as CEM, iReckon, NSMAP, and MonteBello use maximum likelihood for isoform reconstruction. The maximum likelihood approach has been observed to be computationally expensive. Here, researchers from Tsinghua University, China, show that isoform reconstruction using short RNA-Seq reads by maximum likelihood is NP-hard.

  • Li T, Jiang R, Zhang X. (2013) Isoform reconstruction using short RNA-Seq reads by maximum likelihood is NP-hard. arXiv:1305.0916 [q-bio.QM]. [article]

from BioCompare by Josh P. Roberts

Knowledge of gene expression is crucial to understanding the molecular underpinnings of biology and medicine. As our ability to query the transcriptome grows—as new instrumentation comes online and becomes refined, along with the techniques necessary to use it—what was once the dominion of a few is becoming a workhorse in diverse labs.

“The era of RNA-Seq is definitely here,” says Christopher Mason, Assistant Professor at Weill Cornell Medical College. “With RNA-Seq, especially for certain medium and high expressers, you get all of the same specificity of an array, plus greater sensitivity. You also get, not just expression by gene, by exon, by junction, but … also … SNP [single nucleotide polymorphism] information. Since it’s actual sequence data, you can look at the genetic variation present. You can look for things like gene fusion events or allele-specific expression. You can look for other rearrangements or new transcribed regions that are, by definition, novel, so they wouldn’t be on an array.” Mason further comments that the remainder of the RNA-Seq reads are sometimes from another species, what some might consider a “contaminant,” but which might also provide interesting and valuable information. In short, “You get a real wealth of information from the RNA-Seq data.” Read more
