Coexpression networks are data-derived representations of genes that behave similarly across tissues and experimental conditions. They have been used for hypothesis generation and for guilt-by-association approaches that infer the functions of previously uncharacterized genes. So far, the main platform for expression data has been DNA microarrays; however, the recent development of RNA-seq allows for higher accuracy and coverage of transcript populations. It is therefore important to assess how useful coexpression networks derived from this newer technique are for biological investigation in a condition-independent dataset.

A team led by researchers at the Institute of Applied Genomics, Italy, collected 65 publicly available, high-quality Illumina RNA-seq samples from Arabidopsis thaliana and generated Pearson correlation coexpression networks. These networks were then compared with those derived from analogous microarray data. The team shows that Variance-Stabilizing-Transformed (VST) RNA-seq samples are the most similar to microarray ones with respect to inter-sample variation, correlation coefficient distribution and network topological architecture. Microarray networks score slightly higher in biology-derived quality assessments such as overlap with the known protein-protein interaction network and edge ontological agreement.
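
To make the idea of a Pearson correlation coexpression network concrete, here is a minimal sketch in plain Python. It is not the authors' pipeline: the gene names, the expression values and the 0.8 cutoff on |r| are all assumptions chosen for illustration, and the input is assumed to be already normalized (for example, VST-transformed) expression values.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # constant profile: r is undefined, treated as 0 in this sketch
    return cov / sqrt(var_x * var_y)


def coexpression_edges(expression, threshold=0.8):
    """Return (gene_a, gene_b, r) for every gene pair with |r| >= threshold.

    `threshold` is an illustrative cutoff, not a value taken from the paper.
    """
    edges = []
    for gene_a, gene_b in combinations(sorted(expression), 2):
        r = pearson(expression[gene_a], expression[gene_b])
        if abs(r) >= threshold:
            edges.append((gene_a, gene_b, r))
    return edges


# Toy data: three hypothetical genes measured in four samples.
expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.1, 3.9, 6.2, 8.0],
    "geneC": [4.0, 1.0, 3.5, 2.0],
}

for gene_a, gene_b, r in coexpression_edges(expression):
    print(f"{gene_a} -- {gene_b}  r = {r:.3f}")
```

In this toy example only geneA and geneB end up connected, because their profiles rise together across the four samples. A real analysis would compute all pairwise correlations over thousands of genes, where a vectorized library would be the practical choice.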

Several coexpression network centralities are investigated; in particular, the authors show that betweenness centrality is generally a positive marker for essential genes in Arabidopsis thaliana, regardless of the platform the data come from. Finally, the team focused on a specific gene network, showing that, although microarray data seem better suited to gene network reverse engineering, RNA-seq offers the great advantage of extending coexpression analyses to the entire transcriptome.
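
For readers unfamiliar with betweenness centrality: it counts how often a gene lies on the shortest paths between other pairs of genes in the network. The sketch below computes it for a small undirected, unweighted graph using Brandes' algorithm. This is an illustration only; the summary above does not say which software or algorithm the authors used, and the six-gene network is hypothetical.

```python
from collections import deque


def betweenness_centrality(graph):
    """Unnormalized betweenness for an undirected, unweighted graph.

    `graph` maps each node to the set of its neighbours.
    """
    centrality = {node: 0.0 for node in graph}
    for source in graph:
        # Breadth-first search that counts shortest paths from `source`.
        order = []
        predecessors = {node: [] for node in graph}
        n_paths = {node: 0 for node in graph}
        distance = {node: -1 for node in graph}
        n_paths[source] = 1
        distance[source] = 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if distance[w] < 0:  # first time w is reached
                    distance[w] = distance[v] + 1
                    queue.append(w)
                if distance[w] == distance[v] + 1:  # v is on a shortest path to w
                    n_paths[w] += n_paths[v]
                    predecessors[w].append(v)
        # Accumulate pair dependencies, starting from the farthest nodes.
        dependency = {node: 0.0 for node in graph}
        for w in reversed(order):
            for v in predecessors[w]:
                dependency[v] += n_paths[v] / n_paths[w] * (1.0 + dependency[w])
            if w != source:
                centrality[w] += dependency[w]
    # Every undirected pair was visited from both endpoints, so halve the totals.
    return {node: value / 2.0 for node, value in centrality.items()}


# Hypothetical network: two tightly connected triangles joined by the g3-g4 edge.
network = {
    "g1": {"g2", "g3"},
    "g2": {"g1", "g3"},
    "g3": {"g1", "g2", "g4"},
    "g4": {"g3", "g5", "g6"},
    "g5": {"g4", "g6"},
    "g6": {"g4", "g5"},
}

scores = betweenness_centrality(network)
for gene in sorted(scores, key=scores.get, reverse=True):
    print(gene, scores[gene])
```

Here g3 and g4 get the highest scores because every path between the two triangles has to pass through them, which is the kind of "bottleneck" position the study associates with essential genes.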

  • Giorgi FM, Del Fabbro C, Licausi F. (2013) Comparative study of RNA-seq- and Microarray-derived coexpression networks in Arabidopsis thaliana. Bioinformatics [Epub ahead of print]. [abstract]

One of the computational challenges in plant systems biology is to accurately infer transcriptional regulatory relationships from correlation analyses of gene expression patterns. Although several correlation methods have been applied to microarray data, concerns have been raised about how well these methods suit gene expression data profiled by high-throughput RNA sequencing (RNA-Seq). These concerns arise mainly because the distribution of read counts in RNA-Seq experiments differs from that of fluorescence intensities in microarray experiments. A comprehensive evaluation of the existing correlation methods and, if necessary, the introduction of novel methods into biology has therefore been needed.

In this study, researchers at the University of Arizona compared four existing correlation methods used in microarray analysis and one novel method, the Gini correlation coefficient, on previously published microarray-based and sequencing-based gene expression data from Arabidopsis and maize. The comparisons were performed on more than 11,000 regulatory relationships in Arabidopsis, including 8,929 pairs of transcription factors and target genes. The analyses pinpointed the strengths and weaknesses of each method, and indicated that the Gini correlation can compensate for the shortcomings of the Pearson, Spearman, Kendall and Tukey’s biweight correlations.
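
As a rough illustration of what a Gini correlation looks like, the sketch below uses the common rank-weighted definition: the values of x are reordered by the rank of their paired y value, weighted by (2i - n - 1), and divided by the same weighted sum computed on x sorted by its own rank. This is a plain-Python sketch under that assumption, not the rsgcc implementation; the function name, the tie handling and the toy vectors are all illustrative.

```python
def gini_correlation(x, y):
    """Gini correlation of x with respect to the ranks of y.

    Not symmetric: gini_correlation(x, y) and gini_correlation(y, x)
    generally differ.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    # Rank weights (2i - n - 1) for i = 1..n.
    weights = [2 * i - n - 1 for i in range(1, n + 1)]
    x_by_own_rank = sorted(x)
    # x values reordered by the rank of their paired y value.
    # Ties in y keep their original order (an assumption of this sketch).
    x_by_y_rank = [xv for _, xv in sorted(zip(y, x), key=lambda pair: pair[0])]
    denominator = sum(w * v for w, v in zip(weights, x_by_own_rank))
    if denominator == 0:
        raise ValueError("x is constant; the Gini correlation is undefined")
    numerator = sum(w * v for w, v in zip(weights, x_by_y_rank))
    return numerator / denominator


# Toy profiles for a hypothetical transcription factor and target gene.
tf_expression = [1.0, 2.0, 3.0, 10.0]
target_expression = [1.0, 3.0, 2.0, 4.0]

print(f"GC(tf, target) = {gini_correlation(tf_expression, target_expression):.3f}")
print(f"GC(target, tf) = {gini_correlation(target_expression, tf_expression):.3f}")
```

Because the measure mixes the values of one variable with the ranks of the other, the two directions can give different numbers, as they do for these toy vectors. For a perfectly monotonic increasing relationship both directions equal 1.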

The Gini correlation method, along with the other four evaluated methods in this study, was implemented as an R package named “rsgcc” that can be utilized as an alternative option for biologists to perform clustering analyses of gene expression patterns or transcriptional network analyses.

The rsgcc package is available at: http://cran.r-project.org/web/packages/rsgcc/index.html

  • Ma C, Wang X. (2012) Application of the Gini correlation coefficient to infer regulatory relationships in transcriptome analysis. Plant Physiol [Epub ahead of print]. [abstract]

Researchers at the Medical University of Vienna have conducted a genome-wide study analyzing alternative splicing (AS) in Arabidopsis. AS is a key regulatory mechanism that contributes to transcriptome and proteome diversity.

They performed high-throughput sequencing of a normalized cDNA library which resulted in a high coverage transcriptome map of Arabidopsis.

Major findings:

  • Detected ~150,000 splice junctions derived mostly from typical plant introns, including an 8-fold increase in the number of U12 introns (2,069).
  • ~61% of multi-exonic genes are alternatively spliced under normal growth conditions.
  • The most frequent AS events (~40%) are intron retentions (IR).
  • Many IRs have relatively low read coverage and are less well represented in assembled transcripts.
  • ~51% of Arabidopsis genes produce AS transcripts which do not involve IR.
  • A large set of cryptic introns was identified inside annotated coding exons.
  • Extensive AS coupled to nonsense-mediated decay in AFC2, encoding a highly conserved LAMMER kinase which phosphorylates splicing factors, thus establishing a complex loop in AS regulation.

Marquez Y, Brown JW, Simpson CG, Barta A, Kalyna M. (2012) Transcriptome survey reveals increased complexity of the alternative splicing landscape in Arabidopsis. Genome Res [Epub ahead of print]. [article]

If you were wondering which genes are involved in the copper dependence of iron homeostasis in Arabidopsis, here’s your answer!

Turns out they are: SQUAMOSA PROMOTER BINDING PROTEIN-LIKE7 (SPL7), FERRIC REDUCTASE OXIDASE5 (FRO5) and FERRIC REDUCTASE OXIDASE4 (FRO4).

Yep, researchers at Ruhr University, Germany discovered this with transcriptome sequencing.

They sequenced the transcriptomes of both wild-type Arabidopsis thaliana plants and a mutant defective in the gene encoding SPL7. In response to Cu deficiency, FERRIC REDUCTASE OXIDASE5 (FRO5) and FRO4 transcript levels increased strongly, in an SPL7-dependent manner. Biochemical assays and confocal imaging of a Cu-specific fluorophore showed that high-affinity root Cu uptake requires prior FRO5/FRO4-dependent Cu(II)-specific reduction to Cu(I) and SPL7 function. Plant iron (Fe) deficiency markers were activated in Cu-deficient media, in which reduced growth of the spl7 mutant was partially rescued by Fe supplementation. Cultivation in Cu-deficient media caused a defect in root-to-shoot Fe translocation, which was exacerbated in spl7 and associated with a lack of ferroxidase activity. This is consistent with a possible role for a multicopper oxidase in Arabidopsis Fe homeostasis, as previously described in yeast, humans, and green algae.

These insights into root Cu uptake and the interaction between Cu and Fe homeostasis will advance plant nutrition, crop breeding, and biogeochemical research.

  • Bernal M, Casero D, Singh V, Wilson GT, Grande A, Yang H, Dodani SC, Pellegrini M, Huijser P, Connolly EL, Merchant SS, Krämer U. (2012) Transcriptome Sequencing Identifies SPL7-Regulated Copper Acquisition Genes FRO4/FRO5 and the Copper Dependence of Iron Homeostasis in Arabidopsis. Plant Cell [Epub ahead of print]. [abstract]
