SeqGene is open-source software for mining next-gen sequencing datasets, focusing on post-alignment quality control, SNP and indel identification and annotation, RNA expression quantification, allele-specific expression, and expression-genotype association analysis. SeqGene is especially suited for RNA-seq and exome-seq applications, with a focus on protein-coding and regulatory regions of a genome. For RNA-seq applications, SeqGene implements a novel topology-based pathway analysis method to identify SNP-expression co-enrichment and SNP-expression paths.


Variability Across RNA-seq Experiments Suggests Need for Careful Study Design

Researchers found that even at high coverage, the estimate of the relative abundance of a particular transcript can “substantially disagree” between sequencing experiments using the same platforms and protocols.

Study Finds Array and Sequencing Combo Yields Novel Info on Gene Expression

A recent study suggests that using arrays and sequencing together could generate more reliable data sets than either method alone, which should increase confidence in functional analyses.

Yesso Scallop

Bivalves comprise 30,000 extant species, constituting the second largest group of mollusks. However, little genetic research has focused on this group of animals so far, partly because of the lack of genomic resources. High-throughput sequencing technologies make it possible to generate genomic resources quickly and at minimal cost, and therefore provide a turning point for bivalve research. In the present study, we performed de novo transcriptome sequencing to first produce a comprehensive expressed sequence tag (EST) dataset for the Yesso scallop (Patinopecten yessoensis).


FlyBase has just incorporated several new RNA-Seq data sets from the modENCODE project. Unlike FlyBase's current RNA-Seq data, these data sets display expression by strand. One of these data sets includes temporal expression data from the embryonic stages. The other data sets include expression data from a selection of tissues and timepoints, and under a variety of treatments. RNA-Seq expression data by strand from cell lines (e.g. Kc, S2) are also now available.

The Treatment Data represents the transcriptome of 4-day-old mated adult flies and/or feeding third instar larvae that were fed or exposed to various toxins or environmental stress factors encountered in nature. The concentrations and exposure times used in this study were taken from previously published experiments or, when no preexisting data were available, were based on experimentally determined LD50 results. These data can be viewed on GBrowse by selecting the Data Source menu option “D. melanogaster RNA-Seq Data” and selecting the appropriate tracks.


miRDeep

The capacity of highly parallel sequencing technologies to detect small RNAs at unprecedented depth suggests their value in systematically identifying microRNAs (miRNAs). However, the identification of miRNAs from the large pool of sequenced transcripts from a single deep sequencing run remains a major challenge.

Here, the authors present an algorithm, miRDeep, which uses a probabilistic model of miRNA biogenesis to score compatibility of the position and frequency of sequenced RNA with the secondary structure of the miRNA precursor.

The miRDeep package was developed to discover active known or novel miRNAs from deep sequencing data (Solexa/Illumina, 454, …). The package consists of everything you need to analyze your own deep sequencing data after removal of ligation adapters: a number of scripts to preprocess the mapped data, and the core miRDeep algorithm that will analyze and score these data.

They demonstrate its accuracy and robustness using published Caenorhabditis elegans data and data they generated by deep sequencing human and dog RNAs. In total, miRDeep reports approximately 230 previously unannotated miRNAs, including four novel C. elegans miRNAs that were validated by northern blot analysis.

miRDeep is freely available at: http://www.mdc-berlin.de/en/research/research_teams/systems_biology_of_gene_regulatory_elements/projects/miRDeep/index.html

Friedländer MR, Chen W, Adamidi C, Maaskola J, Einspanier R, Knespel S, Rajewsky N. (2008) Discovering microRNAs from deep sequencing data using miRDeep. Nat Biotechnol 26(4), 407-15.


RNA-Seq's high resolution and accuracy in estimating transcript abundance have been thoroughly demonstrated, so much so that it is being heralded as a possible replacement for microarray-based gene expression technology. But RNA-Seq has another important application: improving existing genome annotations, and possibly even complete de novo genome annotation.

Improving current genome annotations is a topic that has been discussed before on the RNA-Seq Blog. See these posts from earlier this year:

Jan 13 – RNA-Seq Datasets Improving Genome Annotation in Plants, Animals, Bacteria

Jan 7 – Improvements to Ensembl include a de novo RNA-seq gene annotation pipeline

Now, researchers at UC Berkeley and the Broad Institute have developed a novel approach termed “reference annotation based transcript (RABT) assembly”. They claim that it is a “pure” assembler and that it does not utilize information about the structure and content of coding genes, or other external input (e.g. ESTs), during the assembly.

However, using RNA-Seq for annotation poses a problem. Genes expressed at low levels are represented by few reads and may be only partially covered. As a result, naive assembly methods will fail to reconstruct the majority of full-length transcripts.

Availability: The methods described in this paper are implemented in the Cufflinks suite of software for RNA-Seq, freely available from http://bio.math.berkeley.edu/cufflinks.

  • Roberts A, Pimentel H, Trapnell C, Pachter L. (2011) Identification of novel transcripts in annotated genomes using RNA-Seq. Bioinformatics [Epub ahead of print].


The FDA has begun to develop its program to evaluate sequencing-based diagnostics. At a recent meeting, the Association for Molecular Pathology (AMP) advised FDA officials on many important considerations for evaluating the analytical validity of next-generation sequencing (NGS):

The analytical validation requirements for NGS will vary based on the clinical application at issue, such as a mutation panel for a Mendelian disease versus transcriptome analysis.

Performance of, and coverage needs for, a given platform are likely to differ depending on:

  • the nucleic acid analyzed
  • the characteristics of the DNA regions and the type of variations interrogated
  • the relative allele proportions of particular variants
  • whether quantitative or qualitative results are desired

Flexibility and individualization are necessary in the development of validation protocols, guidelines, and controls on an application-by-application basis.

The test system, the analytical validity of the instrument and the performance of the bioinformatics software should be evaluated both independently and as a complete system.


Date: Aug 25-26, 2011
Location: Amsterdam Medical Centre, Amsterdam
Organizer: NBIC & LUMC
Contact(s): Dr. Celia van Gelder
Level: PhD

NBIC and LUMC will organize a 2-day course on RNA-seq data analysis on August 25 and 26, 2011. The course will be hosted by Antoine van Kampen at the AMC, Amsterdam. It will consist of seminars and hands-on R practicals and will focus on data preprocessing, quality control, and statistical methods for detecting differential gene expression. It is an expert course and a follow-up to the general NBIC NGS data analysis course (to be given 5-7 September 2011 in Leiden). Participants in the RNA-seq course should preferably have attended the general NGS course or otherwise have ample experience with NGS technology. The course is aimed at PhD students and postdocs, but scientific programmers with some background in biology and bioinformatics may also attend.

Course topics:

  • RNA-seq experimental approaches
  • Alignment
  • Statistics for differential gene expression
  • eQTL analysis
  • R packages for RNA-seq data analysis

Confirmed speakers:

Rutger Brouwer, Lude Franke, Jelle Goeman, Philip de Groot, Peter-Bram ‘t Hoen (course coordinator), Antoine van Kampen, Nagesha Rao, Marieke Simonis, Marcel Willemsen, Kai Ye, Erik van Zwet


If you’re new to RNA-Seq or to computational biology in general, here is a short overview presentation.

From Wei Sun, Assistant Professor, Department of Biostatistics, University of North Carolina at Chapel Hill: class notes for Bios 784, Introduction to Computational Biology.

http://www.bios.unc.edu/~wsun/teach/RNA-seq_pipeline.pdf


Is RNA-seq data really “digital”? Is it more sensitive or reliable than microarrays?
Before you commit, assess technological bias, limitations and cost-performance using publicly available data, not the vendor’s champion data.

Data Source: GSE29155
RNA-Seq analysis of prostate cancer cell lines using Next Generation Sequencing
http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE29155


Several viruses are known to cause cancer, such as human herpesvirus 8 in Kaposi sarcoma and human papillomaviruses in cervical cancer. Recently, Merkel cell polyomavirus (MCPyV) has been described in 80% of Merkel cell carcinomas (MCC). As with MCC and Kaposi sarcoma, melanoma incidence is increased in immunosuppressed patients.

Melanoma is an aggressive type of cancer; known risk factors for developing melanoma are UV exposure, age and skin type. Although electron microscopy has revealed virus-like particles in melanoma, no virus causing melanoma has been identified so far. Researchers at the University of Tübingen, Germany, set out to determine whether infection by known or as-yet-unknown viruses may also play a role in melanoma development.

To detect viral sequences expressed in melanoma cells, they analysed three melanoma metastases by whole-transcriptome sequencing and digital transcriptome subtraction. None of the samples investigated harboured viral sequences, whereas artificial viral sequences and MCPyV transcripts, used as positive controls for the bioinformatics analysis, were detected. This makes it less likely that viruses are frequently involved in melanoma induction, although sequencing a larger number of melanoma transcriptomes is required to rule out viruses as a relevant pathogen.
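The subtraction idea can be illustrated with a toy Python sketch. This is not the authors' pipeline: real digital transcriptome subtraction aligns reads against the human genome and transcriptome and then searches the unexplained remainder against viral sequences. Here, as a simplifying assumption, exact k-mer matching stands in for alignment, and the function names, `k`, and the `max_host_frac` threshold are hypothetical.

```python
def kmers(seq, k):
    """All distinct substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_index(refs, k):
    """Union of k-mers over a collection of reference sequences."""
    index = set()
    for ref in refs:
        index |= kmers(ref, k)
    return index


def digital_subtraction(reads, host_refs, viral_refs, k=6, max_host_frac=0.5):
    """Toy subtraction: drop host-like reads, then screen the rest for viral k-mers.

    A read is treated as host-derived (and subtracted) when more than
    max_host_frac of its k-mers occur in the host references. Returns
    (residual_reads, viral_hits).
    """
    host = build_index(host_refs, k)
    viral = build_index(viral_refs, k)
    residual, viral_hits = [], []
    for read in reads:
        read_kmers = kmers(read, k)
        if not read_kmers:
            continue  # read shorter than k: nothing to compare
        if len(read_kmers & host) / len(read_kmers) > max_host_frac:
            continue  # explained by the host transcriptome, so subtract it
        residual.append(read)
        if read_kmers & viral:
            viral_hits.append(read)  # unexplained read sharing k-mers with a virus
    return residual, viral_hits
```

Spiking artificial viral sequences into `reads` mirrors the positive control described above: if the spiked reads do not show up in `viral_hits`, the screen itself, rather than the biology, is the likely explanation for a negative result.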

Feldhahn M, Menzel M, Weide B, Bauer P, Meckbach D, Garbe C, Kohlbacher O, Bauer J. (2011) No evidence of viral genomes in whole-transcriptome sequencing of three melanoma metastases. Exp Dermatol [Epub ahead of print].

Randomization

For RNA-seq experiments, besides the randomization in preparing the research subjects, there are many other steps to consider for randomization due to the complexity of the technologies. For example, we can randomize the sample order for various steps in the library construction and the order/location of the samples in the sequencer.
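As a rough illustration of the randomization steps above, the following Python sketch shuffles a list of sample IDs once for library-preparation order and, independently, for lane placement. The function name, the round-robin lane filling, and the example sample IDs are assumptions for illustration, not a protocol from the paper.

```python
import random


def randomize_run_layout(samples, n_lanes, seed=None):
    """Randomize processing order and lane placement for a list of sample IDs.

    Returns (order, lanes): `order` is a random processing order (e.g. for
    library construction); `lanes` maps lane number -> samples, filled
    round-robin from a second, independent shuffle so lane counts stay balanced.
    """
    rng = random.Random(seed)  # seeded so the layout can be recorded and reproduced

    order = list(samples)
    rng.shuffle(order)  # random order for library preparation steps

    placement = list(samples)
    rng.shuffle(placement)  # independent shuffle for placement on the sequencer
    lanes = {lane: [] for lane in range(1, n_lanes + 1)}
    for i, sample in enumerate(placement):
        lanes[i % n_lanes + 1].append(sample)
    return order, lanes


samples = [f"{cond}_rep{r}" for cond in ("ctrl", "treat") for r in range(1, 5)]
order, lanes = randomize_run_layout(samples, n_lanes=2, seed=42)
print(order)
print(lanes)
```

Fixing the seed lets the layout be stored alongside the sample sheet. Simple randomization can still, by chance, put most samples of one condition in the same lane; a blocked layout that balances conditions within each lane is a common refinement.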

Replication

The most desirable replicates are biological replicates, which are true replicates and capture the variation among biological samples. Some studies include biological replicates, while many others have only technical replicates, i.e. repeated measurements from the same biological sample. If the goal is to evaluate the technology, technical replicates alone are sufficient.

RNA-Seq Specific Effects

RNA-seq experiments are affected by common sources of technical variability, such as processing date, technician and reagent batch. In addition, some recognized technical effects are specific to RNA-seq procedures. Among these, the library preparation effect is the largest; flow cell and lane effects are relatively small.

Sequencing Depth

Because RNA-seq randomly samples reads from the transcriptome, measuring transcripts expressed at low levels requires a large number of reads. For a given budget, it is critical to decide whether to increase the sequencing depth, giving more accurate measurements of genes expressed at low levels, or to increase the sample size with limited sequencing depth per sample. Detecting allelic differential expression for genes expressed at fairly low levels would require extremely deep coverage.
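The cost of random sampling for low-abundance transcripts can be quantified under a simple model. The sketch below assumes each read is drawn independently, so a transcript making up a fraction `frac` of the library receives a Poisson-distributed number of reads; the function names, the read threshold `k` and the 95% target are hypothetical choices, and the model ignores the extra biological and technical variance of real libraries.

```python
import math


def prob_at_least_k(n_reads, frac, k):
    """P(X >= k) for X ~ Poisson(n_reads * frac).

    Assumes independent read sampling, so a transcript that makes up a
    fraction `frac` of the library gets a Poisson number of reads.
    """
    lam = n_reads * frac
    term = math.exp(-lam)  # P(X = 0)
    below = 0.0
    for i in range(k):
        below += term  # accumulate P(X = i)
        term *= lam / (i + 1)  # P(X = i + 1) from P(X = i)
    return 1.0 - below


def min_reads(frac, k, target=0.95):
    """Smallest total read count N with P(X >= k) >= target."""
    hi = 1
    while prob_at_least_k(hi, frac, k) < target:
        hi *= 2  # grow until the target is met
    lo = 1
    while lo < hi:  # binary search: the probability increases with N
        mid = (lo + hi) // 2
        if prob_at_least_k(mid, frac, k) >= target:
            hi = mid
        else:
            lo = mid + 1
    return lo


# Total reads needed to see >= 10 reads (95% of the time) from a transcript
# that makes up one read in a million.
print(min_reads(1e-6, k=10))
```

Because the probability depends only on `n_reads * frac`, the required depth scales as 1/`frac`: halving a transcript's abundance doubles the reads needed to measure it equally well.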

Paired-end Sequencing

At the same sequencing depth, paired-end sequencing increases the sensitivity and specificity of detecting alternative splicing and chimeras compared with single-end sequencing.

Biases of Next-Generation Sequencing

In reality, sequence reads are not sampled completely at random from transcripts. Biases have been linked to the GC content of the sequence, the use of random hexamer primers, 3′ and 5′ depletion or bias towards the 3′ end, and bias toward specific RNA species. Most of these biases are related to library preparation methods. From an experimental design point of view, these biases increase the required sample size and sequencing depth, which emphasizes the importance of choosing better protocols and the right analysis methods.
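One simple diagnostic for the GC-content bias mentioned above is to tabulate the GC fraction of the reads and compare that distribution with what the reference transcripts would predict. The Python sketch below only does the tabulation; the helper names and the default bin count are assumptions for illustration, and dedicated QC tools produce more complete reports.

```python
def gc_count(seq):
    """Number of G/C bases in a read (case-insensitive)."""
    seq = seq.upper()
    return seq.count("G") + seq.count("C")


def gc_histogram(reads, n_bins=10):
    """Count reads per GC-fraction bin.

    Bin i covers GC fractions in [i/n_bins, (i+1)/n_bins); reads that are
    100% GC go into the last bin.
    """
    counts = [0] * n_bins
    for read in reads:
        if not read:
            continue  # skip empty records
        # integer arithmetic avoids float rounding at bin edges
        b = min(gc_count(read) * n_bins // len(read), n_bins - 1)
        counts[b] += 1
    return counts
```

A read-level histogram that is clearly shifted or narrowed relative to the expected distribution is a hint that library preparation has favored certain GC ranges.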

Sample Size Calculation for RNA-Seq

The sample size may be determined at two levels: the number of lanes for technical replicates in one treatment, or the number of biological replicates for each treatment. When there are only technical replicates and library preparation and lane effects are negligible or mitigated by proper design, sample sizes can be calculated gene by gene based on Poisson models. When there are biological replicates and over-dispersion exists, negative binomial (NB) distributions model RNA-seq data more appropriately than Poisson distributions. In that case, first obtain the sample size for one gene and then determine the overall sample size based on the overall average power.
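The per-gene step can be sketched with a common normal-approximation formula, swapped in here for illustration and not necessarily the calculation used by Fang and Cui. Assuming NB counts with mean μ and variance μ + φμ² in each group and a two-sided test on the log fold change Δ, the delta method gives Var(log count) ≈ 1/μ + φ, so the replicates per group are roughly n = 2(z₁₋α/₂ + z_power)²(1/μ + φ)/(ln Δ)². Setting φ = 0 recovers the Poisson (technical-replicate) case. The function name and default α and power are assumptions; the step that combines genes via overall average power is not shown.

```python
import math
from statistics import NormalDist


def nb_sample_size(mean_count, dispersion, fold_change, alpha=0.05, power=0.8):
    """Approximate replicates per group needed for one gene.

    Assumes NB counts (variance = mean + dispersion * mean**2) in both groups
    and a two-sided Wald-type test on the log fold change. By the delta
    method, Var(log count) ~= 1/mean + dispersion; dispersion=0 is Poisson.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = z.inv_cdf(power)  # quantile for the desired power
    per_obs_var = 1.0 / mean_count + dispersion
    n = 2 * (z_alpha + z_beta) ** 2 * per_obs_var / math.log(fold_change) ** 2
    return math.ceil(n)  # round up to whole replicates


print(nb_sample_size(10, 0.1, 2.0))  # NB, biological replicates
print(nb_sample_size(10, 0.0, 2.0))  # Poisson, technical replicates only
```

For a gene averaging 10 reads per sample and a twofold change, this gives 7 replicates per group with dispersion 0.1 but only 4 in the Poisson case, illustrating why ignoring over-dispersion leads to underpowered designs.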

Validation

It is worth pointing out that validation using qRT-PCR on the same RNA samples assayed by RNA-seq only validates the technology; it does not validate conclusions about the treatments or conditions. Only validation using different biological replicates from the same populations can further confirm the biological conclusions of RNA-seq experiments.

Fang Z, Cui X. (2011) Design and validation issues in RNA-seq experiments. Brief Bioinform. 12(3), 280-87.

