In silico searches for microRNAs (miRNAs) have been driven by methods that compile structural features of the miRNA precursor hairpin and, to some degree, combine these with analysis of RNA-seq read profiles, in which miRNAs typically leave the Drosha/Dicer fingerprint of one or two ~22 nt blocks of reads corresponding to the mature and star miRNA.

To complement these methods, researchers at the University of Copenhagen, Denmark, present a study in which they systematically exploit these patterns of read profiles. They created databases of 2,540 miRNA read profiles using short RNA-seq data from miRBase and 4,795 read profiles from ENCODE (after preprocessing). Of the 4,795 ENCODE profiles, 1,361 are annotated as noncoding RNAs (ncRNAs), of which 285 are further annotated as miRNAs. Using deepBlockAlign (dba), they align the ENCODE ncRNA profiles against the miRBase profiles (cleaned for “self-matches”) and are able to separate ENCODE miRNAs from the other ncRNAs with a Matthews correlation coefficient of 0.8 and an area under the curve of 0.93. Using the dba score cut-off derived from this separation, they predict 523 novel miRNA candidates. Further analysis reveals that these are located in genomic regions with (UCSC) MAF block fragmentation and poor sequence conservation, which might in part explain why they have been overlooked in previous efforts.
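As a rough illustration of how a score cut-off can be chosen by maximizing the Matthews correlation coefficient, here is a minimal Python sketch. The scores and labels are made-up toy values, and the function names (`mcc`, `confusion`, `best_cutoff`) are hypothetical; this is not the authors' pipeline, only the standard MCC formula applied to a simple threshold scan.

```python
import math

def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # conventional value when a row or column of the matrix is empty
    return (tp * tn - fp * fn) / denom

def confusion(scores, labels, cutoff):
    """Count outcomes when every profile scoring >= cutoff is called a miRNA."""
    tp = fp = tn = fn = 0
    for score, is_mirna in zip(scores, labels):
        called = score >= cutoff
        if called and is_mirna:
            tp += 1
        elif called and not is_mirna:
            fp += 1
        elif not called and is_mirna:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn

def best_cutoff(scores, labels):
    """Scan the observed scores and return (cutoff, mcc) with the highest MCC."""
    best = (None, -1.0)
    for cutoff in sorted(set(scores)):
        value = mcc(*confusion(scores, labels, cutoff))
        if value > best[1]:
            best = (cutoff, value)
    return best

# Toy alignment scores (hypothetical) and whether each profile is an annotated miRNA.
scores = [0.9, 0.8, 0.75, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [True, True, False, True, False, False, False, False]
cutoff, value = best_cutoff(scores, labels)
print(f"cut-off {cutoff}, MCC {value:.3f}")
```

Profiles without a miRNA annotation that score at or above the chosen cut-off would then be the candidates for novel miRNAs.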

The researchers further analyzed known miRNAs from human and mouse and found two distinct classes, containing two blocks or more than two blocks respectively, where the latter class holds profiles with a less well-defined arrangement of reads. They also compared the read profiles of plants and animals in terms of both length and distribution of reads within the profiles, and observed that some read profiles were specific to one kingdom or the other.
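To make the notion of read blocks concrete, the following Python sketch groups reads into blocks and labels a profile as two-block or >2-block. It assumes reads are given as (start, end) intervals with an exclusive end, and it defines a block simply as a run of overlapping reads; the study's own block definition is not described here, so treat this rule and the function names as illustrative assumptions.

```python
def group_into_blocks(reads, min_overlap=1):
    """Merge overlapping reads into blocks.

    reads: iterable of (start, end) tuples, end exclusive.
    Returns a list of (block_start, block_end, read_count) tuples.
    """
    blocks = []
    for start, end in sorted(reads):
        # A read joins the current block if it overlaps it by at least min_overlap bases.
        if blocks and start <= blocks[-1][1] - min_overlap:
            blocks[-1][1] = max(blocks[-1][1], end)
            blocks[-1][2] += 1
        else:
            blocks.append([start, end, 1])
    return [tuple(block) for block in blocks]

def profile_class(reads):
    """Label a read profile by its number of blocks."""
    n_blocks = len(group_into_blocks(reads))
    if n_blocks == 2:
        return "two-block"
    if n_blocks > 2:
        return ">2-block"
    return "single-block"

# Toy hairpin profile: one cluster of ~22 nt reads on each arm.
reads = [(0, 22), (1, 23), (2, 22), (40, 62), (41, 62)]
print(group_into_blocks(reads))
print(profile_class(reads))
```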

Availability: All data, as well as a server for searching miRBase profiles by uploading a BED file, are available at http://rth.dk/resources/dba/mirna.

  • Pundhir S, Gorodkin J. (2013) MicroRNA discovery by similarity search to a database of RNA-seq profiles. Frontiers in Bioinform & Comp Biol [Epub ahead of print]. [abstract]

from Genetic Engineering News by Richard A. Stein, M.D., Ph.D.

The complex and dynamic transcriptional patterns unveiled by the ENCyclopedia of DNA Elements (ENCODE) project, together with the finding that less than 2% of the transcriptional output of the human genome encodes proteins and approximately 98% encodes noncoding RNAs, are some of the advances that reshaped the field and even required that we revisit the definition of the gene.

While insights into the genome have repeatedly been a source of thought-provoking findings, the transcriptome, with its unprecedented and unexpected levels of complexity, promises to be even more intriguing. The emergence of RNA-Seq allowed quantitative and high-throughput analyses of the transcriptome to be performed in different cell types and under various conditions, and with the massive amounts of data that have been generated, computational analysis is emerging as one of the most critical challenges.

“Two of the basic problems in transcriptome analysis are identifying the true sets of transcripts in a given tissue at a given time, and defining the dynamics of gene expression,” says Zhong Wang, Ph.D., staff scientist and group lead for genome analysis at the DOE Joint Genome Institute.

To align their large (>80 billion reads) ENCODE Transcriptome RNA-seq dataset, a team of researchers at Cold Spring Harbor Laboratory developed the Spliced Transcripts Alignment to a Reference (STAR) software, based on a previously undescribed RNA-seq alignment algorithm that uses sequential maximum mappable seed search in uncompressed suffix arrays followed by a seed clustering and stitching procedure.

  • Very high mapping speed:
    on a modest 12-core cluster, STAR maps 400 million pairs per hour for human 2×100 Illumina reads (>50 times faster than TopHat).
  • Accurate alignment of contiguous and spliced reads:
    in our tests on real and simulated data STAR showed better sensitivity and precision than TopHat.
  • Detection of polyA-tails, non-canonical splices and chimeric (fusion) junctions.
  • Mapping reads of any length:
    STAR can efficiently map reads of any length generated by current or emerging sequencing platforms, starting from ~15 bases (small RNA) and up to full length transcripts several kilobases long.
  • Thorough testing on large ENCODE datasets:
    STAR was used to map 64 billion reads of long RNA-seq and 16 billion reads of short RNA-seq, and will be used to map RNA-seq data in the next ENCODE phase.

STAR requires ~30GB of RAM for mapping to the human genome (could be reduced to 16GB in the “sparse” mode with some speed loss).
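The seed search idea can be illustrated with a toy Python sketch: build a suffix array for a small "genome", find the longest prefix of a read that matches exactly somewhere, then repeat the search on the unmapped remainder. This is only a simplified illustration under stated assumptions; the function names are hypothetical, the suffix array is built naively, and STAR's real implementation additionally handles mismatches, searches in both directions, and clusters and stitches the seeds.

```python
def build_suffix_array(genome):
    """Naive suffix array construction; adequate only for a toy genome."""
    return sorted(range(len(genome)), key=lambda i: genome[i:])

def _bound(genome, sa, lo, hi, offset, ch, strict):
    """Binary search in sa[lo:hi] on the character at `offset` within each suffix.

    Returns the first index whose character is >= ch (strict=False)
    or > ch (strict=True). A suffix that ends before `offset` sorts first.
    """
    while lo < hi:
        mid = (lo + hi) // 2
        pos = sa[mid] + offset
        c = genome[pos] if pos < len(genome) else ""
        if c < ch or (strict and c == ch):
            lo = mid + 1
        else:
            hi = mid
    return lo

def max_mappable_prefix(read, genome, sa):
    """Return (length, positions) of the longest read prefix found in the genome."""
    lo, hi = 0, len(sa)  # suffix array interval matching read[:length]
    length = 0
    while length < len(read):
        ch = read[length]
        new_lo = _bound(genome, sa, lo, hi, length, ch, strict=False)
        new_hi = _bound(genome, sa, lo, hi, length, ch, strict=True)
        if new_lo == new_hi:  # no suffix extends the match by this character
            break
        lo, hi = new_lo, new_hi
        length += 1
    positions = sorted(sa[lo:hi]) if length else []
    return length, positions

def seed_search(read, genome, sa):
    """Sequentially split a read into maximal exactly matching seeds."""
    seeds, offset = [], 0
    while offset < len(read):
        length, positions = max_mappable_prefix(read[offset:], genome, sa)
        if length == 0:
            offset += 1  # toy behaviour: skip a base absent from the genome
            continue
        seeds.append((offset, length, positions))
        offset += length
    return seeds

# Toy genome: two "exons" separated by a poly-T "intron"; the read spans the junction.
genome = "ACGTACGG" + "TTTTTTTTTT" + "GACCTGA"
sa = build_suffix_array(genome)
print(seed_search("TACGGGACC", genome, sa))
```

In this toy case the read splits into two seeds on either side of the intron, which is the situation a stitching step would then join into one spliced alignment.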

Availability and implementation: STAR is implemented as standalone C++ code. STAR is free open-source software distributed under the GPLv3 license and can be downloaded from http://code.google.com/p/rna-star/.

Contact: dobin@cshl.edu

I will be happy to answer any questions via SEQanswers or the STAR discussion forum.

  • Dobin A, Davis CA, Schlesinger F, Drenkow J, Zaleski C, Jha S, Batut P, Chaisson M, Gingeras TR. (2013) STAR: ultrafast universal RNA-seq aligner. Bioinformatics. 29(1), 15-21. [article]

The GENCODE Consortium aims to identify all gene features in the human genome using a combination of computational analysis, manual annotation, and experimental validation. Since the first public release of this annotation data set, few new protein-coding loci have been added, yet the number of alternative splicing transcripts annotated has steadily increased.

There is no question that RNA-Seq has several major advantages over current hybridization-based approaches such as microarrays. However, with the cost per sample of RNA-Seq still much higher than that of microarrays, it would be beneficial if multiple samples could be multiplexed and sequenced in a single lane. But how many reads are enough for sufficient transcriptome coverage?
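The multiplexing trade-off comes down to simple arithmetic, sketched below in Python. The lane yield and target depth are placeholder numbers chosen only for illustration, not recommendations from the post, and the function names are hypothetical.

```python
def reads_per_sample(lane_reads, n_samples, usable_fraction=1.0):
    """Expected reads per sample when n_samples share one lane evenly."""
    return int(lane_reads * usable_fraction / n_samples)

def max_samples_per_lane(lane_reads, target_reads, usable_fraction=1.0):
    """Largest number of samples that still reach target_reads each."""
    return int(lane_reads * usable_fraction // target_reads)

# Hypothetical figures: a lane yielding 150 million reads, 20 million wanted per sample.
lane_yield = 150_000_000
target = 20_000_000
print(max_samples_per_lane(lane_yield, target))
print(reads_per_sample(lane_yield, 6))
```

The `usable_fraction` parameter is a placeholder for losses such as unassigned barcodes or unmapped reads; real pooling is also rarely perfectly even.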

There have been several studies demonstrating the biases inherent to the RNA-Seq method as well as variation in results across protocols and platforms. Researchers have set about innovating methods to correct for these biases and variances, but until now, most correction methods involve the use of bioinformatics models for partial correction. (See Post – Bias Detection and Correction in RNA-Sequencing Data)

Recently, researchers at the NIH, NIST, and Cold Spring Harbor Laboratory have developed a synthetic spike-in standard as another tool for combating biases. The spike-in control consists of a pool of 96 synthetic RNAs of various lengths and GC content, covering a 2^20 concentration range. These controls are used to measure sensitivity, accuracy, and biases in RNA-seq experiments, as well as to derive standard curves for quantifying the abundance of transcripts.

Using data collected as part of the ENCODE and modENCODE projects, they demonstrate that external RNA controls are a useful resource for evaluating sensitivity and accuracy of RNA-seq experiments for transcriptome discovery and quantification. These quality metrics facilitate comparable analysis across different samples, protocols, and platforms.
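A standard curve from spike-in controls can be sketched as a straight-line fit on log-transformed values: known input concentration on one axis, observed read count on the other. The Python example below uses invented, perfectly linear toy numbers and hypothetical function names; a real analysis would also account for transcript length and sequencing depth.

```python
import math

def fit_standard_curve(concentrations, read_counts):
    """Least-squares line through (log2 concentration, log2 read count) points.

    Returns (slope, intercept) of log2(count) = slope * log2(conc) + intercept.
    """
    xs = [math.log2(c) for c in concentrations]
    ys = [math.log2(r) for r in read_counts]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def estimate_concentration(read_count, slope, intercept):
    """Invert the standard curve to estimate abundance from a read count."""
    return 2 ** ((math.log2(read_count) - intercept) / slope)

# Toy spike-ins: known concentrations (arbitrary units) and observed read counts.
concentrations = [1, 4, 16, 64, 256]
read_counts = [8, 32, 128, 512, 2048]
slope, intercept = fit_standard_curve(concentrations, read_counts)
print(slope, intercept)
print(estimate_concentration(1024, slope, intercept))
```

A slope near 1 on this log-log scale indicates that read counts scale proportionally with input amount; deviations at the low end point to the sensitivity limit of the experiment.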

Jiang L, Schlesinger F, Davis CA, Zhang Y, Li R, Salit M, Gingeras TR, Oliver B. (2011) Synthetic spike-in standards for RNA-seq experiments. Genome Res [Epub ahead of print]. [abstract]

The ENCODE Consortium has finalized Standards, Guidelines and Best Practices for RNA-Seq V1.0.

Download the document RNA-Seq Best Practices PDF

RNA-Seq is a directed experimental approach aimed at characterizing transcription in biological samples. This document presents a set of guidelines and standards focused on best practices for creating ‘reference quality’ transcriptome measurements. As technologies are rapidly evolving and the aims of RNA-Seq experiments are diverse, this document does not cover all standards and quality control issues.
