Apr
2
MicroRNA discovery by similarity search to a database of RNA-seq profiles
Filed Under Databases, Other Tools | Leave a Comment
In silico searches for microRNAs (miRNAs) have been driven by methods that compile structural features of the miRNA precursor hairpin and, to some degree, combine these with analysis of RNA-seq profiles, in which a miRNA typically leaves the Drosha/Dicer fingerprint of one or two ~22 nt blocks of reads corresponding to the mature and star miRNA.
To complement these methods, researchers at the University of Copenhagen, Denmark, present a study that systematically exploits these read-profile patterns. They created databases of 2,540 miRNA read profiles using short RNA-seq data from miRBase and 4,795 read profiles from ENCODE (after preprocessing). Of the 4,795 ENCODE profiles, 1,361 are annotated as noncoding RNAs (ncRNAs), of which 285 are further annotated as miRNAs. Using deepBlockAlign (dba), they align the ENCODE ncRNA profiles against the miRBase profiles (cleaned for “self-matches”) and are able to separate ENCODE miRNAs from the other ncRNAs with a Matthews correlation coefficient of 0.8 and an area under the curve of 0.93. Using the dba score cut-off derived from this separation, they predict 523 novel miRNA candidates. Further analysis reveals that these candidates are located in genomic regions with (UCSC) MAF block fragmentation and poor sequence conservation, which might in part explain why they were overlooked in previous efforts.
The researchers further analyzed known miRNAs from human and mouse and found two distinct classes, containing two blocks or more than two blocks respectively, where the latter class holds profiles with a less well-defined arrangement of reads. They also compared read profiles from plants and animals in terms of both length and distribution of reads within the profiles, and observed that some read profiles were specific to one kingdom or the other.
Availability: All data, as well as a server for searching miRBase profiles by uploading a BED file, are available at http://rth.dk/resources/dba/mirna.
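As a rough illustration of the Matthews correlation coefficient reported above, the sketch below scores a binary split of profiles at a given score cut-off. The function names, the toy scores, and the labels are hypothetical and are not taken from the study or from deepBlockAlign; the formula itself is the standard MCC definition.

```python
import math

def confusion_at_cutoff(scores, labels, cutoff):
    """Count TP, FP, TN, FN when profiles scoring >= cutoff are called miRNA."""
    tp = fp = tn = fn = 0
    for score, is_mirna in zip(scores, labels):
        predicted = score >= cutoff
        if predicted and is_mirna:
            tp += 1
        elif predicted and not is_mirna:
            fp += 1
        elif not predicted and is_mirna:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn

def matthews_corrcoef(tp, fp, tn, fn):
    """Standard MCC; returns 0.0 when any marginal sum is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom

# Toy alignment scores and annotations (hypothetical, for illustration only).
toy_scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
toy_labels = [True, True, False, True, False, False]

counts = confusion_at_cutoff(toy_scores, toy_labels, cutoff=0.5)
print("TP, FP, TN, FN:", counts)
print("MCC:", matthews_corrcoef(*counts))
```

Scanning the cut-off over the observed score range and keeping the value with the highest MCC is one plausible way to pick a separation threshold; whether the authors chose theirs this way is not stated in the summary.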
- Pundhir S, Gorodkin J. (2013) MicroRNA discovery by similarity search to a database of RNA-seq profiles. Frontiers in Bioinform & Comp Biol [Epub ahead of print]. [abstract]
Apr
1
Querying the Cancer Transcriptome
Filed Under News, Publications, Review | Leave a Comment
from Genetic Engineering News by Richard A. Stein, M.D., Ph.D.
The complex and dynamic transcriptional patterns unveiled by the ENCyclopedia of DNA Elements (ENCODE) project, together with the finding that less than 2% of the transcriptional output of the human genome encodes proteins and approximately 98% encodes noncoding RNAs, are some of the advances that reshaped the field and even required that we revisit the definition of the gene.
While insights into the genome have repeatedly been a source of thought-provoking findings, the transcriptome, with its unprecedented and unexpected levels of complexity, promises to be even more intriguing. The emergence of RNA-Seq allowed quantitative and high-throughput analyses of the transcriptome to be performed in different cell types and under various conditions, and with the massive amounts of data that have been generated, computational analysis is emerging as one of the most critical challenges.
“Two of the basic problems in transcriptome analysis are identifying the true sets of transcripts in a given tissue at a given time, and defining the dynamics of gene expression,” says Zhong Wang, Ph.D., staff scientist and group lead for genome analysis at the DOE Joint Genome Institute. Read more
Feb
18
STAR: ultrafast universal RNA-seq aligner
Filed Under Splicing and Junction Mapping | Leave a Comment
To align their large (>80 billion reads) ENCODE Transcriptome RNA-seq dataset, a team of researchers at Cold Spring Harbor Laboratory developed the Spliced Transcripts Alignment to a Reference (STAR) software, based on a previously undescribed RNA-seq alignment algorithm that uses sequential maximum mappable seed search in uncompressed suffix arrays, followed by a seed clustering and stitching procedure.
- Very high mapping speed: on a modest 12-core cluster STAR maps 400 million pairs per hour for human 2×100 Illumina reads (>50 times faster than TopHat).
- Accurate alignment of contiguous and spliced reads: in our tests on real and simulated data STAR showed better sensitivity and precision than TopHat.
- Detection of polyA-tails, non-canonical splices and chimeric (fusion) junctions.
- Mapping reads of any length: STAR can efficiently map reads of any length generated by current or emerging sequencing platforms, starting from ~15 bases (small RNA) and up to full-length transcripts several kilobases long.
- Thorough testing on large ENCODE datasets: STAR was used to map 64 billion reads of long RNA-seq and 16 billion reads of short RNA-seq, and will be used to map RNA-seq data in the next ENCODE phase.
STAR requires ~30GB of RAM for mapping to the human genome (could be reduced to 16GB in the “sparse” mode with some speed loss).
Availability and implementation: STAR is implemented as standalone C++ code. STAR is free, open-source software distributed under the GPLv3 license and can be downloaded from http://code.google.com/p/rna-star/.
Contact: dobin@cshl.edu
I will be happy to answer any questions via SEQanswers, STAR discussion forum
- Dobin A, Davis CA, Schlesinger F, Drenkow J, Zaleski C, Jha S, Batut P, Chaisson M, Gingeras TR. (2013) STAR: ultrafast universal RNA-seq aligner. Bioinformatics. 29(1), 15-21. [article]
Sep
11
RNA-Seq making significant contributions to GENCODE project
Filed Under Publications | Leave a Comment
The GENCODE Consortium aims to identify all gene features in the human genome using a combination of computational analysis, manual annotation, and experimental validation. Since the first public release of this annotation data set, few new protein-coding loci have been added, yet the number of alternative splicing transcripts annotated has steadily increased. Read more
Oct
20
How Many Reads are Enough?
Filed Under Information | 1 Comment
There is no question that RNA-Seq has several major advantages over current hybridization-based approaches such as microarrays. However, with the cost per sample of RNA-Seq still much higher than that of microarrays, it would be beneficial if multiple samples could be multiplexed and sequenced in a single lane. But how many reads are enough for sufficient transcriptome coverage? Read more
Aug
12
Several studies have demonstrated the biases inherent to the RNA-Seq method, as well as variation in results across protocols and platforms. Researchers have set about developing methods to correct for these biases and variances, but until now most correction methods have relied on bioinformatics models that correct only partially. (See Post – Bias Detection and Correction in RNA-Sequencing Data)
Recently, researchers at the NIH, the NIST, and Cold Spring Harbor Lab developed a synthetic spike-in standard as another tool for combating biases. The standard is a pool of 96 synthetic RNAs of various lengths and GC content, covering a 2^20 concentration range. These spike-in controls are used to measure sensitivity, accuracy, and biases in RNA-seq experiments, as well as to derive standard curves for quantifying the abundance of transcripts.
Using data collected as part of the ENCODE and modENCODE projects, they demonstrate that external RNA controls are a useful resource for evaluating sensitivity and accuracy of RNA-seq experiments for transcriptome discovery and quantification. These quality metrics facilitate comparable analysis across different samples, protocols, and platforms.
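To make the idea of a spike-in standard curve concrete, here is a minimal sketch that fits a straight line between log2 input concentration and log2 read count, then inverts it to estimate the abundance of another transcript. The function names and the toy numbers are hypothetical and do not come from the paper; a plain least-squares fit in log-log space is assumed here for illustration and is not necessarily the model the authors used.

```python
import math

def fit_standard_curve(concentrations, read_counts):
    """Least-squares fit of log2(reads) = slope * log2(concentration) + intercept."""
    xs = [math.log2(c) for c in concentrations]
    ys = [math.log2(r) for r in read_counts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def estimate_concentration(read_count, slope, intercept):
    """Invert the fitted curve to estimate concentration from a read count."""
    return 2 ** ((math.log2(read_count) - intercept) / slope)

# Toy spike-in measurements (hypothetical): known input amounts and observed reads.
spike_concentrations = [1, 4, 16, 64]
spike_reads = [8, 32, 128, 512]

slope, intercept = fit_standard_curve(spike_concentrations, spike_reads)
print("slope:", slope, "intercept:", intercept)
print("estimated concentration for 64 reads:",
      estimate_concentration(64, slope, intercept))
```

In practice read counts would first be normalized for transcript length and sequencing depth, and spike-ins with zero reads would need separate handling, since the logarithm is undefined there.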
Jiang L, Schlesinger F, Davis CA, Zhang Y, Li R, Salit M, Gingeras TR, Oliver B. (2011) Synthetic spike-in standards for RNA-seq experiments. Genome Res [Epub ahead of print]. [abstract]
Jul
5
Best Practices for RNA-Seq
Filed Under Information | 1 Comment
The ENCODE Consortium has finalized its Standards, Guidelines and Best Practices for RNA-Seq, V1.0.
RNA-Seq is a directed experimental approach aimed at characterizing transcription in biological samples. This document presents a set of guidelines and standards focused on best practices for creating ‘reference quality’ transcriptome measurements. As technologies are rapidly evolving and the aims of RNA-Seq experiments are diverse, this document does not cover all standards and quality control issues.









