In silico searches for microRNAs (miRNAs) have been driven by methods that compile structural features of the miRNA precursor hairpin and, to some degree, combine these with analysis of RNA-seq read profiles, in which a miRNA typically leaves the Drosha/Dicer fingerprint of one or two ~22 nt blocks of reads corresponding to the mature and star miRNA.

To complement these methods, researchers at the University of Copenhagen, Denmark present a study in which they systematically exploit these patterns of read profiles. They created databases of 2,540 miRNA read profiles using short RNA-seq data from miRBase and 4,795 read profiles from ENCODE (after preprocessing). Of the 4,795 ENCODE profiles, 1,361 are annotated as noncoding RNAs (ncRNAs), of which 285 are further annotated as miRNAs. Using deepBlockAlign (dba), they align the ENCODE ncRNA profiles against the miRBase profiles (cleaned for “self-matches”) and are able to separate ENCODE miRNAs from the other ncRNAs with a Matthews correlation coefficient of 0.8, achieving an area under the curve of 0.93. Using the derived dba score cut-off, they predict 523 novel miRNA candidates. Further analysis reveals that these are located in genomic regions with (UCSC) MAF block fragmentation and poor sequence conservation, which might in part explain why they have been overlooked in previous efforts.
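
For readers less familiar with the metric, the Matthews correlation coefficient can be computed directly from the confusion matrix obtained at a given score cut-off. The following is a minimal sketch, not the authors' code: the function names are hypothetical, the example scores are invented, and it assumes that a higher alignment score means a profile looks more miRNA-like.

```python
import math

def confusion_counts(scores, labels, cutoff):
    """Count TP, TN, FP, FN when calling 'miRNA' for scores >= cutoff.

    Assumption: higher score = more miRNA-like profile.
    """
    tp = tn = fp = fn = 0
    for score, is_mirna in zip(scores, labels):
        predicted = score >= cutoff
        if predicted and is_mirna:
            tp += 1
        elif predicted and not is_mirna:
            fp += 1
        elif not predicted and is_mirna:
            fn += 1
        else:
            tn += 1
    return tp, tn, fp, fn

def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient; returns 0.0 if undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom

# Invented example data, for illustration only.
scores = [0.9, 0.8, 0.2, 0.1]
labels = [True, True, False, False]
mcc = matthews_corrcoef(*confusion_counts(scores, labels, cutoff=0.5))
```

A value of 1 indicates perfect separation, 0 is no better than chance, and -1 is complete disagreement.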

The researchers further analyzed known miRNAs from human and mouse and found two distinct classes, containing two blocks or more than two blocks respectively, where the latter class holds profiles with a less well-defined arrangement of reads. They also compared read profiles from plants and animals in terms of both length and distribution of reads within the profiles, and observed that some read profiles were specific to each kingdom.
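
To make the notion of "blocks" concrete, the sketch below groups overlapping reads into blocks and labels a profile by its block count. This is a deliberately simplified illustration and not the block definition used by deepBlockAlign; the function names, the half-open interval convention and the "other" label are assumptions made for the example.

```python
def merge_into_blocks(reads):
    """Merge overlapping reads into blocks.

    reads: list of (start, end) half-open intervals on one strand.
    Returns a sorted list of (start, end) blocks.
    """
    blocks = []
    for start, end in sorted(reads):
        if blocks and start < blocks[-1][1]:
            # Read overlaps the current block: extend it if needed.
            blocks[-1][1] = max(blocks[-1][1], end)
        else:
            # No overlap: start a new block.
            blocks.append([start, end])
    return [tuple(block) for block in blocks]

def profile_class(reads):
    """Label a read profile by the number of blocks it contains."""
    n_blocks = len(merge_into_blocks(reads))
    if n_blocks == 2:
        return "two-block"
    if n_blocks > 2:
        return ">2-block"
    return "other"  # fewer than two blocks; label is an assumption

# Invented coordinates resembling a mature/star pair of ~22 nt blocks.
example = [(0, 22), (1, 23), (40, 62)]
label = profile_class(example)
```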

Availability: All data, as well as a server for searching miRBase profiles by uploading a BED file, are available at http://rth.dk/resources/dba/mirna.

  • Pundhir S, Gorodkin J. (2013) MicroRNA discovery by similarity search to a database of RNA-seq profiles. Frontiers in Bioinform & Comp Biol [Epub ahead of print]. [abstract]

The purpose of the system is to automate and simplify as much as possible the process of analyzing RNA-Seq results data by storing it in a database and providing many options for querying it.

Goals

  • Goal 1: Provide a system through which Biologists can analyze their RNA-Seq results data, specifically differential expression tests, novel transcript discoveries, and assembled transcripts.
  • Goal 2: The system should allow the user to get meaningful results with minimal learning time. For this goal to be satisfied, two senior Biologists familiar with RNA-seq must approve the system.

Availability – The Queryable RNA Seq Database is available online at: https://github.com/fatPerlHacker/queryable-rna-seq-database

The Total Integrated Archive of short-Read and Array (TIARA) database stores and integrates human genome data generated from multiple technologies including next-generation sequencing and high-resolution comparative genomic hybridization array. The TIARA genome browser is a powerful tool for the analysis of personal genomic information by exploring genomic variants such as SNPs, indels and structural variants simultaneously. As of September 2012, the TIARA database provides raw data and variant information for 13 sequenced whole genomes, 16 sequenced transcriptomes and 33 high resolution array assays. Sequencing reads are available at a depth of ∼30× for whole genomes and 50× for transcriptomes. Information on genomic variants includes a total of ∼9.56 million SNPs, 23 025 of which are non-synonymous SNPs, and ∼1.19 million indels. In this update, by adding high coverage sequencing of additional human individuals, the TIARA genome database now provides an extensive record of rare variants in humans. Following TIARA’s fundamentally integrative approach, new transcriptome sequencing data are matched with whole-genome sequencing data in the genome browser. Users can here observe, for example, the expression levels of human genes with allele-specific quantification. Improvements to the TIARA genome browser include the intuitive display of new complex and large-scale data sets.

TIARA genome database

Availability: The TIARA database is available online at http://tiara.gmi.ac.kr

  • Hong D, Lee J, Bleazard T, Jung H, Ju YS, Yu SB, Kim S, Park SS, Kim JI, Seo JS. (2013) TIARA genome database: update 2013. Database (Oxford) [Epub ahead of print]. [article]

Next generation sequencing is rapidly becoming the approach of choice for transcriptional analysis experiments. Substantial advances have been achieved in computational approaches to support these technologies. These approaches typically rely on existing transcript annotations, introducing a bias towards known genes, require specific experimental design and computational resources, or focus only on identification of splice variants (ignoring other biologically relevant transcribed features contained within the data that may be important for downstream analysis). Biologically relevant transcribed features also include large and small non-coding RNA, new transcription start sites, alternative promoters, RNA editing and processing of coding transcripts. Also, many existing solutions lack accessible interfaces required for wide scale adoption.

Researchers at the Monash Institute of Medical Research, Monash University, Australia have developed a user-friendly, rapid and computation-efficient feature annotation framework (RNA-eXpress) that enables identification of transcripts and other genomic and transcriptional features independently of current annotations. RNA-eXpress accepts mapped reads in the standard binary alignment (BAM) format and produces a study-specific feature annotation in GTF format, comparison statistics, sequence extraction and feature counts. The framework is designed to be easily accessible while allowing advanced users to integrate new feature-identification algorithms through simple class extension, thus facilitating expansion to novel feature types or identification of study specific feature types.

RNA-eXpress

Availability and Implementation: RNA-eXpress software, source code, user manuals, supporting tutorials, developer guides and example data are available at http://www.rnaexpress.org.

Contact: paul.hertzog@monash.edu

  • Forster S, Finkel A, Gould J, Hertzog P. (2013) RNA-eXpress annotates novel transcript features in RNA-seq data Bioinformatics [Epub ahead of print]. [abstract]

The 20th annual Database Issue of Nucleic Acids Research includes 176 articles, half of which describe new online molecular biology databases, while the other half provide updates on databases previously featured in NAR and other journals. This year’s highlights include two databases of DNA repeat elements; several databases of transcriptional factors and transcriptional factor-binding sites; databases on various aspects of protein structure and protein-protein interactions; databases for metagenomic and rRNA sequence analysis; and four databases specifically dedicated to Escherichia coli. The increased emphasis on using genome data to improve human health is reflected in the development of databases of genomic structural variation (NCBI’s dbVar and EBI’s DGVa), the NIH Genetic Testing Registry and several other databases centered on the genetic basis of human disease, potential drugs, their targets and the mechanisms of protein-ligand binding. Two new databases present genomic and RNA-seq data for monkeys, providing a wealth of data on our closest relatives for comparative genomics purposes. The NAR online Molecular Biology Database Collection has been updated and currently lists 1512 online databases.

The NAR online Molecular Biology Database Collection is available at http://www.oxfordjournals.org/nar/database/cap/.

The full content of the Database Issue is freely available online on the Nucleic Acids Research website: http://nar.oxfordjournals.org/.

  • Fernández-Suárez XM, Galperin MY. (2013) The 2013 Nucleic Acids Research Database Issue and the online molecular biology database collection. Nucleic Acids Res 41(Database issue):D1-7. [article]

Database categories in the collection:

  • Nucleotide Sequence Databases
  • RNA sequence databases
  • Protein sequence databases
  • Structure Databases
  • Genomics Databases (non-vertebrate)
  • Metabolic and Signaling Pathways
  • Human and other Vertebrate Genomes
  • Human Genes and Diseases
  • Microarray Data and other Gene Expression Databases
  • Proteomics Resources
  • Other Molecular Biology Databases
  • Organelle databases
  • Plant databases
  • Immunological databases
  • Cell biology

Biogenesis and molecular function are two key subjects in the field of microRNA (miRNA) research. Deep sequencing has become the principal technique in cataloging of miRNA repertoire and generating expression profiles in an unbiased manner.

miRGator

A team led by researchers at Ewha Womans University, Korea has updated miRGator to v3.0. miRGator compiles publicly available deep sequencing miRNA data, and the team has implemented several novel tools to facilitate exploration of these massive datasets. The miR-seq browser lets users examine short read alignments alongside secondary structure and read count information in concurrent windows. Features such as sequence editing, sorting, ordering, and import and export of user data should be of great utility for studying iso-miRs, miRNA editing and modifications. The miRNA-target relation is essential for understanding miRNA function. Coexpression analysis of miRNAs and target mRNAs, based on miRNA-Seq and RNA-Seq data from the same sample, is visualized in heat-map and network views, where users can investigate the inverse correlation of gene expression alongside target relations compiled from various databases of predicted and validated targets. By keeping datasets and analytic tools up to date, miRGator should continue to serve as an integrated resource for investigating the biogenesis and function of miRNAs.
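
The inverse correlation mentioned above can be illustrated with a plain Pearson correlation between a miRNA and a candidate target mRNA measured across the same samples. This is only a sketch of the general idea and makes no claim about how miRGator computes coexpression; the function name and the expression values are invented for the example.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two series of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (sd_x * sd_y)

# Invented expression values across four matched samples.
mirna_expr = [10.0, 20.0, 30.0, 40.0]
target_expr = [8.0, 6.5, 4.0, 2.5]
r = pearson(mirna_expr, target_expr)  # negative: target falls as miRNA rises
```

A strongly negative coefficient is consistent with repression of the target by the miRNA, but on its own it does not establish a direct interaction.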

Availability – miRGator v3.0 update is available at: http://mirgator.kobic.re.kr

  • Cho S, Jang I, Jun Y, Yoon S, Ko M, Kwon Y, Choi I, Jang H, Ryu D, Lee B, Kim VN, Kim W, Lee S. (2012) miRGator v3.0: a microRNA portal for deep sequencing, expression profiling and mRNA targeting. Nucleic Acids Res [Epub ahead of print]. [article]

Although the rhesus macaque is a unique model for the translational study of human diseases, its use in biomedical research is still in its infancy due to error-prone gene structures and limited annotations. Here, we present RhesusBase for the monkey research community (http://www.rhesusbase.org).

What are the RNA-Seq models in Ensembl, and how were they determined? How does RNA-Seq data contribute to Ensembl gene sets? Can I upload my own RNA-Seq data to Ensembl? Answers to these questions and more…

Pea (Pisum sativum L.), with its high-protein seeds and its ability to establish a symbiosis with soil nitrogen-fixing bacteria, is a strategic crop in temperate regions. Moreover, pea is a long-standing model in genetics and physiology. This web portal provides the first full-length Unigene set expression atlas for pea. Twenty pea cDNA libraries were prepared from different above- and below-ground cv “Cameor” plant organs, at different stages, and under different nutrition conditions. Libraries were sequenced using Next-Generation Sequencing technologies. Sequences were assembled de novo and a full-length Unigene set was produced. The sequencing depth of each cDNA contig relates to the expression level of transcripts. This gene atlas presents the pattern of expression and thus provides useful functional information for each cDNA contig. In the future, new RNA-Seq experiments will be added to this portal to enlarge the atlas’ scope.

The Pea RNA-Seq Gene Atlas is available at: http://bios.dijon.inra.fr/FATAL/cgi/pscam.cgi

Full-length de novo assembly of new pea RNA-seq data reveals the complexity of the pea transcriptome, S. Alves-Carvalho et al. in prep.

from the miRBase Blog – By Sam

miRBase 19 is now available, brought to you from the Benasque RNA meeting in the sunny Pyrenees, and after a slightly longer gap than usual. In that extended time, we have added more than the usual number of new sequences: 3171 new hairpins and 3625 novel mature products, bringing the totals to 21264 and 25141 respectively in 193 species. As always, the full README file is available on the FTP site, along with downloadable files containing all data in various formats.

Ensembl gene annotation provides a comprehensive catalogue of transcripts aligned to the reference sequence. It relies on publicly available species specific and orthologous transcripts plus their inferred protein sequence. The accuracy of gene models is improved by increasing the species specific component which can be cost-effectively achieved using RNA-Seq. Two zebrafish gene annotations are presented in Ensembl version 62 built on the Zv9 reference sequence.

Firstly, RNA-Seq data from five tissues and seven developmental stages were assembled into 25,748 gene models. A 3′ end capture and sequencing protocol was developed to predict the 3′ ends of transcripts and 46.1% of the original models were subsequently refined.

miRFANs is an online database for Arabidopsis thaliana miRNA function annotations. The creators integrated various types of datasets, including miRNA-target interactions, transcription factors (TFs) and their targets, expression profiles, genomic annotations and pathways, into a comprehensive database, and developed various statistical and mining tools.

miRFANs consists of:

  1. A comprehensive collection of miRNA targets for Arabidopsis thaliana, which provides valuable information about the functions of plant miRNAs.
  2. A highly informative miRNA-mediated genetic regulatory network extracted from the integrative database.
  3. A set of statistical and mining tools for analyzing and mining the database.
  4. A user-friendly web interface that facilitates browsing and analysis of the collected data.

miRFANs is freely available at: http://www.cassava-genome.cn/mirfans

  • Liu H, Jin T, Liao R, Wan L, Xu B, Zhou S, Guan J. (2012) miRFANs: an integrated database for Arabidopsis thaliana microRNA function annotations. BMC Plant Biology [Epub ahead of print]. [abstract]

VESPA

VESPA is a desktop Java application that integrates high-throughput proteomics data (peptide-centric) and transcriptomics data (probe or RNA-Seq) into a genomic context, all of which can be visualized at three levels of genomic resolution. Data is interrogated via searches linked to the genome visualizations to find regions with a high likelihood of mis-annotation. Search results can be exported for further validation outside of VESPA, or potential coding regions can be analyzed concurrently within the software through interaction with BLAST.

VESPA is demonstrated on two use cases (Yersinia pestis Pestoides F and Synechococcus sp. PCC 7002) to show how rapidly mis-annotations can be found and explored in VESPA using either proteomics data alone or in combination with transcriptomic data.

The software is freely available at https://www.biopilot.org/docs/Software/Vespa.php

  • Peterson ES, McCue LA, Schrimpe-Rutledge AC, Jensen JL, Walker H, Kobold MA, Webb SR, Payne SH, Ansong CK, Adkins JN, Cannon WR, Webb-Robertson BJ. (2012) VESPA: software to facilitate genomic annotation of prokaryotic organisms through integration of proteomic and transcriptomic data. BMC Genomics 13(1), 131. [article]
