In silico searches for microRNAs (miRNAs) have been driven by methods that compile structural features of the miRNA precursor hairpin and, to some degree, combine these with analysis of RNA-seq read profiles, in which a miRNA typically leaves the Drosha/Dicer fingerprint of one or two ~22 nt blocks of reads corresponding to the mature and star miRNA.

Complementing these methods, researchers at the University of Copenhagen, Denmark present a study that systematically exploits these read-profile patterns. They created databases of 2,540 miRNA read profiles using short RNA-seq data from miRBase and 4,795 read profiles from ENCODE (after preprocessing). Of the 4,795 ENCODE profiles, 1,361 are annotated as noncoding RNAs (ncRNAs), of which 285 are further annotated as miRNAs. Using deepBlockAlign (dba), they align the ENCODE ncRNA profiles against the miRBase profiles (cleaned for “self-matches”) and separate ENCODE miRNAs from the other ncRNAs with a Matthews correlation coefficient of 0.8 and an area under the curve of 0.93. Using the derived dba score cut-off, they predict 523 novel miRNA candidates. Further analysis reveals that these are located in genomic regions with (UCSC) MAF block fragmentation and poor sequence conservation, which might in part explain why they were overlooked in previous efforts.
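To make the separation metric concrete, the sketch below computes a Matthews correlation coefficient from the confusion matrix obtained at a given score cut-off. The function name `mcc_at_cutoff`, the scores, the labels and the cut-off are all invented for illustration; none of them come from the study or from deepBlockAlign.

```python
import math

def mcc_at_cutoff(scores, labels, cutoff):
    """Matthews correlation coefficient when score >= cutoff predicts 'miRNA'.

    scores: alignment scores (hypothetical values in the example below)
    labels: True for annotated miRNAs, False for other ncRNAs
    """
    tp = fp = tn = fn = 0
    for score, is_mirna in zip(scores, labels):
        predicted = score >= cutoff
        if predicted and is_mirna:
            tp += 1
        elif predicted and not is_mirna:
            fp += 1
        elif not predicted and is_mirna:
            fn += 1
        else:
            tn += 1
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention the MCC is reported as 0 when any marginal count is zero.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom

# Made-up example: six profiles, three of them annotated as miRNAs.
scores = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]
labels = [True, True, False, True, False, False]
print(round(mcc_at_cutoff(scores, labels, cutoff=0.5), 3))  # 0.333
```

Sweeping the cut-off over all observed scores and keeping the value with the highest MCC is one simple way to derive a score threshold of the kind described above.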

The researchers further analyzed known miRNAs from human and mouse and found two distinct classes, containing two blocks or more than two blocks respectively, where the latter class holds profiles with a less well-defined arrangement of reads. They also compared read profiles from plants and animals in terms of both the length and the distribution of reads within the profiles, and observed that some read profiles were specific to one kingdom or the other.
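The notion of read blocks can be illustrated with a simple interval merge. This is only a sketch: it groups overlapping reads from one strand into blocks and counts them, which is not the block-detection procedure used by deepBlockAlign. The function names, the `max_gap` parameter and the example coordinates are assumptions.

```python
def group_into_blocks(reads, max_gap=0):
    """Merge (start, end) read intervals into blocks.

    A read joins the current block when the gap between the block end and
    the read start is at most max_gap bases (so max_gap=0 merges
    overlapping and directly adjacent reads).
    Returns a list of (block_start, block_end, read_count) tuples.
    """
    blocks = []
    for start, end in sorted(reads):
        if blocks and start - blocks[-1][1] <= max_gap:
            b_start, b_end, count = blocks[-1]
            blocks[-1] = (b_start, max(b_end, end), count + 1)
        else:
            blocks.append((start, end, 1))
    return blocks

def block_class(reads, max_gap=0):
    """Label a profile by its number of blocks."""
    n_blocks = len(group_into_blocks(reads, max_gap))
    return "up to two blocks" if n_blocks <= 2 else "more than two blocks"

# Hypothetical reads: one cluster near position 100, another near 140.
example_reads = [(100, 122), (101, 123), (100, 121), (140, 162), (141, 162)]
print(group_into_blocks(example_reads))  # [(100, 123, 3), (140, 162, 2)]
print(block_class(example_reads))        # up to two blocks
```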

Availability: All data, as well as a server for searching miRBase profiles by uploading a BED file, are available at http://rth.dk/resources/dba/mirna.

  • Pundhir S, Gorodkin J. (2013) MicroRNA discovery by similarity search to a database of RNA-seq profiles. Frontiers in Bioinform & Comp Biol [Epub ahead of print]. [abstract]

The purpose of the system is to automate and simplify as much as possible the process of analyzing RNA-Seq results data by storing it in a database and providing many options for querying it.

Goals

  • Goal 1: Provide a system through which Biologists can analyze their RNA-Seq results data, specifically differential expression tests, novel transcript discoveries, and assembled transcripts.
  • Goal 2: The system should allow the user to get meaningful results with minimal learning time. For this goal to be satisfied, two senior Biologists familiar with RNA-seq must approve the system.
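As an illustration of what storing results in a database and querying them can look like, here is a minimal sketch using Python's built-in `sqlite3` module. The table layout, column names, thresholds and example rows are hypothetical and do not reflect the actual schema of the Queryable RNA-Seq Database.

```python
import sqlite3

# In-memory database with a hypothetical table of differential expression tests.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE diff_expr (
           gene TEXT,
           sample_1 TEXT,
           sample_2 TEXT,
           log2_fold_change REAL,
           q_value REAL
       )"""
)
rows = [
    ("geneA", "control", "treated", 2.5, 0.001),
    ("geneB", "control", "treated", 0.3, 0.04),
    ("geneC", "control", "treated", -1.8, 0.02),
    ("geneD", "control", "treated", 3.0, 0.20),
]
conn.executemany("INSERT INTO diff_expr VALUES (?, ?, ?, ?, ?)", rows)

def significant_genes(conn, max_q=0.05, min_abs_log2fc=1.0):
    """Return genes passing both thresholds, most significant first."""
    cursor = conn.execute(
        "SELECT gene FROM diff_expr "
        "WHERE q_value <= ? AND ABS(log2_fold_change) >= ? "
        "ORDER BY q_value",
        (max_q, min_abs_log2fc),
    )
    return [row[0] for row in cursor]

print(significant_genes(conn))  # ['geneA', 'geneC']
```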

Availability – The Queryable RNA Seq Database is available online at: https://github.com/fatPerlHacker/queryable-rna-seq-database

The Total Integrated Archive of short-Read and Array (TIARA) database stores and integrates human genome data generated from multiple technologies including next-generation sequencing and high-resolution comparative genomic hybridization array. The TIARA genome browser is a powerful tool for the analysis of personal genomic information by exploring genomic variants such as SNPs, indels and structural variants simultaneously. As of September 2012, the TIARA database provides raw data and variant information for 13 sequenced whole genomes, 16 sequenced transcriptomes and 33 high resolution array assays. Sequencing reads are available at a depth of ∼30× for whole genomes and 50× for transcriptomes. Information on genomic variants includes a total of ∼9.56 million SNPs, 23 025 of which are non-synonymous SNPs, and ∼1.19 million indels. In this update, by adding high coverage sequencing of additional human individuals, the TIARA genome database now provides an extensive record of rare variants in humans. Following TIARA’s fundamentally integrative approach, new transcriptome sequencing data are matched with whole-genome sequencing data in the genome browser. Users can here observe, for example, the expression levels of human genes with allele-specific quantification. Improvements to the TIARA genome browser include the intuitive display of new complex and large-scale data sets.

TIARA genome database

Availability: TIARA database is available online at – http://tiara.gmi.ac.kr

  • Hong D, Lee J, Bleazard T, Jung H, Ju YS, Yu SB, Kim S, Park SS, Kim JI, Seo JS. (2013) TIARA genome database: update 2013. Database (Oxford) [Epub ahead of print]. [article]

Next generation sequencing is rapidly becoming the approach of choice for transcriptional analysis experiments. Substantial advances have been achieved in computational approaches to support these technologies. These approaches typically rely on existing transcript annotations, introducing a bias towards known genes, require specific experimental design and computational resources, or focus only on identification of splice variants (ignoring other biologically relevant transcribed features contained within the data that may be important for downstream analysis). Biologically relevant transcribed features also include large and small non-coding RNA, new transcription start sites, alternative promoters, RNA editing and processing of coding transcripts. Also, many existing solutions lack accessible interfaces required for wide scale adoption.

Researchers at the Monash Institute of Medical Research, Monash University, Australia have developed a user-friendly, rapid and computation-efficient feature annotation framework (RNA-eXpress) that enables identification of transcripts and other genomic and transcriptional features independently of current annotations. RNA-eXpress accepts mapped reads in the standard binary alignment (BAM) format and produces a study-specific feature annotation in GTF format, comparison statistics, sequence extraction and feature counts. The framework is designed to be easily accessible while allowing advanced users to integrate new feature-identification algorithms through simple class extension, thus facilitating expansion to novel feature types or identification of study specific feature types.
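For readers unfamiliar with the output format, the sketch below writes one feature as a GTF line, which consists of nine tab-separated columns with 1-based, inclusive coordinates. The helper name `gtf_line`, the source label and the feature values are made up for illustration; this is not RNA-eXpress code or its actual output.

```python
def gtf_line(seqname, source, feature, start, end, strand,
             gene_id, transcript_id, score=".", frame="."):
    """Format one feature as a GTF record (nine tab-separated columns)."""
    if start < 1 or end < start:
        raise ValueError("GTF coordinates are 1-based with start <= end")
    attributes = f'gene_id "{gene_id}"; transcript_id "{transcript_id}";'
    fields = [seqname, source, feature, str(start), str(end),
              str(score), strand, str(frame), attributes]
    return "\t".join(fields)

# Hypothetical novel feature on chromosome 1.
line = gtf_line("chr1", "example_annotator", "exon", 1000, 1500, "+",
                gene_id="novel_gene_1", transcript_id="novel_tx_1")
print(line)
```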

RNA-eXpress

Availability and Implementation: RNA-eXpress software, source code, user manuals, supporting tutorials, developer guides and example data are available at http://www.rnaexpress.org.

Contact: paul.hertzog@monash.edu

  • Forster S, Finkel A, Gould J, Hertzog P. (2013) RNA-eXpress annotates novel transcript features in RNA-seq data. Bioinformatics [Epub ahead of print]. [abstract]

The 20th annual Database Issue of Nucleic Acids Research includes 176 articles, half of which describe new online molecular biology databases, while the other half provide updates on databases previously featured in NAR and other journals. This year’s highlights include two databases of DNA repeat elements; several databases of transcriptional factors and transcriptional factor-binding sites; databases on various aspects of protein structure and protein-protein interactions; databases for metagenomic and rRNA sequence analysis; and four databases specifically dedicated to Escherichia coli. The increased emphasis on using genome data to improve human health is reflected in the development of databases of genomic structural variation (NCBI’s dbVar and EBI’s DGVa), the NIH Genetic Testing Registry and several other databases centered on the genetic basis of human disease, potential drugs, their targets and the mechanisms of protein-ligand binding. Two new databases present genomic and RNAseq data for monkeys, providing a wealth of data on our closest relatives for comparative genomics purposes. The NAR online Molecular Biology Database Collection has been updated and currently lists 1512 online databases.

The NAR online Molecular Biology Database Collection is available at http://www.oxfordjournals.org/nar/database/cap/.

The full content of the Database Issue is freely available online on the Nucleic Acids Research website: http://nar.oxfordjournals.org/.

  • Fernández-Suárez XM, Galperin MY. (2013) The 2013 Nucleic Acids Research Database Issue and the online molecular biology database collection. Nucleic Acids Res 41(Database issue):D1-7. [article]

Categories in the NAR Database Collection:

  • Nucleotide Sequence Databases
  • RNA sequence databases
  • Protein sequence databases
  • Structure Databases
  • Genomics Databases (non-vertebrate)
  • Metabolic and Signaling Pathways
  • Human and other Vertebrate Genomes
  • Human Genes and Diseases
  • Microarray Data and other Gene Expression Databases
  • Proteomics Resources
  • Other Molecular Biology Databases
  • Organelle databases
  • Plant databases
  • Immunological databases
  • Cell biology

Biogenesis and molecular function are two key subjects in the field of microRNA (miRNA) research. Deep sequencing has become the principal technique in cataloging of miRNA repertoire and generating expression profiles in an unbiased manner.

miRGator

A team led by researchers at Ewha Womans University, Korea has updated miRGator to version 3.0. miRGator compiles publicly available deep sequencing miRNA data, and the team has implemented several novel tools to facilitate exploration of these massive datasets. The miR-seq browser lets users examine short read alignments together with secondary structure and read count information in concurrent windows. Features such as sequence editing, sorting, ordering, and import and export of user data should be of great utility for studying iso-miRs, miRNA editing and modifications. The miRNA-target relation is essential for understanding miRNA function. Coexpression analysis of miRNAs and target mRNAs, based on miRNA-Seq and RNA-Seq data from the same sample, is visualized in heat-map and network views, where users can investigate the inverse correlation of gene expression and target relations compiled from various databases of predicted and validated targets. By keeping datasets and analytic tools up-to-date, miRGator should continue to serve as an integrated resource for biogenesis and functional investigation of miRNAs.
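The inverse correlation mentioned above can be quantified with a Pearson correlation coefficient between a miRNA and a candidate target across samples. The following is a minimal sketch under that assumption; the expression values are invented, and the source does not state which correlation measure miRGator itself uses.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sd_x * sd_y)

# Hypothetical expression levels across four samples.
mirna_expression = [1.0, 2.0, 3.0, 4.0]
target_expression = [8.0, 6.0, 4.0, 2.0]
r = pearson(mirna_expression, target_expression)
print(round(r, 3))  # -1.0, a perfectly inverse relationship
```

A strongly negative coefficient is consistent with, but does not prove, repression of the target by the miRNA.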

Availability – miRGator v3.0 update is available at: http://mirgator.kobic.re.kr

  • Cho S, Jang I, Jun Y, Yoon S, Ko M, Kwon Y, Choi I, Jang H, Ryu D, Lee B, Kim VN, Kim W, Lee S. (2012) miRGator v3.0: a microRNA portal for deep sequencing, expression profiling and mRNA targeting. Nucleic Acids Res [Epub ahead of print]. [article]

Although the rhesus macaque is a unique model for the translational study of human diseases, its use in biomedical research is still in its infancy due to error-prone gene structures and limited annotations. Here, we present RhesusBase for the monkey research community (http://www.rhesusbase.org).

What are the RNA-Seq models in Ensembl, and how were they determined? How does RNA-Seq data contribute to Ensembl gene sets? Can I upload my own RNA-Seq data to Ensembl? Answers to these questions and more…

Pea (Pisum sativum L.), with its high-protein seeds and its ability to establish a symbiosis with soil nitrogen-fixing bacteria, is a strategic crop in temperate regions. Moreover, pea is a long-standing model in genetics and physiology. This web portal provides the first full-length Unigene set expression atlas for pea. Twenty pea cDNA libraries were prepared from different above- and below-ground cv “Cameor” plant organs, at different stages, and for different nutrition conditions. Libraries were sequenced using Next-Generation Sequencing technologies. Sequences were assembled de novo and a full-length Unigene set was produced. The sequencing depth of each cDNA contig relates to the expression level of transcripts. This gene atlas presents the pattern of expression and thus provides useful functional information for each cDNA contig. In the future, new RNA-Seq experiments will be added to this portal to enlarge the atlas’ scope.
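One common way to turn read depth into a comparable expression value is RPKM (reads per kilobase of contig per million mapped reads). The sketch below shows that arithmetic only; the source does not say which normalization the atlas uses, and the counts in the example are invented.

```python
def rpkm(read_count, contig_length_bp, total_mapped_reads):
    """Reads per kilobase of contig per million mapped reads."""
    if contig_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and library size must be positive")
    return read_count * 1e9 / (contig_length_bp * total_mapped_reads)

# Hypothetical contig: 500 reads on a 2,000 bp contig in a library of
# 10 million mapped reads.
print(rpkm(500, 2000, 10_000_000))  # 25.0
```

Dividing by contig length and library size keeps long contigs and deeply sequenced libraries from appearing more highly expressed simply because they collect more reads.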

The Pea RNA-Seq Gene Atlas is available at: http://bios.dijon.inra.fr/FATAL/cgi/pscam.cgi

Full-length de novo assembly of new pea RNA-seq data reveals the complexity of the pea transcriptome, S. Alves-Carvalho et al. in prep.

from the miRBase Blog – By Sam

miRBase 19 is now available, brought to you from the Benasque RNA meeting in the sunny Pyrenees, and with a slightly larger time gap than usual. In that extended time, we have added more than the usual number of new sequences: 3171 new hairpins and 3625 novel mature products, bringing the totals to 21264 and 25141 respectively in 193 species. As always, the full README file is available on the FTP site, along with downloadable files containing all data in various formats.

Ensembl gene annotation provides a comprehensive catalogue of transcripts aligned to the reference sequence. It relies on publicly available species-specific and orthologous transcripts plus their inferred protein sequences. The accuracy of gene models is improved by increasing the species-specific component, which can be achieved cost-effectively using RNA-Seq. Two zebrafish gene annotations are presented in Ensembl version 62, built on the Zv9 reference sequence.

Firstly, RNA-Seq data from five tissues and seven developmental stages were assembled into 25,748 gene models. A 3′ end capture and sequencing protocol was developed to predict the 3′ ends of transcripts and 46.1% of the original models were subsequently refined.

miRFANs is an online database for Arabidopsis thaliana miRNA function annotations. The creators integrated various types of datasets, including miRNA-target interactions, transcription factors (TFs) and their targets, expression profiles, genomic annotations and pathways, into a comprehensive database, and developed various statistical and mining tools.

miRFANs consists of:

  1. A comprehensive collection of miRNA targets for Arabidopsis thaliana, which provides valuable information about the functions of plant miRNAs.
  2. A highly informative miRNA-mediated genetic regulatory network extracted from the integrative database.
  3. A set of statistical and mining tools for analyzing the database.
  4. A user-friendly web interface that facilitates browsing and analysis of the collected data.

miRFANs is freely available at: http://www.cassava-genome.cn/mirfans

  • Liu H, Jin T, Liao R, Wan L, Xu B, Zhou S, Guan J. (2012) miRFANs: an integrated database for Arabidopsis thaliana microRNA function annotations. BMC Plant Biology [Epub ahead of print]. [abstract]

VESPA

VESPA is a desktop Java™ application that integrates high-throughput proteomics data (peptide-centric) and transcriptomics data (probe or RNA-Seq) into a genomic context, all of which can be visualized at three levels of genomic resolution. Data are interrogated via searches linked to the genome visualizations to find regions with a high likelihood of mis-annotation. Search results can be exported for further validation outside of VESPA, or potential coding regions can be analyzed concurrently within the software through interaction with BLAST.

VESPA is demonstrated on two use cases (Yersinia pestis Pestoides F and Synechococcus sp. PCC 7002), showing how rapidly mis-annotations can be found and explored using either proteomics data alone or in combination with transcriptomic data.

The software is freely available at https://www.biopilot.org/docs/Software/Vespa.php

  • Peterson ES, McCue LA, Schrimpe-Rutledge AC, Jensen JL, Walker H, Kobold MA, Webb SR, Payne SH, Ansong CK, Adkins JN, Cannon WR, Webb-Robertson BJ. (2012) VESPA: software to facilitate genomic annotation of prokaryotic organisms through integration of proteomic and transcriptomic data. BMC Genomics 13(1), 131. [article]
