ArrayExpress – a database of functional genomics experiments including gene expression

The ArrayExpress Archive is a database of functional genomics experiments including gene expression where you can query and download data collected to MIAME and MINSEQE standards. Gene Expression Atlas contains a subset of curated and re-annotated Archive data which can be queried for individual gene expression under different biological conditions across experiments.

ArrayExpress is available online at:

To submit your RNA-Seq data to the EMBL-EBI database, go to:

ENCODE Caltech RNA-seq

This track is produced as part of the ENCODE Project. RNA-Seq is a method for mapping and quantifying the transcriptome of any organism that has a genomic DNA sequence assembly. RNA-Seq is performed by reverse-transcribing an RNA sample into cDNA, followed by high throughput DNA sequencing. The resulting sequence reads are then informatically mapped onto the genome sequence.

The transcriptome measurements shown on these tracks were performed on polyA selected RNA from total cellular RNA. Data have been produced in two formats: single reads, each of which comes from one end of a randomly primed cDNA molecule; and paired-end reads, which are obtained as pairs from both ends cDNAs resulting from random priming. The resulting sequence reads are then informatically mapped onto the genome sequence (Alignments). Those that don’t map to the genome are mapped to known RNA splice junctions (Splice Sites). These mapped reads are then counted to determine their frequency of occurrence at known gene models. Sequence reads that cluster at genome locations that lack an existing transcript model are also identified informatically and they are quantified. RNA-Seq is especially suited for giving information about RNA splicing patterns and for determining unequivocally the presence or absence of lower abundance class RNAs. As performed here, internal RNA standards are used to assist in quantification and to provide internal process controls. This RNA-Seq protocol does not specify the coding strand. As a result, there will be ambiguity at loci where both strands are transcribed. The “randomly primed” reverse transcription is, apparently, not fully random. This is inferred from a sequence bias in the first residues of the read population, and this likely contributes to observed unevenness in sequence coverage across transcripts. These tracks show 1×32 n.t. or 2×75 n.t. sequence reads of cDNA obtained from biological replicate samples (different culture plates) of the ENCODE cell lines. The 32 n.t. sequences were aligned to the human genome (hg18) and UCSC known-gene splice junctions using different sequence alignment programs. The 2×75 n.t. reads were mapped serially, first with the Bowtie program (Langmead et al., 2009) against the genome and UCSC known-gene splice junctions (Splice Sites). Bowtie-unmapped reads were then mapped using BLAT to find evidence of novel splicing, by requiring at least 10 bp on the short-side of the splice.