RNA-Seq reveals gene expression and alternative splicing information

RNA-sequencing (RNA-seq) allows quantitative measurement of expression levels of genes and their transcripts.

In this study, researchers performed RNA-seq on cultured human B-cells and obtained 879 million 50-bp reads comprising 44 Gb of sequence.
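As a quick arithmetic check, the total sequence yield is the read count multiplied by the read length. Here "Gb" is taken to mean gigabases (10^9 bases):

```python
reads = 879_000_000      # number of reads reported for the study
read_length_bp = 50      # length of each read in base pairs

total_bp = reads * read_length_bp
print(f"{total_bp / 1e9:.2f} Gb")  # prints "43.95 Gb", i.e. about 44 Gb
```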

They identified 20,766 genes and 67,453 of their alternatively spliced transcripts.

Key findings:

  1. More than 90% of the genes with multiple exons are alternatively spliced.
  2. For most genes, one isoform is predominantly expressed.
  3. While chromosomes differ in gene density, the percentage of transcribed genes in each chromosome is less variable.
  4. Genes involved in related biological processes are expressed at more similar levels than genes with different functions.
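Findings 1 and 2 are both proportions computed over genes and their isoforms. The sketch below shows one way such proportions could be computed. It is not the authors' method. The gene records are invented toy data. "Alternatively spliced" is assumed to mean a multi-exon gene with more than one expressed isoform. "Predominant" is a hypothetical rule: the top isoform carries more than half of the gene's expression.

```python
from dataclasses import dataclass

@dataclass
class Gene:
    name: str
    n_exons: int
    isoform_expr: dict[str, float]  # isoform id -> expression estimate (toy values)

def summarize(genes, dominance_threshold=0.5):
    """Return (fraction of multi-exon genes with >1 expressed isoform,
    fraction of those genes whose top isoform exceeds dominance_threshold
    of the gene's total expression). Both definitions are assumptions."""
    multi_exon = [g for g in genes if g.n_exons > 1]
    spliced = [g for g in multi_exon
               if sum(1 for v in g.isoform_expr.values() if v > 0) > 1]
    dominant = [g for g in spliced
                if max(g.isoform_expr.values()) / sum(g.isoform_expr.values())
                > dominance_threshold]
    frac_spliced = len(spliced) / len(multi_exon) if multi_exon else 0.0
    frac_dominant = len(dominant) / len(spliced) if spliced else 0.0
    return frac_spliced, frac_dominant

# Hypothetical toy data, not taken from the study
genes = [
    Gene("GENE_A", 5, {"A.1": 80.0, "A.2": 15.0, "A.3": 5.0}),  # spliced, one dominant isoform
    Gene("GENE_B", 3, {"B.1": 40.0, "B.2": 60.0}),              # spliced, one dominant isoform
    Gene("GENE_C", 4, {"C.1": 50.0}),                           # multi-exon, single isoform
    Gene("GENE_D", 1, {"D.1": 30.0}),                           # single exon, excluded
    Gene("GENE_E", 6, {"E.1": 34.0, "E.2": 33.0, "E.3": 33.0}), # spliced, no dominant isoform
]

frac_spliced, frac_dominant = summarize(genes)
print(f"{frac_spliced:.0%} spliced; {frac_dominant:.0%} of those have a predominant isoform")
# prints "75% spliced; 67% of those have a predominant isoform"
```

With real data, the expression values would come from an isoform quantification step. The exact thresholds for "expressed" and "predominant" would strongly affect the resulting percentages.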

They also used the data to investigate the effect of sequencing depth on gene expression measurements.

  • 100 million reads are sufficient to detect most expressed genes and transcripts, but
  • about 500 million reads are needed to accurately measure their expression levels.
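Detection saturates earlier than quantification because a gene needs only one read to be detected. An accurate abundance estimate, by contrast, needs many reads: count noise shrinks roughly as 1/sqrt(n). The toy downsampling simulation below illustrates this pattern. It is not the authors' analysis, and every parameter is hypothetical: the gene count, the log-normal abundance model, the depths, and the ±20% accuracy tolerance. It does not attempt to reproduce the 100 million and 500 million figures.

```python
import random
from collections import Counter

def saturation_curve(n_genes=2000, full_depth=400_000,
                     depths=(10_000, 50_000, 100_000, 400_000),
                     min_reads=1, tolerance=0.2, seed=0):
    """Return (depth, fraction of genes detected, fraction of genes whose
    estimated relative abundance is within `tolerance` of the truth)."""
    rng = random.Random(seed)
    # Hypothetical abundances: log-normal, so a few genes dominate the library
    weights = [rng.lognormvariate(0.0, 2.0) for _ in range(n_genes)]
    total_w = sum(weights)
    true_frac = [w / total_w for w in weights]
    # One deep simulated library; reads are i.i.d. draws, so any prefix
    # is a random subsample at that depth
    library = rng.choices(range(n_genes), weights=weights, k=full_depth)
    results = []
    for d in depths:
        counts = Counter(library[:d])
        detected = sum(1 for c in counts.values() if c >= min_reads)
        accurate = sum(1 for g, c in counts.items()
                       if c >= min_reads
                       and abs(c / d - true_frac[g]) <= tolerance * true_frac[g])
        results.append((d, detected / n_genes, accurate / n_genes))
    return results

for depth, frac_detected, frac_accurate in saturation_curve():
    print(f"{depth:>7} reads: {frac_detected:.1%} detected, "
          f"{frac_accurate:.1%} within 20% of true abundance")
```

Taking prefixes of a single simulated library mimics downsampling real data. The detected fraction can only grow with depth. The accurately quantified fraction lags behind, because it is limited by the least abundant genes.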

They provide examples in which deep sequencing is needed to determine the relative abundance of genes and their isoforms.

With data from 20 individuals and about 40 million sequence reads per sample, they uncovered only 21 alternatively spliced, multi-exon genes that are not in databases. This result suggests that, at this sequence coverage, one can detect most of the known genes.

Results from this project are available on the UCSC Genome Browser.

  • Toung JM, Morley M, Li M, Cheung VG (2011) RNA-sequence analysis of human B-cells. Genome Res [Epub ahead of print].