Introduction to RNA-Seq and its Applications to Drug Discovery and Development

Research and development (R&D) in modern drug discovery and development is challenging, in part because biological systems are complex. The human genome contains 20,000–25,000 genes. Measuring the expression levels of these genes or identifying gene isoforms with traditional methods such as polymerase chain reaction (PCR) is costly and time-consuming.

RNA sequencing (RNA-Seq) is a high-throughput technology, developed in 2008, for comprehensive transcriptome study [Wang et al., 2009]. It can measure the expression patterns of thousands of genes simultaneously and provide insight into functional pathways and regulation in biological processes. As shown in Table 1, RNA-Seq has many advantages over the expression microarray, another powerful tool for transcriptome analysis.

Firstly, an expression microarray relies on hybridization between the gene probes on the array and the target genes in biological samples. These probes are designed from existing genome annotation, so a microarray can detect only transcripts that are already known. RNA-Seq has no such limitation, which makes it more attractive for discovering novel gene transcripts and noncoding RNAs [Wang et al., 2009].

Secondly, RNA-Seq offers considerably higher resolution. It can reveal the fine structure of the transcriptome at single-nucleotide resolution, which helps identify allele-specific expression, alternative splicing, and single nucleotide polymorphisms (SNPs) in the transcribed regions [Quinn et al., 2013].

Thirdly, RNA-Seq has a broader dynamic range of expression levels than the expression microarray, which enables the detection of more differentially expressed genes.
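As a rough illustration of how a count-based signal can be compared between samples, the sketch below scales raw read counts to counts per million (CPM) and computes a log2 fold change per gene. The gene names, the read counts, and the pseudocount of 1 are hypothetical values chosen for illustration; they are not taken from the review, and this is not the statistical method the authors describe.

```python
import math

# Hypothetical raw read counts per gene (illustrative values only).
# Both libraries sum to 100,000 reads so that library size does not
# confound the comparison in this toy example.
control = {"geneA": 10, "geneB": 1000, "geneC": 98990}
treated = {"geneA": 40, "geneB": 1000, "geneC": 98960}


def counts_per_million(counts):
    """Scale raw counts by library size so that samples are comparable."""
    total = sum(counts.values())
    return {gene: n * 1_000_000 / total for gene, n in counts.items()}


def log2_fold_change(treated_cpm, control_cpm, pseudocount=1.0):
    """log2 ratio of treated to control; the pseudocount avoids log(0)."""
    return {
        gene: math.log2(
            (treated_cpm[gene] + pseudocount) / (control_cpm[gene] + pseudocount)
        )
        for gene in control_cpm
    }


lfc = log2_fold_change(counts_per_million(treated), counts_per_million(control))
for gene, value in lfc.items():
    print(f"{gene}: log2 fold change = {value:+.3f}")
```

In this toy data the counts span about four orders of magnitude within one sample, and the low-count gene still yields a usable fold change, which is the kind of behavior a broad dynamic range allows.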


Table 1. Comparison of RNA-Seq and Expression Microarray
Feature                                  | RNA-Seq                    | Expression microarray
-----------------------------------------|----------------------------|----------------------
Detection method                         | High-throughput sequencing | DNA hybridization
Reference genome required                | Not necessary              | Necessary
Signal                                   | Counts of reads            | Relative intensities
Resolution                               | Single nucleotide          | Probe length
High throughput                          | Yes                        | Yes
Highly reproducible                      | Yes                        | Yes
Novel gene and isoform detection         | Yes                        | No
SNP detection in the transcribed regions | Yes                        | No
Dynamic range of expression levels       | High                       | Low


Four commercial next-generation sequencing (NGS) platforms are available for RNA-Seq: Illumina, SOLiD, Ion Torrent, and Roche 454. Table 2 compares the output, run time, and error rate of these platforms. As sequencing costs continue to fall, RNA-Seq is expected to replace the expression microarray as the main approach for transcriptome study. In this overview, the authors review the statistical analysis of RNA-Seq data and progress in applying RNA-Seq to drug discovery and development.


Table 2. Popular Next-Generation Sequencing Platforms Currently Available for RNA-Seq
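Platform                         | Mechanism                   | Read length (bp) | Throughput/Run (GB) | Run time | Error rate (%) | Primary errors
---------------------------------|-----------------------------|------------------|---------------------|----------|----------------|-------------------
Illumina HiSeq 2500              | Reversible termination      | 125              | 1000                | 6 days   | 0.26           | Substitution
ABI/LifeTechnology-SOLID 5550 XL | Ligation                    | 120              | 15                  | 8 days   | 0.1            | A-T bias
ION Torrent 318                  | H+ ion sensitive transistor | 200              | 1                   | 2 h      | 1.71           | Insertion/Deletion
Roche 454                        | Pyrosequencing              | 400              | 0.5                 | 10 h     | 0.8            | Insertion/Deletion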
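
To put the figures in Table 2 on a common footing, the following sketch derives two quantities from them: throughput per hour and the expected number of errors per read. It assumes a 24-hour day when converting run times, and it treats the listed error rate as a uniform per-base rate, which is a simplification made here for illustration and is not stated in the review. The function and variable names are hypothetical.

```python
HOURS_PER_DAY = 24  # assumption used to convert "days" in Table 2 to hours

# Values transcribed from Table 2:
# (read length in bp, throughput per run in GB, run time in hours, error rate in %)
platforms = {
    "Illumina HiSeq 2500": (125, 1000, 6 * HOURS_PER_DAY, 0.26),
    "ABI/LifeTechnology-SOLID 5550 XL": (120, 15, 8 * HOURS_PER_DAY, 0.1),
    "ION Torrent 318": (200, 1, 2, 1.71),
    "Roche 454": (400, 0.5, 10, 0.8),
}


def summarize(read_length_bp, throughput_gb, run_hours, error_rate_pct):
    """Derive per-hour throughput and expected errors per read."""
    return {
        "gb_per_hour": throughput_gb / run_hours,
        # Assumes the error rate applies uniformly to every base in a read.
        "expected_errors_per_read": read_length_bp * error_rate_pct / 100,
    }


summary = {name: summarize(*values) for name, values in platforms.items()}
for name, stats in summary.items():
    print(
        f"{name}: {stats['gb_per_hour']:.3f} GB/h, "
        f"{stats['expected_errors_per_read']:.2f} expected errors per read"
    )
```

Under these assumptions, the two platforms with longer reads and higher error rates (Ion Torrent and Roche 454) come out with more expected errors per read than Illumina or SOLiD, while Illumina has by far the highest throughput per hour.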


Khatoon Z, Figler B, Zhang H, Cheng F. (2014) Introduction to RNA-Seq and its Applications to Drug Discovery and Development. Drug Dev Res 75(5):324-30.