Research and development (R&D) in modern drug discovery and development is challenging, in part because biological systems are complex. The human genome contains 20,000–25,000 genes. Measuring the expression levels of these genes or identifying gene isoforms with traditional methods such as polymerase chain reaction (PCR) is costly and time-consuming.
RNA sequencing (RNA-Seq) is a high-throughput technology, developed in 2008, for comprehensive transcriptome study [Wang et al., 2009]. It can measure the expression patterns of thousands of genes simultaneously and provide insight into functional pathways and regulation in biological processes. As shown in Table 1, RNA-Seq has several advantages over the expression microarray, another powerful tool for transcriptome analysis.
Firstly, the expression microarray relies on hybridization between gene probes on the array and target genes in biological samples. These probes are designed from existing genome annotation, so a microarray can detect only transcripts that are already known. RNA-Seq has no such limitation, which makes it more attractive for discovering novel gene transcripts and noncoding RNAs [Wang et al., 2009].
Secondly, RNA-Seq offers considerably higher resolution. It can reveal the fine structure of the transcriptome at single-nucleotide resolution, which helps identify allele-specific expression, alternative splicing, and single nucleotide polymorphisms (SNPs) in transcribed regions [Quinn et al., 2013].
Thirdly, RNA-Seq measures expression levels over a broader dynamic range than the expression microarray, which enables the detection of more differentially expressed genes.
Feature | RNA-Seq | Expression microarray |
---|---|---|
Detection methods | High-throughput sequencing | DNA hybridization |
Reference genome required | Not necessary | Necessary |
Signal | Counts of reads | Relative intensities |
Resolution | Single nucleotide | Probe length |
High throughput | Yes | Yes |
Highly reproducible | Yes | Yes |
Novel gene and isoform detection | Yes | No |
SNP detection in the transcribed regions | Yes | No |
Dynamic range of expression levels | High | Low |
Four commercial next-generation sequencing (NGS) platforms are available for RNA-Seq: Illumina, SOLiD, Ion Torrent, and Roche 454. Table 2 compares the sequencing mechanism, read length, output, run time, and error rate of these platforms. As sequencing costs continue to fall, RNA-Seq is expected to replace the expression microarray as the main approach for transcriptome study. In this overview, the authors review the statistical analysis of RNA-Seq data and recent progress in applying RNA-Seq to drug discovery and development.
Platform | Mechanism | Read length (bp) | Throughput/Run (GB) | Run time | Error rate (%) | Primary errors |
---|---|---|---|---|---|---|
Illumina HiSeq 2500 | Reversible termination | 125 | 1000 | 6 days | 0.26 | Substitution |
ABI/Life Technologies SOLiD 5500xl | Ligation | 120 | 15 | 8 days | 0.1 | A-T bias |
Ion Torrent 318 | H+ ion-sensitive transistor | 200 | 1 | 2 h | 1.71 | Insertion/Deletion |
Roche 454 | Pyrosequencing | 400 | 0.5 | 10 h | 0.8 | Insertion/Deletion |
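
The figures in Table 2 are easier to compare once they are put on a common scale. The following is a minimal sketch, not part of the original analysis, that converts each run time to hours and derives two illustrative quantities: output per hour and the expected number of errors per read. It assumes that the error rate applies uniformly to every base of a read and that the throughput column uses the same unit for all platforms. The dictionary keys and function names are hypothetical and chosen only for this example.

```python
# Figures transcribed from Table 2; run times are converted to hours.
# Assumption: the error rate applies uniformly to every base of a read.
PLATFORMS = {
    "Illumina": {"read_bp": 125, "output_per_run": 1000,
                 "run_hours": 6 * 24, "error_pct": 0.26},
    "SOLiD": {"read_bp": 120, "output_per_run": 15,
              "run_hours": 8 * 24, "error_pct": 0.1},
    "Ion Torrent": {"read_bp": 200, "output_per_run": 1,
                    "run_hours": 2, "error_pct": 1.71},
    "Roche 454": {"read_bp": 400, "output_per_run": 0.5,
                  "run_hours": 10, "error_pct": 0.8},
}


def output_per_hour(platform):
    """Output per run divided by run time (Table 2 throughput unit per hour)."""
    return platform["output_per_run"] / platform["run_hours"]


def expected_errors_per_read(platform):
    """Read length times the per-base error rate (error_pct is a percentage)."""
    return platform["read_bp"] * platform["error_pct"] / 100


for name, p in PLATFORMS.items():
    print(f"{name}: {output_per_hour(p):.3f} per hour, "
          f"{expected_errors_per_read(p):.2f} expected errors per read")
```

Under these assumptions, Illumina delivers by far the highest output per hour, while the two platforms whose primary errors are insertions and deletions accumulate the most expected errors per read. Both quantities are rough summaries and ignore factors such as library preparation time and position-dependent error profiles.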