The majority of the human genome is differentially expressed across a wide dynamic range to produce a spectrum of protein-coding and noncoding RNAs, generating a transcriptome of unexpected scale and complexity. These features present a challenge for gene-expression profiling with qRT-PCR and RNA-seq. qRT-PCR is ill suited for discovering genes, resolving alternative splicing or characterizing whole transcriptomes. RNA-seq measures global gene abundance and splicing but provides scant coverage of rare transcripts, which limits the sensitivity of transcript assembly and quantification. Researchers from the Garvan Institute of Medical Research recently developed CaptureSeq, which enriches transcripts of interest by hybridizing them to magnetic bead-linked oligonucleotides that are tiled across the region of interest, allowing for targeted purification, multiplexed library preparation and RNA sequencing at a high depth.
To assess CaptureSeq’s quantitative accuracy, the researchers undertook an analysis of transcript quantification in direct comparison to conventional RNA-seq and qRT-PCR. As an independent reference, External RNA Control Consortium (ERCC) RNA standards spanning an ~10⁶-fold range of concentrations were spiked into three biological-replicate human K562 RNA samples. They then performed CaptureSeq on each sample, targeting the full range of standards, along with matched RNA-seq.
CaptureSeq exhibited high correlation between biological and technical replicates (mean Spearman’s r ≥ 0.998 in measured ERCC standard abundance), similar to matched RNA-seq (Spearman’s r = 0.991). CaptureSeq was superior for the detection and quantification of genes with low expression, showed little technical variation and accurately measured differential expression. By clustering lncRNA expression levels across human tissues, the researchers also identified coexpressed subsets of lncRNAs, which provided a greatly expanded atlas of lncRNA expression.
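The replicate concordance reported above is a rank correlation, which suits spike-in standards spread over many orders of magnitude because it depends only on the ordering of abundances, not on their scale. The following is a minimal sketch of how Spearman's r could be computed between two replicates. The function names and the abundance values are hypothetical and chosen purely for illustration; they are not data or code from the study.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's r: the Pearson correlation of the rank-transformed values."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    rx, ry = average_ranks(x), average_ranks(y)
    mean_x = sum(rx) / len(rx)
    mean_y = sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in rx))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in ry))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sd_x * sd_y)


# Hypothetical measured abundances for the same six spike-in standards
# in two replicates (illustrative numbers only, not from the study).
replicate_1 = [0.5, 3.1, 28.0, 250.0, 2400.0, 26000.0]
replicate_2 = [0.7, 2.6, 31.0, 240.0, 2600.0, 24000.0]

rho = spearman(replicate_1, replicate_2)
```

In this toy example the two replicates rank the standards identically, so the coefficient is 1 even though the individual values differ. A real analysis would more likely use an established statistics library than a hand-written routine.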
Profiling of human long noncoding RNAs with CaptureSeq
(a) Schematic diagram of the CaptureSeq process for transcriptional profiling. (b) Frequency distribution of total RNAs (black), coding mRNAs (blue) and lncRNAs (red) in K562 cells. (c) Three original lncRNA annotations (blue) were merged into larger spliced transcripts after targeted sequencing and enhanced coverage provided by CaptureSeq. Arrows indicate the direction of transcription. Chr, chromosome; TSS, transcriptional start site. (d) Enhanced coverage provided by targeted sequencing enabled two lncRNAs and three coding genes to merge into a single locus (CFAP47) with a large predicted ORF. 1, CFAP47 5′ (previously CXorf22); 2, lncRNA; 3, CFAP47 internal (previously CHDC2); 4, lncRNA; 5, CFAP47 3′ (previously CXorf30). Also indicated are TSSs and enhancer positions from FANTOM 5, H3K4me3 sites from the Roadmap Epigenome H3K4me3 summary for 121 sample types, and RT-PCR primers used to confirm assembly. aa, amino acid.