Short paired-end reads trump long single-end reads for expression analysis

Typical experimental design advice for RNA-seq expression analyses assumes that single-end reads provide robust gene-level expression estimates cost-effectively, and that the benefits of paired-end sequencing are not worth the additional cost. However, in many cases (e.g., with Illumina NextSeq and NovaSeq instruments), shorter paired-end reads and longer single-end reads can be generated for the same cost, and it is not obvious which strategy should be preferred. Using publicly available data, Harvard University researchers test whether short paired-end reads can achieve more robust expression estimates and differential expression results than single-end reads of approximately the same total number of sequenced bases.

At both the transcript and gene levels, expression estimates from 2 × 40 paired-end reads are unequivocally more highly correlated with those from 2 × 125 reads than estimates from 1 × 75 reads are; in nearly all cases, those correlations are also greater than for 1 × 125, despite the greater total number of sequenced bases for the latter. Across an array of metrics, differential expression tests based upon 2 × 40 consistently outperform those using 1 × 75.
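To make the "approximately the same total number of sequenced bases" comparison concrete, the following minimal sketch tallies the bases sequenced per fragment under each strategy named above. The function and dictionary names are illustrative and not taken from the paper, and the sketch assumes every read reaches its full nominal length.

```python
def bases_per_fragment(reads_per_fragment, read_length):
    """Total bases sequenced from one fragment (assumes full-length reads)."""
    return reads_per_fragment * read_length


# Strategy label -> (reads per fragment, read length in bases).
# Paired-end strategies yield two reads per fragment, single-end one.
STRATEGIES = {
    "2 x 40": (2, 40),
    "1 x 75": (1, 75),
    "1 x 125": (1, 125),
    "2 x 125": (2, 125),
}

for name, (n_reads, length) in STRATEGIES.items():
    print(f"{name}: {bases_per_fragment(n_reads, length)} bases per fragment")
```

Under these assumptions, 2 × 40 sequences 80 bases per fragment against 75 for 1 × 75, while 1 × 125 sequences 125 and the 2 × 125 gold standard 250.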

Spearman’s rank correlations for kallisto-derived transcripts per million (TPM) between the gold standard paired-end 2 × 125 strategy and alternative strategies

figure1

Violin plots of (a) transcript and (b) gene-level inference. Comparison of correlations with 2 × 125 between 2 × 40 and 1 × 75 for (c) transcripts and (d) genes, and between 2 × 40 and 1 × 125 for (e) transcripts and (f) genes. For (c) to (f), symbol colors correspond to SRA accessions, and points above the red dotted line are samples where estimates of expression from 2 × 40 are more highly correlated with the gold standard than those from the contrasted single-end strategy.
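The comparison in the figure rests on Spearman's rank correlation between TPM vectors. The sketch below illustrates that calculation using only the Python standard library; it is not the authors' pipeline. The TPM values are made-up toy numbers, and the function names are hypothetical. In practice the vectors would come from kallisto output, and a library routine such as `scipy.stats.spearmanr` could replace the hand-written functions.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy TPM values for five transcripts (invented for illustration only).
gold_2x125 = [12.0, 0.0, 3.5, 250.0, 48.1]
short_paired = [11.2, 0.0, 4.1, 240.3, 50.0]
long_single = [9.0, 1.2, 0.0, 260.0, 30.0]

rho_paired = spearman(gold_2x125, short_paired)
rho_single = spearman(gold_2x125, long_single)
print(f"paired vs gold: {rho_paired:.3f}; single vs gold: {rho_single:.3f}")
```

A sample would fall above the red dotted line in panels (c) to (f) when the paired-end correlation exceeds the single-end one, as it does for these toy numbers.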

Researchers seeking a cost-effective approach for gene-level expression analysis should prefer short paired-end reads over a longer single-end strategy. Short paired-end reads will also give reasonably robust expression estimates and differential expression results at the isoform level.

Freedman AH, Gaspar JM, Sackton TB. (2020) Short Paired-End Reads Trump Long Single-End Reads for Expression Analysis. BMC Bioinformatics 21(1):149.
