How Many Reads are Enough?

There is no question that RNA-Seq has several major advantages over current hybridization-based approaches such as microarrays. However, with the cost per sample of RNA-Seq still much higher than that of microarrays, it would be beneficial if multiple samples could be multiplexed and sequenced in a single lane. But how many reads are enough for sufficient transcriptome coverage?

You may remember that we posted the ENCODE consortium’s Standards, Guidelines and Best Practices for RNA-Seq back in July, which recommends:

“Experiments whose purpose is to evaluate the similarity between the transcriptional profiles of two polyA+ samples may require only modest depths of sequencing (e.g. 30M pair-end reads of length > 30NT, of which 20-25M are mappable to the genome or known transcriptome.”

and

“Experiments whose purpose is discovery of novel transcribed elements and strong quantification of known transcript isoforms… a minimum depth of 100-200 M 2 x 76 bp or longer reads is currently recommended.”
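To relate these recommended depths to multiplexing, the arithmetic can be sketched as a simple division of lane yield by target depth. This is only an illustration: the lane yield of 200 M reads below is a hypothetical figure rather than a value from the guidelines, and the function name is made up for this example.

```python
def samples_per_lane(lane_reads: int, reads_per_sample: int) -> int:
    """Return how many samples fit in one lane at the target depth per sample."""
    if reads_per_sample <= 0:
        raise ValueError("reads_per_sample must be positive")
    return lane_reads // reads_per_sample


# Hypothetical lane yield, chosen only for illustration (assumption).
LANE_READS = 200_000_000

# Target depths taken from the two recommendations quoted above.
for label, target in (("profile comparison", 30_000_000),
                      ("novel element discovery", 100_000_000)):
    print(f"{label}: {samples_per_lane(LANE_READS, target)} sample(s) per lane")
```

In practice the usable yield is lower than the raw yield, because not every read maps and barcodes are rarely perfectly balanced, so a result like this is best read as an upper bound.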

As additional studies are conducted, it will become clearer how much multiplexing is appropriate for a given experimental goal. A new study by researchers at Texas A&M University evaluated what sequencing depth might be sufficient for gene expression profiling in the chicken by RNA-Seq.

Two cDNA libraries from chicken lungs were sequenced. Totals of 29.6 M and 28.7 M (75 bp) reads were obtained with the two samples. More than 90% of annotated genes were detected in the data sets with 28.7-29.6 M reads, while only 68% of genes were detected in the data set with 1.6 M reads. The correlation coefficients of gene expression between technical replicates within the same sample were 0.9458 and 0.8442.

To evaluate the appropriate depth needed for mRNA profiling, a random sampling method was used to generate different numbers of reads from each sample. There was a significant increase in correlation coefficients from a sequencing depth of 1.6 M to 10 M for all genes except highly abundant genes. No significant improvement was observed from the depth of 10 M to 20 M (75 bp) reads.
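The random sampling idea can be sketched in a few lines of Python. This is not the authors' pipeline: it uses a simulated toy library with skewed gene abundances, draws reads without replacement at several depths, and compares each subsample with the full library rather than with a technical replicate. All function names, the abundance model, and the depths are assumptions made for illustration.

```python
import math
import random
from collections import Counter


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def make_toy_library(n_genes, n_reads, rng):
    """Simulate a library as one gene id per read, with 1/rank abundances (assumption)."""
    weights = [1.0 / (rank + 1) for rank in range(n_genes)]
    return rng.choices(list(range(n_genes)), weights=weights, k=n_reads)


def subsample_counts(reads, depth, rng):
    """Draw `depth` reads at random without replacement and count reads per gene."""
    return Counter(rng.sample(reads, depth))


def evaluate_depth(reads, depth, n_genes, rng):
    """Return (fraction of genes detected, correlation with full-depth log counts)."""
    full = Counter(reads)
    sub = subsample_counts(reads, depth, rng)
    detected = sum(1 for g in range(n_genes) if sub[g] > 0) / n_genes
    log_full = [math.log2(full[g] + 1) for g in range(n_genes)]
    log_sub = [math.log2(sub[g] + 1) for g in range(n_genes)]
    return detected, pearson(log_full, log_sub)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    n_genes, n_reads = 500, 50_000  # toy sizes, far smaller than a real lane
    reads = make_toy_library(n_genes, n_reads, rng)
    for depth in (500, 5_000, 50_000):
        detected, r = evaluate_depth(reads, depth, n_genes, rng)
        print(f"depth={depth:>6}  detected={detected:.2f}  r={r:.3f}")
```

With real data, the list of reads would be replaced by per-read gene assignments from an aligner, and the correlation would typically be computed separately for low, medium, and high abundance genes, since the study reports that depth matters most for the low-abundance ones.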

The analysis from the current study demonstrated that 30 M (75 bp) reads are sufficient to detect all annotated genes in chicken lungs. Ten million (75 bp) reads could detect about 80% of annotated chicken genes. Furthermore, the depth of sequencing had a significant impact on measuring the expression of low-abundance genes.

  • Wang Y, Ghaffari N, Johnson CD, Braga-Neto UM, Wang H, Chen R, Zhou H. (2011) Evaluation of the coverage and depth of transcriptome by RNA-Seq in chickens. BMC Bioinformatics Proceedings of the Eighth Annual MCBIOS Conference. Computational Biology and Bioinformatics for a New Decade, College Station, TX, USA. 1-2 April 2011. [article]