Previous studies have compared the running cost, run time and other performance measures of popular sequencing platforms. However, a comprehensive assessment of library construction and analysis protocols for the Ion Proton sequencing platform has been lacking. Unlike Illumina reads, Proton reads are heterogeneous in length and quality. Combining sequencing data from different platforms therefore produces datasets with mixed read lengths, and it is unknown whether commonly used software handles such data satisfactorily.
- With universal human reference RNA as the starting material, the RNaseIII and chemical fragmentation methods of library construction gave similar results in the number of genes and junctions discovered and in the accuracy of expression level estimates.
- In contrast, sequencing quality, read length and the choice of software affected mapping rate to a much larger extent.
- The unspliced aligner TMAP attained the highest mapping rate (97.27 % to genome, 86.46 % to transcriptome), though 47.83 % of mapped reads were clipped.
- Long reads could paradoxically reduce mapping at junctions.
- With a reference annotation as a guide, the mapping rate of TopHat2 increased significantly from 75.79 to 92.09 %, especially for long (>150 bp) reads.
- Sailfish, a k-mer based gene expression quantifier, attained results highly consistent with those of the TaqMan array, along with the highest sensitivity.
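The mapping rate and clipped-read fraction quoted above can be illustrated with a minimal sketch. It assumes alignments are available as SAM text lines, counts only primary records, and treats a read as clipped when its CIGAR string contains a soft (S) or hard (H) clip. The function name `mapping_stats` and the toy records are hypothetical, and this is not necessarily how the study computed its figures.

```python
import re

# SAM flag bits (from the SAM specification)
UNMAPPED = 0x4          # read is unmapped
SECONDARY = 0x100       # secondary alignment
SUPPLEMENTARY = 0x800   # supplementary alignment

# Matches a soft (S) or hard (H) clip operation in a CIGAR string
CLIP_OP = re.compile(r"\d+[SH]")


def mapping_stats(sam_lines):
    """Return (mapping_rate, clipped_fraction) over primary SAM records.

    mapping_rate     = mapped reads / all reads
    clipped_fraction = mapped reads with a clipped CIGAR / mapped reads
    """
    total = mapped = clipped = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):  # skip blanks and headers
            continue
        fields = line.rstrip("\n").split("\t")
        flag, cigar = int(fields[1]), fields[5]
        if flag & (SECONDARY | SUPPLEMENTARY):  # count each read only once
            continue
        total += 1
        if flag & UNMAPPED:
            continue
        mapped += 1
        if CLIP_OP.search(cigar):
            clipped += 1
    mapping_rate = mapped / total if total else 0.0
    clipped_fraction = clipped / mapped if mapped else 0.0
    return mapping_rate, clipped_fraction


# Hypothetical toy records: 4 reads, 3 mapped, 1 of the mapped reads clipped.
toy_sam = [
    "@HD\tVN:1.6\tSO:unsorted",
    "r1\t0\tchr1\t100\t60\t50M\t*\t0\t0\t*\t*",
    "r2\t16\tchr1\t200\t60\t10S40M\t*\t0\t0\t*\t*",
    "r2\t256\tchr2\t300\t0\t50M\t*\t0\t0\t*\t*",   # secondary, ignored
    "r3\t0\tchr2\t400\t60\t50M\t*\t0\t0\t*\t*",
    "r4\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*",            # unmapped
]

rate, clipped_frac = mapping_stats(toy_sam)
print(f"mapping rate: {rate:.2%}, clipped among mapped: {clipped_frac:.2%}")
```

Reporting the two numbers together matters here: an aligner can reach a high mapping rate by clipping read ends, so the clipped fraction puts the mapping rate in context.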
Comparison of library preparation methods
a Consistency of gene expression quantification. The RNaseIII (ProR) and chemical fragmentation (ProC) methods showed high consistency in gene expression quantification. b–c Genes and junctions detected. Large proportions of genes and junctions were detected in common by all three libraries. d–e Comparison of gene and junction expression levels between the ProC and ProR fragmentation methods. The numbers of reads mapping onto gene and junction regions are plotted as dots. d 949 genes were detected only by ProC, while 662 genes were detected only by ProR. e 35841 junctions were detected only by ProC, while 32394 junctions were detected only by ProR. f Coverage of the housekeeping gene beta-actin (ACTB). Similar coverage patterns were observed between technical replicates, but not between library preparation protocols. g Consistency between the ProC and ProR fragmentation methods by base coverage. Percentage of reads mapped to various gene sequence categories according to the GENCODE v24 h Comprehensive and i Basic gene annotation
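The "detected only by ProC" and "detected only by ProR" counts in panels d and e amount to set differences over per-library read counts. The sketch below shows the idea under stated assumptions: the function `compare_detection`, the detection threshold `min_reads`, and the toy counts are all hypothetical, since the summary does not state the threshold the authors used.

```python
def compare_detection(counts_a, counts_b, min_reads=1):
    """Split features (genes or junctions) by which library detected them.

    counts_a, counts_b: dicts mapping feature ID -> mapped read count.
    min_reads: assumed detection threshold (not taken from the study).
    """
    detected_a = {f for f, n in counts_a.items() if n >= min_reads}
    detected_b = {f for f, n in counts_b.items() if n >= min_reads}
    return {
        "only_a": detected_a - detected_b,   # e.g. detected only by ProC
        "only_b": detected_b - detected_a,   # e.g. detected only by ProR
        "shared": detected_a & detected_b,   # detected by both
    }


# Hypothetical toy read counts for two libraries.
proc_counts = {"ACTB": 1520, "GAPDH": 980, "GENE_X": 3, "GENE_Y": 0}
pror_counts = {"ACTB": 1475, "GAPDH": 1010, "GENE_X": 0, "GENE_Y": 5}

result = compare_detection(proc_counts, pror_counts)
for label in ("only_a", "only_b", "shared"):
    print(label, sorted(result[label]))
```

Raising `min_reads` would shrink the library-specific sets, because features detected by only one library tend to have low read counts.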
Researchers at BGI-tech provide, for the first time, reference statistics on library preparation methods, gene detection and quantification, and junction discovery for RNA-Seq on the Ion Proton platform. They have also evaluated Ion Proton sequencing options and analysis software to identify the best-performing choices.