The single molecule, real time (SMRT) sequencing technology of Pacific Biosciences enables the acquisition of transcripts from end to end because it produces extraordinarily long reads (>10 kb). This new method of transcriptome sequencing has been applied to several projects on humans and model organisms. However, the raw data from SMRT sequencing are of relatively low quality, with a random error rate of approximately 15%, so error correction using next-generation sequencing (NGS) short reads is typically necessary. Few tools have been designed that apply a hybrid sequencing approach combining NGS and SMRT data, and the most popular existing tool for error correction, LSC, has computing resource requirements that are too intensive for most laboratories and research groups. These shortcomings severely limit the application of SMRT long reads for transcriptome analysis.
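To give a feel for what a random error rate of approximately 15% means for a read of about 10 kb, here is a minimal Python sketch. It is an illustration only and not part of LSC or LSCplus: the function names (`add_random_errors`, `mismatch_fraction`) are hypothetical, the read is randomly generated, and the error model is simplified to substitutions, whereas real SMRT errors also include insertions and deletions.

```python
import random

BASES = "ACGT"


def add_random_errors(read, error_rate, rng):
    """Return a copy of `read` in which each base is substituted
    with probability `error_rate`.

    Assumption: substitutions only. Real SMRT errors also include
    insertions and deletions, which this toy model ignores.
    """
    out = []
    for base in read:
        if rng.random() < error_rate:
            # Replace the base with one of the three other bases.
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)


def mismatch_fraction(a, b):
    """Fraction of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


rng = random.Random(0)  # fixed seed so the run is repeatable
read_length = 10_000    # roughly the ">10 kb" scale mentioned above
error_rate = 0.15       # approximate raw error rate quoted above

true_read = "".join(rng.choice(BASES) for _ in range(read_length))
raw_read = add_random_errors(true_read, error_rate, rng)
observed = mismatch_fraction(true_read, raw_read)

print(f"expected number of errors: {error_rate * read_length:.0f}")
print(f"observed mismatch fraction: {observed:.3f}")
```

At this rate a 10 kb read is expected to carry on the order of 1,500 erroneous bases, which is why uncorrected long reads are hard to map precisely to exon boundaries.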
Here, researchers from the Beijing Key Laboratory of Innovative Drug Discovery report LSCplus, an improved error-correction tool developed with the LSC program as a reference. LSCplus overcomes the long running time of LSC and improves quality. Compared with LSC, LSCplus requires only 1/3 to 1/4 of the time and 1/20 to 1/25 of the error correction time.
Two local details of the alignments of raw LRs and LSCplus-corrected LRs
(a) Two positions of isoform identification. (b) Two positions of isoform identification and two positions of exon recovery. Error correction of RNA-seq data provides more accurate mapping of transcripts. The panels show a genome browser view of transcriptome alignments using uncorrected (blue) and corrected (green) PacBio reads. Colored blocks represent the exons. Before correction, only one potential transcript isoform was detected, with exons missing (indicated with purple rectangles); after correction, the corrected sequences matched the reference annotations end to end with no exons missing. As a result, the isoforms (indicated with red rectangles) at the displayed locus in the reference annotation were confirmed by corrected PacBio RNA-seq reads.
Availability – LSCplus is freely available at http://www.herbbol.org:8001/lscplus/