High-throughput RNA sequencing (RNA-seq) provides a comprehensive picture of the transcriptome, including the identity, structure, quantity, and variability of expressed transcripts in cells, through the assembly of short sequenced reads. Although the reference-based approach guarantees the high quality of the resulting transcriptome, it is only applicable when the relevant reference genome is available.
Researchers from Hanyang University, Korea, developed a pseudo-reference-based assembly (PRA) method that reconstructs a transcriptome based on a linear regression function of the optimized mapping parameters and the genetic distances of the closest species. Using the linear model, they reconstructed the transcriptomes of four birds (the white leghorn, turkey, duck, and zebra finch) with the Gallus gallus genome as a pseudo-reference, and of three primates (the chimpanzee, gorilla, and macaque) with the human genome as a pseudo-reference. The resulting transcriptomes show that PRA outperformed the de novo approach for species within a mutation rate of about 10% among orthologous transcriptomes, a range wide enough to cover species as distantly related as chicken and duck. Taken together, the researchers suggest that the PRA method can be used as a tool for reconstructing transcriptome maps of vertebrates whose genomes have not yet been sequenced.
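The idea of a linear model linking mapping parameters to genetic distance can be illustrated with a minimal sketch. It assumes, as one reading of the description above, that the best-performing value of a mapping parameter is regressed on the genetic distance to the pseudo-reference, and that the fitted line is then used to suggest a value for a new species. The function names and all numbers are hypothetical and synthetic; they are not taken from the study.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, x):
    """Evaluate the fitted line at x."""
    return slope * x + intercept


# Synthetic calibration data (illustrative only, NOT from the study):
# genetic distance to the pseudo-reference vs. the mapping-parameter
# value that worked best for that species.
distances = [0.01, 0.03, 0.05, 0.08]
best_params = [2.0, 4.0, 6.0, 9.0]

slope, intercept = fit_line(distances, best_params)

# Suggest a parameter value for a hypothetical species at distance 0.06.
suggested = predict(slope, intercept, 0.06)
```

In practice the calibration points would come from species with known genomes, where the optimal mapping parameters can be determined directly.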
Decision flow for choosing a transcriptome assembly method (de novo, reference-based, or PRA).
Given RNA-seq data from a certain species, reference-based assembly is recommended if a reference genome is available. Otherwise, the decision depends on whether the genome sequence of a closely related species within a certain genetic distance p is available. If it is, PRA is recommended. If not, de novo assembly is recommended.
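The decision flow described in the caption can be written as a short function. This is a sketch under stated assumptions: the function and parameter names are hypothetical, "within" the distance p is treated as less than or equal to p, and the value of p is left to the caller.

```python
from typing import Optional


def choose_assembly_strategy(
    has_reference: bool,
    nearest_distance: Optional[float],
    p: float,
) -> str:
    """Pick an assembly strategy following the decision flow.

    has_reference: True if a reference genome exists for the species.
    nearest_distance: genetic distance to the closest species with a
        sequenced genome, or None if no such species is known.
    p: maximum genetic distance at which PRA is still recommended
        (threshold chosen by the caller; inclusive by assumption).
    """
    if has_reference:
        return "reference-based"
    if nearest_distance is not None and nearest_distance <= p:
        return "PRA"
    return "de novo"


# Example calls with an illustrative threshold of 0.10, loosely echoing
# the roughly 10% figure mentioned above; the distances are made up.
with_reference = choose_assembly_strategy(True, None, p=0.10)
close_relative = choose_assembly_strategy(False, 0.07, p=0.10)
distant_relative = choose_assembly_strategy(False, 0.25, p=0.10)
no_relative = choose_assembly_strategy(False, None, p=0.10)
```

The reference check comes first, so a species with its own reference genome is always routed to reference-based assembly regardless of how close its relatives are.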