SNP calling from RNA-seq data without a reference genome

SNPs (Single Nucleotide Polymorphisms) are genetic markers whose precise identification is a prerequisite for association studies. Methods to identify them are currently well developed for model species, but they rely on the availability of a (good) reference genome and therefore cannot be applied to non-model species. They are also mostly tailored to whole-genome (re-)sequencing experiments, whereas in many cases transcriptome sequencing can be used as a cheaper alternative that already makes it possible to identify SNPs located in transcribed regions.

Researchers at the University of Lyon have developed a method that identifies, quantifies and annotates SNPs without any reference genome, using RNA-seq data only. Individuals can be pooled prior to sequencing if not enough material is available from one individual. Using pooled human RNA-seq data, they clarify the precision and recall of their method and discuss them with respect to other methods that use a reference genome or an assembled transcriptome. The researchers then experimentally validate the predictions of their method using RNA-seq data from two non-model species. The method can be used for any species to annotate SNPs and predict their impact on the protein sequence. It further makes it possible to test for the association of the identified SNPs with a phenotype of interest.
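As a reminder of what precision and recall mean for a SNP caller, here is a minimal Python sketch. It assumes that each SNP can be represented as a (sequence, position, reference allele, alternative allele) tuple and that a trusted set of true SNPs is available for comparison; the function name and the toy data are illustrative and are not taken from the paper.

```python
def precision_recall(predicted, truth):
    """Return (precision, recall) of a set of predicted SNPs against a truth set.

    Both arguments are iterables of hashable SNP descriptions, here assumed
    to be (sequence, position, ref_allele, alt_allele) tuples.
    """
    predicted, truth = set(predicted), set(truth)
    true_positives = len(predicted & truth)
    # Precision: fraction of predicted SNPs that are real.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: fraction of real SNPs that were predicted.
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


# Toy data (made up for illustration only).
predicted_snps = {
    ("chr1", 100, "A", "G"),
    ("chr1", 250, "C", "T"),
    ("chr2", 40, "G", "A"),
}
true_snps = {
    ("chr1", 100, "A", "G"),
    ("chr2", 40, "G", "A"),
    ("chr3", 7, "T", "C"),
    ("chr3", 90, "A", "C"),
}

precision, recall = precision_recall(predicted_snps, true_snps)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy case two of the three predictions are correct and two of the four true SNPs are recovered, so a caller can be precise without being exhaustive, which is why both measures are reported.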


With fasta/fastq input from an RNA-seq experiment, SNPs are found by KisSplice without using a reference. As KisSplice provides only a local context around the SNPs, a reference can be built with Trinity so that SNPs can be positioned on whole transcripts. Some SNPs do not map to the Trinity transcripts; these orphan SNPs are harder to study but can still be of interest. We propose a statistical method, called kissDE, to find condition-specific SNPs (even if they are not positioned) among all SNPs found. Finally, we can also predict the amino acid change for the positioned SNPs, and intersect these results with condition-specific SNPs using our package KisSplice2RefTranscriptome (K2rt).
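To illustrate what predicting the amino acid change of a positioned SNP involves, here is a minimal Python sketch. It is not the K2rt implementation: it assumes that the coding sequence of the transcript is already known and starts in frame at index 0, that the standard genetic code applies, and that the SNP position is 0-based within that coding sequence. The names `CODON_TABLE` and `snp_effect` are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def snp_effect(cds, pos, alt):
    """Classify a SNP at 0-based position `pos` of an in-frame coding sequence.

    Returns (reference amino acid, alternative amino acid, effect), where
    effect is "synonymous" or "non-synonymous".
    """
    offset = pos % 3                 # position of the SNP inside its codon
    start = pos - offset             # first base of the affected codon
    ref_codon = cds[start:start + 3].upper()
    if len(ref_codon) < 3:
        raise ValueError("SNP falls in an incomplete codon")
    alt_codon = ref_codon[:offset] + alt.upper() + ref_codon[offset + 1:]
    ref_aa = CODON_TABLE[ref_codon]
    alt_aa = CODON_TABLE[alt_codon]
    effect = "synonymous" if ref_aa == alt_aa else "non-synonymous"
    return ref_aa, alt_aa, effect


# Toy coding sequence: ATG GAA GTT encodes Met-Glu-Val.
cds = "ATGGAAGTT"
print(snp_effect(cds, 4, "T"))  # GAA -> GTA, second codon position
print(snp_effect(cds, 5, "G"))  # GAA -> GAG, third codon position
```

The first toy SNP changes glutamate to valine and is therefore non-synonymous, whereas the second leaves the glutamate unchanged. In practice the reading frame has to be inferred on the assembled transcripts first, which is the harder part and is not covered by this sketch.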

Availability – All the methods presented in this paper are implemented in software that is freely available at

Lopez-Maestre H, Brinza L, Marchet C, Kielbassa J, Bastien S, Boutigny M, Monnin D, Filali AE, Carareto CM, Vieira C, Picard F, Kremer N, Vavre F, Sagot MF, Lacroix V. (2016) SNP calling from RNA-seq data without a reference genome: identification, quantification, differential analysis and impact on the protein sequence. Nucleic Acids Res [Epub ahead of print]
