Commonly used RNA-seq alignment and variant calling programs perform poorly in detecting intermediate long indels (>2 bases) that are clinically actionable

Driver somatic mutations are a hallmark of a tumor and can be used for diagnosis and targeted therapy. Such mutations are primarily detected from tumor DNA. Because the transcriptome reflects dynamic gene activity, transcriptome profiling by RNA sequencing (RNA-seq) is becoming increasingly popular: it measures not only gene expression but also sequence and structural variants such as mutations and fusion transcripts. Although single-nucleotide variants (SNVs) can be identified easily from RNA-seq, intermediate long insertions/deletions (indels longer than 2 bases but shorter than the sequence reads) pose significant challenges and are ignored by most RNA-seq analysis tools. Researchers from the Mayo Clinic in Rochester evaluated commonly used RNA-seq alignment programs, together with variant and somatic mutation callers, on a series of data sets containing simulated and known indels, with the aim of developing strategies for accurate indel detection. Their results show that RNA-seq alignment is the most important step for indel identification and that the evaluated aligners vary widely in their sensitivity for mapping reads that contain indels, from not sensitive at all to decently sensitive. Sensitivity is also affected by read length. Most variant calling programs rely on hard evidence, that is, indels explicitly marked in the alignment, whereas programs that perform realignment can also use soft-clipped reads to infer indels. Based on these observations, the researchers provide practical recommendations for indel detection with different RNA-seq aligners and demonstrate the best option, which gives highly reliable results. With careful customization of bioinformatics algorithms, RNA-seq can be used reliably to detect both SNV and indel mutations for clinical decision-making.
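To make the difference between hard evidence and soft-clipped reads concrete: in the SAM/BAM format written by RNA-seq aligners, an indel that the aligner has accepted is recorded in the read's CIGAR string as an I (insertion) or D (deletion) operation, while read bases the aligner could not place are recorded as S (soft clip). The Python sketch below is illustrative only and is not code from the study; the function names `parse_cigar` and `classify_read`, and the `min_indel_len=3` threshold (chosen to mirror the >2-base definition of intermediate long indels), are hypothetical.

```python
import re
from typing import List, Tuple

# CIGAR operations from the SAM format specification:
# M/=/X align to the reference, I = insertion, D = deletion,
# N = skipped region (splice junction), S = soft clip,
# H = hard clip, P = padding.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar: str) -> List[Tuple[int, str]]:
    """Split a CIGAR string into (length, operation) pairs."""
    ops = [(int(n), op) for n, op in CIGAR_RE.findall(cigar)]
    # Reject strings containing characters the regex silently skipped.
    if "".join(f"{n}{op}" for n, op in ops) != cigar:
        raise ValueError(f"malformed CIGAR: {cigar!r}")
    return ops


def classify_read(cigar: str, min_indel_len: int = 3) -> dict:
    """Summarize the indel evidence carried by one aligned read.

    'hard_indels' are I/D operations of at least min_indel_len bases that
    the aligner wrote into the CIGAR; N (splice gaps) is not counted.
    'soft_clipped' is the number of bases left unaligned at the read ends,
    which only a caller that realigns the region could reinterpret.
    """
    ops = parse_cigar(cigar)
    hard_indels = [(op, n) for n, op in ops if op in ("I", "D") and n >= min_indel_len]
    soft_clipped = sum(n for n, op in ops if op == "S")
    return {"hard_indels": hard_indels, "soft_clipped": soft_clipped}


if __name__ == "__main__":
    # Made-up CIGAR strings for 100-base reads.
    print(classify_read("50M4D50M"))  # {'hard_indels': [('D', 4)], 'soft_clipped': 0}
    print(classify_read("62M38S"))    # {'hard_indels': [], 'soft_clipped': 38}
    print(classify_read("30M2I68M"))  # {'hard_indels': [], 'soft_clipped': 0}
```

A read spanning a 4-base deletion may come back as `50M4D50M`, which is hard evidence any caller can count, or, if the aligner does not open the gap, as something like `62M38S`, where the evidence sits in the clipped tail and a caller without realignment is likely to miss it.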

Performance of aligners and variant callers on the simulated RNA-seq data


Panel A shows all 1,805 simulated indels, with lengths from 1 to 9 bases. Panel B shows only the indels longer than 2 bases (3–9 bases). GSNAP and STAR are the better aligners, and HaplotypeCaller and BCFtools are the better choices for indel calling.
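Reporting all indels and the >2-base subset separately matters because a tool's overall sensitivity can look acceptable while it misses many of the longer indels. The sketch below, which is not the study's evaluation code, computes sensitivity (recall) overall and for indels of at least a given length. It assumes indels are stored as hypothetical `(chrom, pos, ref, alt)` tuples that have already been normalized (for example, left-aligned) so that a correct call matches its truth record exactly; the toy variants are invented for illustration and are not data from the paper.

```python
from typing import Set, Tuple

# Hypothetical representation of a normalized indel: (chrom, pos, ref, alt).
Indel = Tuple[str, int, str, str]


def indel_length(v: Indel) -> int:
    """Number of inserted or deleted bases."""
    _, _, ref, alt = v
    return abs(len(ref) - len(alt))


def sensitivity(truth: Set[Indel], calls: Set[Indel], min_len: int = 1) -> float:
    """Fraction of true indels with at least min_len bases that were called."""
    eligible = {v for v in truth if indel_length(v) >= min_len}
    if not eligible:
        return float("nan")  # undefined when there is nothing to detect
    return len(eligible & calls) / len(eligible)


if __name__ == "__main__":
    truth = {
        ("chr1", 100, "A", "AT"),     # 1-base insertion
        ("chr1", 200, "GC", "G"),     # 1-base deletion
        ("chr2", 300, "T", "TACG"),   # 3-base insertion
        ("chr2", 400, "CAGTT", "C"),  # 4-base deletion
    }
    # Toy call set that finds both short indels but only one longer one.
    calls = {
        ("chr1", 100, "A", "AT"),
        ("chr1", 200, "GC", "G"),
        ("chr2", 300, "T", "TACG"),
    }
    print(sensitivity(truth, calls))             # 0.75 (all indels)
    print(sensitivity(truth, calls, min_len=3))  # 0.5 (indels > 2 bases)
```

In this toy case the overall figure (0.75) hides the fact that only half of the longer indels were found, which is the kind of gap a separate >2-base view is meant to expose.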

Sun Z, Bhagwate A, Prodduturi N, Yang P, Kocher JA. (2016) Indel detection from RNA-seq data: tool evaluation and strategies for accurate detection of actionable mutations. Brief Bioinform [Epub ahead of print].
