Survey Results – RNA-Seq analysis for Differential Gene/Transcript Expression

posted by bodhisattvax at Biostar

Hi all, I’ve finally put together the results of the survey! First of all, thanks to everyone who participated – the response has been great, with 93 people completing the survey as of today.

The respondents have been a varied bunch, including all levels of academia (pre-docs, grad students, post-docs and PIs), core bioinformaticians and bioinformatics managers, as well as many from industry. Most respondents appear to be based in the US and Europe, with others in China, Korea and Australia.

I provide below my own summary of the survey’s findings. I also have a document which contains all the results, including all unedited comments, but I’m not sure how to upload this file on this site. If you would like it, please either check my post on SeqAnswers, where I have been able to upload the file, or get in touch with me so I can email it to you. Biostars admins, can you help here?

As with any survey, we should be aware of potential biases (e.g. skews caused by people who are really annoyed with a particular tool!). My inferences below are probably influenced by my own experiences, so feel free to rap my knuckles if you feel I am over-reaching in my inferences or have misinterpreted the data, and to air your doubts about the veracity and accuracy of the results and conclusions. I’d also like to declare here that I have no vested interests, have nothing to gain by promoting one tool over another, and have only used a small number of the tools listed.

Now for the summary. Enjoy!

One of the take-home messages from the survey appears to be that the shadow of the Tuxedo Suite still looms large over the RNA-Seq analysis field. However, there is a wide diversity of opinions and experiences, and many other tools appear to be in the ascendancy, especially when it comes to read-counting and differential expression analysis.

Q1. What do you prefer to align your reads to?

Most respondents align to the genome only (47.3%), closely followed by those who align to both genome and transcriptome (39.8%). Key to their choices has been the availability and reliability of data, as well as the question being asked in the experiment. Respondents who chose to align to the genome only appear to do so for various reasons, such as the ability to discover new transcripts and splice variants. However, many respondents have commented that aligning to both the genome and transcriptome offers several advantages, such as increased speed and accuracy. Thus, for a species where both a reliable genome and transcriptome are available, this might be the optimal way forward.

Q2 and 3. What is your preferred aligner? And the reasons why.

Tophat rules the roost here, taking more than two-thirds of the vote (67.9%). Reasons for this include its ease of use, proven accuracy (which has improved over time), historical popularity, and the fact that the available alternatives have not yet warranted a change from Tophat. Another Tuxedo suite aligner, Bowtie, comes in at a distant second (17.3%). STAR (6.2%) has been noted for its speed.

Q4 and 5. What is your preferred read-counting methodology? And the reasons why.

Again, a Tuxedo suite tool, Cufflinks, took the majority of votes (57.1%). Reasons for this included its ease of use, but many respondents appear to use it because it is a logical follow-on from using Tophat as per the Tuxedo workflow. The second-placed HTSeq-count appears to be in the ascendancy – many respondents appear to have been dissatisfied with Cufflinks and switched to HTSeq-count. It looks to be a good candidate to topple Cufflinks from the top in the near future. Other notable tools include easyRNASeq and RSEM. Also, many respondents use bedtools, samtools or in-house tools and custom scripts.

Q6 and 7. What is your preferred methodology to estimate differential expression? And the reasons why.

Finally, a non-Tuxedo suite tool wins the vote: DESeq/DEXSeq with 44.7%. CuffDiff is not too far behind (35.5%) and edgeR (19.7%) brings up the rear. Going by the comments, we might expect usage of DESeq and edgeR to increase relative to CuffDiff. Results from the latter have been variously described as weird, untrustworthy, and having too many false positives, among other problems.

Q8. Which annotation resource do you use?

Ensembl (46.6%) was the clear winner. Second and third places were closely contested between RefSeq (25.9%) and UCSC (22.4%).

Q9. What software do you use for downstream analyses?

GOSeq (68.9%) is clearly very widely used. Many respondents also use the commercial options Ingenuity IPA and GeneGo MetaCore. DAVID also received an honourable mention.

P.S. Please note: the percentages quoted relate to the number of people who answered that particular question. This varies widely across questions, from all 93 respondents for the first question to 45 for Q9.
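To make the point about differing denominators concrete, here is a minimal sketch that converts a quoted percentage back into an approximate head count. The function name `approx_count` is made up for illustration, and the counts it gives are rounded estimates derived from the figures above, not numbers taken from the raw survey data.

```python
def approx_count(percent: float, n_respondents: int) -> int:
    """Estimate how many respondents a quoted percentage corresponds to."""
    return round(percent / 100 * n_respondents)


# Q1 was answered by all 93 respondents.
genome_only = approx_count(47.3, 93)
genome_and_transcriptome = approx_count(39.8, 93)

# Q9 was answered by only 45 respondents.
goseq_users = approx_count(68.9, 45)

print(genome_only, genome_and_transcriptome, goseq_users)
```

So a large percentage on a sparsely answered question (such as Q9) can represent fewer people than a smaller percentage on a question everyone answered.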
