Determining Standards for RNA-Seq Data Analysis: Biological Interpretation

Stuart Tugendreich, Ph.D.
Director, Product Management New Solutions, Ingenuity Systems

At last fall’s Next Generation Sequencing Congress, Keith Batchelder, MD, CEO of Genomic Healthcare Strategies, argued a key point in his talk: the deluge of raw data from NGS technologies has no value unless it is analyzed, annotated, and associated with other data. In other words, the interpretation of the data is worth more than the data itself. A recent special report by Bio-IT World discusses the same problem:

“There is a growing gap between the generation of massively parallel sequencing output and the ability to process and analyze the resulting data,” says Canadian cancer researcher John McPherson, feeling the pain of NGS neophytes left to negotiate “a bewildering maze of base calling, alignment, assembly, and analysis tools with often incomplete documentation and no idea how to compare and validate their outputs.”

Researchers and laboratories willingly spend thousands of dollars on instrumentation to produce data, but that investment is wasted or misdirected if the analysis is lacking or haphazard. So what makes for good analysis of NGS data? For RNA-Seq data in particular, there are no well-established methods for quality control, processing, or analysis. How, then, can you overcome this challenge and obtain reliable, replicable results that maximize the value of your experimental investment? And how can you sort through huge amounts of data and make sense of it quickly?

In the struggle to establish standards for RNA-Seq data analysis, biological interpretation is emerging as the key way to quickly home in on relevant information and to examine data against a consistent set of biological references. Examining the results of an RNA-Seq experiment in the context of established molecular relationships, biological processes, and disease associations provides a faster, more reliable, and more replicable way to identify key insights in complex data.

Also crucial is tight integration between the data analysis tool and the relevant content sources, which ensures data integrity and a seamless, accurate transition from data processing to biological interpretation. The tools likely to emerge as standard setters for RNA-Seq analysis are those, like Ingenuity’s IPA, that 1) accept uploads of relevant identifiers, such as RefSeq, Ensembl, and UCSC, from upstream partners like CLC bio, Geospiza, GenomeQuest, and Partek, and 2) already have a comprehensive biological knowledge base that can put the data in the context of known biological relationships and processes.

Batchelder made the same point when he suggested that the ability to add value to data, and to aggregate, mine, and curate it in a way that connects it to other information, is the most crucial factor for success in the NGS field.

Learn more about how to maximize your investment in RNA-Seq data at the Next-Gen Sequencing Congress in Boston in April. Dr. Sandeep Sanga, Bioinformatics Product Development Scientist at Ingenuity Systems, will speak on Wednesday, April 27, at 11:45 am on “Insights into Prostate Cancer Mechanisms via Integrated In Silico RNA-Seq Analysis of NGS Patient Data.” You can preview the poster here.