Mass spectrometry-based methods allow for the direct, comprehensive analysis of expressed proteins and their quantification across different conditions. In general, however, proteins are identified by matching experimental mass spectra to theoretical spectra of peptide sequences derived from the genomic database of the organism under study. This conventional approach limits the applicability of proteomic methodologies to species for which a genome reference sequence is available.
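To make the database-matching step concrete, the following minimal sketch shows how theoretical peptides and their masses could be derived from a protein sequence. It assumes a simple tryptic cleavage rule (cut after K or R unless followed by P) and standard monoisotopic residue masses; the enzyme choice and the function names `tryptic_digest` and `peptide_mass` are illustrative assumptions, not part of the protocol described here, and real search engines also handle missed cleavages, modifications and fragment ions.

```python
# Illustrative sketch only: in-silico digestion of a database protein.
# The trypsin rule and function names are assumptions, not the authors' code.

# Monoisotopic residue masses in daltons for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER_MASS = 18.01056  # added once per peptide for the free termini


def tryptic_digest(protein, min_len=1):
    """Split a protein after K or R, except when the next residue is P."""
    peptides = []
    start = 0
    for i, residue in enumerate(protein):
        is_last = i + 1 == len(protein)
        if residue in "KR" and (is_last or protein[i + 1] != "P"):
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])  # C-terminal peptide without K/R
    return [p for p in peptides if len(p) >= min_len]


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[residue] for residue in peptide) + WATER_MASS


if __name__ == "__main__":
    for pep in tryptic_digest("AKRPGKM"):
        print(pep, round(peptide_mass(pep), 4))
```

A search engine compares masses like these (and the corresponding fragment masses) with the measured spectra, which is why a peptide absent from the database cannot be identified.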
Recently, RNA-sequencing (RNA-Seq) has become a valuable tool for overcoming this limitation, either by de novo construction of databases for organisms for which no DNA sequence is available or by refining existing genomic databases with transcriptomic data. Here, researchers from the Max Planck Institute for Molecular Genetics present a generic pipeline for using transcriptomic data in proteomics experiments. In particular, they show how to efficiently supply proteomic analysis workflows with sample-specific RNA-Seq databases. This approach is useful for the proteomic analysis of organisms that have not yet been sequenced, for complex microbial metatranscriptomes/metaproteomes (for example in the human body), and for refining current proteomics data analysis that relies solely on the genomic sequence and predicted gene expression rather than on validated gene products. Finally, the protocol presented here can help to improve the data quality of conventional proteomics experiments, which can be affected by genetic variation or splicing events.
Scheme of the bioinformatics workflow for the integrative analysis of shotgun mass spectrometry and RNA-Seq data. Proteins are subjected to LC-MS/MS analysis, whereas RNA, ideally isolated from the same samples used for proteomics, is sequenced on next-generation sequencing (NGS) instruments. The short-read sequencing data are used to reconstruct protein sequences, which serve as the sequence database for the peptide search engine that identifies peptides and proteins from the raw mass spectrometric data. Finally, the identified proteins can be annotated de novo by sequence homology searches against known proteins.
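The step that turns reconstructed transcripts into a protein sequence database can be sketched as a six-frame translation. The example below is a minimal illustration, assuming the standard genetic code, already assembled transcript sequences, and stop-to-stop open reading frames filtered by a minimum length. The function names (`six_frame_orfs`, `to_fasta`), the header layout and the default `min_len` are hypothetical choices made for this sketch; the pipeline described above may reconstruct protein sequences differently.

```python
# Illustrative sketch only: build a protein FASTA database from transcripts
# by six-frame translation. Names and thresholds are assumptions.
from itertools import product

BASES = "TCAG"
AMINO = (
    "FFLLSSSSYY**CC*W"   # codons starting with T
    "LLLLPPPPHHQQRRRR"   # codons starting with C
    "IIIMTTTTNNKKSSRR"   # codons starting with A
    "VVVVAAAADDEEGGGG"   # codons starting with G
)
# Standard genetic code; "*" marks a stop codon.
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons; unknown codons (e.g. with N) become X."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_orfs(transcript, min_len=30):
    """Return (strand, frame, protein) for stop-to-stop segments >= min_len."""
    seq = transcript.upper()
    orfs = []
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            protein = translate(strand_seq[frame:])
            for segment in protein.split("*"):
                if len(segment) >= min_len:
                    orfs.append((strand, frame, segment))
    return orfs


def to_fasta(transcripts, min_len=30):
    """Format all translated segments of {id: sequence} as FASTA text."""
    lines = []
    for transcript_id, seq in transcripts.items():
        for n, (strand, frame, protein) in enumerate(six_frame_orfs(seq, min_len)):
            lines.append(f">{transcript_id}|{strand}{frame}|orf{n}")
            lines.append(protein)
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_fasta({"transcript_1": "ATGGCCAAATAA"}, min_len=3))
```

The resulting FASTA text would then be handed to the peptide search engine as its sequence database. Keeping stop-to-stop segments rather than requiring a start codon is one possible design choice: it tolerates partially assembled transcripts, at the cost of a larger database.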