Tandem mass spectrometry (MS/MS) has been used to improve genome annotation in various organisms. The classical approach is to construct a comprehensive theoretical peptide database by six-frame translation of all open reading frames (ORFs) in a genome, and then to search real MS/MS spectra against this database.
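As a rough illustration of the six-frame translation step, the sketch below translates a DNA sequence in the three forward and three reverse-complement reading frames using the standard genetic code. The function names and the frame labels ("+1" to "-3") are illustrative assumptions and are not taken from the study's pipeline.

```python
# Minimal six-frame translation sketch (standard genetic code).
# All names here are hypothetical; this is not the study's actual code.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(
    zip((a + b + c for a in BASES for b in BASES for c in BASES), AMINO)
)
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons only; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translate(seq):
    """Return a dict mapping frame label to translated sequence."""
    seq = seq.upper()
    frames = {}
    for strand, template in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(template[offset:])
    return frames


if __name__ == "__main__":
    for label, protein in six_frame_translate("ATGGCCTAA").items():
        print(label, protein)
```

Stop codons are kept as `*` so that downstream steps can split each frame into candidate ORFs.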
In this work, researchers from the Shanghai Academy of Science and Technology took a more focused approach: they constructed a database containing only peptides from the ab initio predicted genes in the current human genome annotation, together with all theoretical peptides from currently annotated lncRNAs, and searched this database with MS/MS data from the human HeLa cell line. The purpose of this design is to find translation evidence for ab initio predicted genes and to identify lncRNAs that may have been wrongly defined, in a systematic proteogenomics effort.
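A focused peptide database of this kind could be assembled roughly as follows. This is a sketch under stated assumptions: the summary does not name the protease or the peptide length limits, so trypsin-like cleavage (after K or R, but not before P) and a 6 to 30 residue window are assumed here, and the sequence identifiers are made up.

```python
import re

# Sketch of building a targeted theoretical peptide database.
# ASSUMPTIONS: trypsin-like cleavage and the 6-30 residue window are
# illustrative choices, not parameters reported by the study.


def tryptic_digest(protein, min_len=6, max_len=30):
    """Cleave after K/R unless followed by P; keep peptides in the window."""
    fragments = re.split(r"(?<=[KR])(?!P)", protein)
    return [p for p in fragments if min_len <= len(p) <= max_len]


def build_peptide_db(sequences, min_len=6, max_len=30):
    """Map each theoretical peptide to the set of source IDs producing it.

    `sequences` maps a source ID (predicted gene or translated lncRNA
    frame) to an amino-acid string; '*' marks a stop codon.
    """
    db = {}
    for source_id, seq in sequences.items():
        for segment in seq.split("*"):  # never digest across a stop codon
            for peptide in tryptic_digest(segment, min_len, max_len):
                db.setdefault(peptide, set()).add(source_id)
    return db


if __name__ == "__main__":
    # Hypothetical input sequences, for illustration only.
    example = {
        "pred_gene_1": "MAGTEEKLLSPDRPAVNWK",
        "lncRNA_1_frame+1": "MAGTEEK*GG",
    }
    for peptide, sources in build_peptide_db(example).items():
        print(peptide, sorted(sources))
```

Recording the source IDs for each peptide makes it possible to tell later whether a spectrum match points to a predicted gene, an lncRNA, or both.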
To validate the proteogenomics results, the researchers integrated RNA-Seq data from the same HeLa cell line that generated the MS/MS data, and performed multiple reaction monitoring (MRM) experiments on HeLa cells cultured in-house. Six peptides were found to support ab initio predicted genes with both RNA-Seq and MRM validation, while none was found to support a translated lncRNA. This flexible workflow could be applied to other human samples and datasets to help further improve human gene annotation.
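The two-step validation logic can be expressed as a simple filter that keeps a candidate peptide only when both lines of evidence agree. The record fields and example entries below are hypothetical placeholders, not data from the study.

```python
from dataclasses import dataclass

# Sketch of the dual-validation filter. Field names and example records
# are hypothetical; they do not reproduce the study's results.


@dataclass(frozen=True)
class Candidate:
    peptide: str
    source: str              # "predicted_gene" or "lncRNA"
    rnaseq_supported: bool   # transcript evidence in the same cell line
    mrm_confirmed: bool      # peptide detected in the MRM experiment


def validated(candidates):
    """Keep only candidates supported by both RNA-Seq and MRM."""
    return [c for c in candidates if c.rnaseq_supported and c.mrm_confirmed]


if __name__ == "__main__":
    hits = [
        Candidate("PEPTIDEA", "predicted_gene", True, True),
        Candidate("PEPTIDEB", "predicted_gene", True, False),
        Candidate("PEPTIDEC", "lncRNA", False, False),
    ]
    for c in validated(hits):
        print(c.peptide, c.source)
```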