Background: In proteomics, mass spectra of peptides carrying multiple unknown modifications are particularly difficult to interpret, which leaves a large number of spectra unidentified.
Methods: We developed SpecGlob, a dynamic programming algorithm that aligns pairs of spectra, each pair given by a Peptide-Spectrum Match (PSM) provided by any Open Modification Search (OMS) method. For each PSM, SpecGlob computes the best alignment according to a given scoring system, interpreting the mass delta within the PSM as one or several unspecified modifications. All alignments are written to a file using a dedicated syntax. These alignments are then post-processed by an additional algorithm, which aims to interpret the detected modifications.
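To make the alignment idea concrete, the following is a minimal toy sketch of a dynamic programming alignment between a peptide and a list of peak masses. It is not SpecGlob's actual algorithm, scoring system, or output syntax. The function name `align`, the scores, the tolerance, and the synthetic peak list are all assumptions made for illustration. The sketch also assumes that peaks are neutral prefix masses, that the first peak is the origin (0) and the last is the total mass, and that every residue boundary is observed.

```python
from math import inf

# Monoisotopic residue masses (Da) for the residues used in this toy example.
RESIDUE_MASS = {
    "A": 71.03711, "M": 131.04049, "S": 87.03203, "T": 101.04768,
    "L": 113.08406, "E": 129.04259, "K": 128.09496,
}

# Hypothetical scoring values and tolerance, chosen only for illustration.
MATCH_SCORE = 1.0
SHIFT_SCORE = -1.0
TOLERANCE = 0.01  # Da


def align(peptide, peaks):
    """Align each residue of `peptide` to a gap between two peaks.

    score[i][j] is the best score for placing the first i residues so that
    residue i ends at peak j. A gap equal to the residue mass is a match;
    any other gap is treated as a residue carrying an unspecified mass shift.
    """
    peaks = sorted(peaks)
    n, k = len(peptide), len(peaks)
    score = [[-inf] * k for _ in range(n + 1)]
    back = [[None] * k for _ in range(n + 1)]
    score[0][0] = 0.0  # zero residues placed, sitting at the origin peak

    for i in range(1, n + 1):
        mass = RESIDUE_MASS[peptide[i - 1]]
        for j in range(1, k):
            for prev in range(j):
                if score[i - 1][prev] == -inf:
                    continue
                delta = peaks[j] - peaks[prev] - mass
                is_match = abs(delta) <= TOLERANCE
                cand = score[i - 1][prev] + (MATCH_SCORE if is_match else SHIFT_SCORE)
                if cand > score[i][j]:
                    score[i][j] = cand
                    back[i][j] = (prev, 0.0 if is_match else delta)

    if score[n][k - 1] == -inf:
        raise ValueError("no alignment reaches the last peak")

    # Backtrack from the last peak to recover the mass shift per residue.
    alignment, j = [], k - 1
    for i in range(n, 0, -1):
        prev, delta = back[i][j]
        alignment.append((peptide[i - 1], delta))
        j = prev
    alignment.reverse()
    return score[n][k - 1], alignment


# Synthetic spectrum: AMSTLEK with +15.99491 on M and +79.96633 on T,
# plus three arbitrary noise peaks (150.0, 400.5, 650.25).
EXAMPLE_PEAKS = [0.0, 71.03711, 150.0, 218.07251, 305.10454, 400.5,
                 486.11855, 599.20261, 650.25, 728.2452, 856.34016]

best_score, best_alignment = align("AMSTLEK", EXAMPLE_PEAKS)
for residue, delta in best_alignment:
    print(residue, f"{delta:+.5f}" if delta else "match")
```

In this sketch, the total mass delta of the pair is split across two residues rather than being attributed to a single modification, which is the behaviour the paragraph above describes. Handling missing peaks or ambiguous placements would require additional transitions that are not shown here.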
Results: Using a large collection of theoretical spectra generated from the human proteome, we demonstrate that running SpecGlob as a post-analysis of an OMS method can significantly increase the number of correctly interpreted spectra, since SpecGlob can infer several, and possibly many, modifications. The post-processing algorithm unambiguously interprets most of the modifications detected by SpecGlob in PSMs. In addition, we performed an extensive analysis to provide insight into the potential reasons for the incomplete or erroneous interpretations that may remain after PSMs are aligned.
Conclusion: SpecGlob correctly aligns spectra that differ by one or more modifications without any prior assumption about them. Since SpecGlob explores all possible alignments that may explain the mass delta within a PSM, it reduces the interpretation errors caused by incorrect assumptions about the modifications present in the sample, or about the number and specificity of the modifications carried by peptides. Our results indicate that SpecGlob should also be relevant for aligning experimental spectra, although this is a more challenging task.