As expected, a wide range of proteoform molecular masses could be seen in the pre-fractionated intracellular proteins, but GELFrEE was efficient in separating large proteoforms from smaller forms (Fig. 1A). Fractions 0, 1, 2, 3, 4, 5, 7, 9 and 11 displayed proteoforms below 50 kDa and were therefore chosen for subsequent LC-MS/MS analysis. Despite the clear presence of proteins in the 30–50 kDa range, as shown by SDS-PAGE (Fig. 1A), LC-MS analysis detected proteoforms predominantly below 30 kDa (supplementary figure S1). The reduced number of observed proteoforms larger than 30 kDa may be due to known challenges in identifying large denatured proteoforms, such as the decrease in signal-to-noise ratio as proteoform molecular weight increases18.
Global post-translational modifications profile
The top-down proteomic analyses of C. glutamicum generated 5127 proteoform spectrum matches (PrSMs), providing the identification of 1125 different proteoforms related to 273 different proteins (supplementary table 1). Approximately 65% of the proteins (177 proteins) and 47% of the PrSMs (2423 PrSMs) were identified with mass shifts (Δm), defined as differences between the expected and observed precursor masses, which suggests the presence of PTMs. Analysis of all PrSMs with Δm revealed a broad diversity of Δm values, suggesting the presence of different PTMs (Fig. 1B).
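The proportions quoted above follow directly from the reported counts. A minimal check in Python is sketched below; the variable and function names are illustrative only and are not part of the original analysis pipeline.

```python
# Counts reported in the text above.
total_proteins = 273
proteins_with_shift = 177
total_prsms = 5127
prsms_with_shift = 2423


def percent(part: int, whole: int) -> float:
    """Return part/whole expressed as a percentage."""
    return 100.0 * part / whole


protein_pct = percent(proteins_with_shift, total_proteins)
prsm_pct = percent(prsms_with_shift, total_prsms)

print(f"Proteins with a mass shift: {protein_pct:.1f}%")
print(f"PrSMs with a mass shift: {prsm_pct:.1f}%")
```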
Some of the most frequent Δm identified here were also reported as highly abundant in Escherichia coli through bottom-up proteomics using an unbiased identification strategy19. In C. glutamicum, the mass shifts most frequently identified were 16 Da (putative oxidation, 10.15% of modified PrSMs), -18 Da (putative dehydration, 2.97% of modified PrSMs), and 32 Da (possible double oxidation, 1.32% of modified PrSMs). Interestingly, the proportions of E. coli peptide spectrum matches (PSMs) identified with Δm of -18 Da (3.8%) and 32 Da (0.8%) were very similar to those found in this study. In contrast, the proportion of PSMs identified with a 16 Da Δm in E. coli (24%) was greater than in C. glutamicum. However, it is worth mentioning that the high number of PrSMs identified with a 15 Da Δm (6.11% of modified PrSMs) may be caused by misidentification of the 16 Da Δm, as discussed further below.
Proteins identified with Δm are related to different gene ontology terms such as protein-containing complexes, electron transfer activity and CYTH domain (CyaB, thiamine triphosphatase), pyrimidine metabolism, transmembrane helix, biosynthesis of secondary metabolites and thioredoxin domain. Moreover, eight ribosomal PrSMs presented two Δm, indicating the presence of at least two concomitantly occurring PTMs (supplementary figure S2).
Beyond functional annotation, we analyzed the number of N-terminal acetylation events and the most frequently identified Δm: 16 Da, 15 Da, 28 Da, 266 Da, and -18 Da (Fig. 2A). These Δm suggest the presence of oxidation (UnimodAC: 35, Δm = 15.994915), deamidation followed by methylation (UnimodAC: 528, Δm = 14.999666), formylation (UnimodAC: 122, Δm = 27.994915), sodium dodecyl sulfate (SDS) adducts20 and dehydration (UnimodAC: 23, Δm = -18.010565). The 28 Da mass shift may also correspond to di-methylation (UnimodAC: 36, Δm = 28.031300). Although it is difficult to differentiate between formylation and di-methylation, the higher number of Δm close to 27.99 Da and 27.98 Da (supplementary table 2) favors formylation events. The oxidation events suggested by the 16 Da Δm are further supported by the large number of methionine residues modified by this mass shift (Fig. 2B). On the other hand, the 15 Da Δm seems to be misassigned in some cases, since the residue most frequently identified with this modification was also methionine. However, some glutamic acid residues were identified with this Δm, suggesting that deamidation followed by methylation may still be present in some cases but is easily confused with oxidation (Fig. 2B). Moreover, several Δm were approximately 266 Da, suggesting the presence of SDS adducts. The -18 Da Δm may result from dehydration events and was predominantly identified in serine residues (Fig. 2B).
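The reasoning used to favor formylation over di-methylation can be illustrated with a small lookup that ranks the Unimod candidates quoted above by their distance from an observed Δm. This is only a sketch: the 0.05 Da tolerance and the function name `rank_candidates` are assumptions made for illustration, not parameters of the search actually performed.

```python
# Monoisotopic mass shifts quoted in the text (Unimod accession -> name, delta in Da).
CANDIDATES = {
    35: ("oxidation", 15.994915),
    528: ("deamidation + methylation", 14.999666),
    122: ("formylation", 27.994915),
    36: ("di-methylation", 28.031300),
    23: ("dehydration", -18.010565),
}


def rank_candidates(observed_da: float, tolerance_da: float = 0.05):
    """Return (accession, name, abs_error) for candidates within tolerance, closest first.

    tolerance_da is a hypothetical value chosen for this illustration.
    """
    hits = []
    for accession, (name, delta) in CANDIDATES.items():
        error = abs(observed_da - delta)
        if error <= tolerance_da:
            hits.append((error, accession, name))
    return [(acc, name, round(err, 4)) for err, acc, name in sorted(hits)]


# Values close to those reported in supplementary table 2.
for shift in (27.99, 27.98):
    print(shift, rank_candidates(shift))
```

Under this assumed tolerance, a shift of 27.98 Da is compatible only with formylation, whereas 27.99 Da matches both candidates with formylation ranked first.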
Overrepresentation analysis of proteins belonging to the six most frequent modifications allowed the identification of terms related to ribosomal proteins, predominant in proteins with Δm of -18 Da, 15 Da, 16 Da, and N-terminal acetylated proteins (Fig. 3). Prokaryotic ribosomal proteins are described to be commonly methylated and acetylated21. On the other hand, Δm of 28 Da and 266 Da were overrepresented for terms related to membrane proteins. In addition, proteins identified with 28 Da mass shift showed overrepresentation for the secretion system and protein export (Fig. 3).
We identified 13 proteins that presented more than 15 proteoforms, and most of them were ribosomal (supplementary figure S3). Moreover, when considering proteins with more than 9 proteoforms, the cellular component annotation also revealed membrane proteins (supplementary figure S3). Despite the large number of proteoforms, it is worth mentioning that some of these counts may be inflated by SDS adducts. For example, 50S ribosomal protein L7/L12 (Q8NT28) proteoforms were identified with Δm of 266 Da, 532 Da, and 401 Da, probably representing the addition of 1 SDS adduct, 2 SDS adducts, and 2 SDS adducts plus the loss of the initiator methionine residue, respectively. Although the presence of SDS adducts in proteins is not desirable, identifying these adducted forms is important to avoid misidentifications and to assign proteins that were only identifiable through adducted forms22.
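The interpretation of the three L7/L12 mass shifts can be checked with simple arithmetic. The sketch below uses the nominal 266 Da SDS adduct value from the text and the standard monoisotopic mass of a methionine residue (131.04 Da); the function name `expected_shift` is hypothetical.

```python
SDS_ADDUCT_DA = 266.0    # nominal SDS adduct shift used in the text
MET_RESIDUE_DA = 131.04  # standard monoisotopic mass of a methionine residue


def expected_shift(n_sds: int, met_removed: bool = False) -> float:
    """Expected mass shift for n_sds adducts, optionally minus the initiator Met."""
    shift = n_sds * SDS_ADDUCT_DA
    if met_removed:
        shift -= MET_RESIDUE_DA
    return shift


# The three scenarios proposed for the observed 266, 532 and 401 Da shifts.
for n_sds, met_removed in [(1, False), (2, False), (2, True)]:
    print(n_sds, met_removed, round(expected_shift(n_sds, met_removed), 2))
```

Two adducts minus the initiator methionine give roughly 401 Da, consistent with the third observed shift.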
To contribute more accurate data regarding C. glutamicum biotechnological and metabolic processes, some PrSMs related to or potentially involved in these functions were manually characterized through inspection of MS1 and MS2 spectra (Table 1).
Table 1
Proteoforms with potential metabolism regulation function and of biotechnological interest. Proteoforms with a potential involvement in the regulation of metabolism or in important biotechnological processes are represented by their corresponding protein names, Uniprot Accession code (AC) and identified mass shift or sequence cleavage. Putative post-translational modifications (PTM) were defined based on identified mass shifts using the Unimod database (www.unimod.org).
Uniprot AC | Protein name | Biological Process | Proteoforms | Putative PTM |
Q8Z469 | SecG | Protein export | 28 Da | N-formylation |
Q8NS24 | mepB | Peptidase/cell wall metabolism | N-terminal cleavage | N-terminal cleavage |
Q8NQJ3 | OdhI | Glutamate production | 70 Da/N-terminal cleavage | Crotonaldehyde lipid peroxidation or five methylations |
Q8NL68 | HMADP | Glutamate production/stress response | 30 Da/-2 Da | Methylation + oxidation/disulfide bond or Val->Pro, or Met->Glu or didehydro |
Q8NMS6 | Peroxiredoxin | Stress response | 154 Da/186 Da | ONE lipid peroxidation/oxidation |
Q8NLG6 | Thioredoxin | Stress response | 30 Da/-2 Da | Methylation + oxidation/disulfide bond |
Membrane proteins
Secretion system protein (SecG) (Fig. 4A) and large-conductance mechanosensitive channel (MscL) (supplementary figure S4) were identified with 28 Da mass shifts. The PrSM fragments of these proteoforms presented ions supporting the identification of a mass shift of 28 Da, always near the N-terminus, and the mass errors of fragments containing this Δm were extremely low. An example of MS1 and MS2 inspection is shown for SecG (Q8Z469) (Fig. 4A). The figure shows the large number of b-ions supporting the 27.9883 Da mass shift (b4, b5, b6, b7, b8, b9, b10, b11 and b12), represented by the blue lines in the protein sequence. Because several b-ions fall in this region, the mass shift can be localized to a narrow window of three amino acid residues.
The 28 Da mass shift near the N-terminus of these proteins suggests the presence of N-terminal formylation (UnimodAC: 122, Δm = 27.994915) in the identified proteoforms. Recent evidence has suggested that N-terminal formylation of methionine acts as a signal for protein degradation. This mechanism has been hypothesized to be a quality control of protein translation in bacteria23. Both the SecG and MscL formylated PrSMs mentioned above were identified with the N-terminal methionine present. This suggests a possible mechanism of degradation of membrane proteins, including some with promising biotechnological applications. For example, the SecG protein is part of the Sec protein export pathway. There is a growing interest in the capacity of C. glutamicum to express and secrete heterologous proteins of biotechnological interest4,24. Furthermore, MscL and MscS are mechanosensitive channels, known for reacting to osmotic stress. Another mechanosensitive channel of C. glutamicum (MscCG; P42531) plays a major role in L-glutamate efflux25.
Top-down proteomics is efficient in identifying cleaved proteoforms. For instance, mepB (Q8NS24), a membrane protein related to metalloendopeptidases, was identified by a portion of its sequence with a precursor mass of 14.464 kDa. This mass corresponds to the loss of 97 amino acid residues from its N-terminal region (Fig. 4B). In contrast, the mepB cleavage site was previously predicted to lie between Ala43 and Ala44, after a putative signal peptide or transmembrane helix26. Furthermore, according to Pfam (https://pfam.xfam.org/), mepB has a domain belonging to the M23 metallopeptidase family (MEROPS) [130–226]. A well-characterized member of this family is the LytM protein from Staphylococcus aureus, a metallopeptidase involved in autolysis27. It was demonstrated that cleavage of the N-terminal chain activates its peptidase activity28. Moreover, proteins of this family have specificity for peptidoglycan polyglycine regions, and some have suggested cell wall metabolism activity29. In agreement, the C. glutamicum mepB gene was described as part of the MtrAB regulon, a two-component system implicated in osmoregulation and cell wall metabolism control30. Although it is not clear whether mepB is secreted or membrane bound, its activity was proposed to occur extracytoplasmically26. Taken together, these observations suggest that C. glutamicum mepB may have an important role in cell wall metabolism and/or heterologous protein secretion integrity, and that its activity is likely regulated by cleavage of its N-terminus.
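Assigning an observed precursor mass to an N-terminal truncation, as done above for mepB, amounts to comparing the observed mass with the theoretical mass of each possible truncated sequence. The sketch below illustrates the idea with standard monoisotopic residue masses. The sequence `TOY_SEQUENCE` is a made-up example rather than the mepB sequence, and the function names are hypothetical; the actual assignment in this study was made by the database search software.

```python
# Standard monoisotopic residue masses (Da).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565  # added once per polypeptide chain


def neutral_mass(sequence: str) -> float:
    """Monoisotopic neutral mass of an unmodified polypeptide."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER


def best_n_terminal_truncation(sequence: str, observed_mass: float):
    """Return (residues_removed, mass_error) for the truncation closest to observed_mass."""
    best = None
    for n_removed in range(len(sequence)):
        error = neutral_mass(sequence[n_removed:]) - observed_mass
        if best is None or abs(error) < abs(best[1]):
            best = (n_removed, error)
    return best


# Hypothetical toy sequence; the "observed" mass simulates loss of three residues.
TOY_SEQUENCE = "MKTAYIAKQR"
observed = neutral_mass(TOY_SEQUENCE[3:])
print(best_n_terminal_truncation(TOY_SEQUENCE, observed))
```

Because the mass decreases monotonically with each residue removed, the best-matching truncation is unique for an unmodified sequence.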
Tricarboxylic acid cycle and glutamate metabolism
C. glutamicum is widely used in the industrial production of amino acids, especially L-glutamate, which is produced in millions of tons per year31. The tricarboxylic acid (TCA) cycle is an important step in L-glutamate production by C. glutamicum. It is well established that the activity of the 2-oxoglutarate dehydrogenase complex (ODHC) is reduced under conditions that induce L-glutamate production32,33. Furthermore, ODHC activity was found to be regulated by the phosphorylation status of oxoglutarate dehydrogenase inhibitor (OdhI, Q8NQJ3). The phosphorylated proteoform of OdhI is unable to interact with ODHC, whereas unphosphorylated OdhI interacts with ODHC, inhibiting 2-oxoglutarate conversion to succinyl-CoA11 (Fig. 5A). Moreover, the decreased ODHC activity enhances the production of L-glutamate from 2-oxoglutarate33 (Fig. 5A). Consistently, OdhI phosphorylation status is affected by methods that induce L-glutamate production11.
In this study, seven proteoforms of OdhI were identified. Some of them seemed to be caused by sample preparation or the ionization process, such as oxidation (Δm = 16 Da) and dehydration (Δm = -18 Da). Conversely, two proteoforms were identified with potential biological relevance: one with N-terminal cleavage of three residues and another with a Δm of 70 Da (supplementary figure S5). Although these two proteoforms were confidently identified by TopPIC, a low number of matching fragments covered the modified regions, lowering our confidence in their identification, particularly concerning the location of the modifications in the sequence. On the other hand, in agreement with one of the identified proteoforms, a bottom-up proteomic analysis in a preliminary study conducted by our group suggested the presence of a Δm of 70 Da in the N-terminal peptide of OdhI (data not shown).
Recently, the 70 Da mass shift was identified in GM-CSF heterologously expressed in an Escherichia coli system. This modification was hypothesized to result from crotonaldehyde formed during oxidative stress by lipid peroxidation. The aldehyde reacts with the protein N-terminus or lysine residues, resulting in the Δm of 70 Da34. In the present study, the exact site of the modification resulting in the 70 Da mass shift in OdhI could not be identified due to the complexity of the spectra and limited fragmentation by MS/MS. Therefore, other modifications, such as five methyl group additions (14 Da each) or acetylation (42 Da) followed by formylation (28 Da), could not be excluded. As mentioned above, ODHC inactivation arises from the interaction of unphosphorylated OdhI with the OdhA subunit of ODHC. Phosphorylation of OdhI at T14 inhibits this interaction, resulting in ODHC activation10. More recently, it was reported that K142 succinylation also affects the OdhI-ODHC interaction, hampering the inhibition of ODHC with impacts on C. glutamicum L-glutamate production12,35.
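The ambiguity of a nominal 70 Da shift can be made explicit by enumerating combinations of small modifications whose masses sum to approximately that value. The following is a minimal sketch: the formyl and acetyl values correspond to those used in this study, the methyl value (14.015650 Da) is the standard monoisotopic mass of a methylation, and the 0.5 Da tolerance, the limit of five modifications, and the function name `combos_near` are assumptions made for illustration.

```python
from itertools import combinations_with_replacement

# Monoisotopic mass shifts in Da.
MODS = {
    "methyl": 14.015650,
    "formyl": 27.994915,
    "acetyl": 42.010565,
}


def combos_near(target_da: float, tolerance_da: float = 0.5, max_mods: int = 5):
    """List modification combinations whose summed mass is within tolerance of target_da."""
    hits = []
    for k in range(1, max_mods + 1):
        for combo in combinations_with_replacement(sorted(MODS), k):
            total = sum(MODS[name] for name in combo)
            if abs(total - target_da) <= tolerance_da:
                hits.append((combo, round(total, 4)))
    return hits


for combo, total in combos_near(70.0):
    print(combo, total)
```

At nominal mass accuracy, several combinations besides the two named above are compatible with 70 Da, which is why localization by fragment ions and accurate precursor mass are needed to discriminate among them.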
Furthermore, as an acetylation site at K52 of OdhI was also described12,35, we investigated this modification combined with N-terminal formylation (27.9949 Da + 42.0106 Da) as a possible origin of the 70 Da OdhI proteoform. However, these modifications resulted in a considerable loss of matched fragment peaks (data not shown). Therefore, it is improbable that this combination was the source of the mass shift. Moreover, the suggested region for the 70 Da Δm is near the known T14 phosphorylation site of OdhI, raising the question of whether it could affect the interaction with ODHC (Fig. 5B). Beyond the 70 Da mass shift, the OdhI proteoform with N-terminal tripeptide truncation also presents a possible mechanism of OdhI regulation. OdhI phosphorylation occurs at T14 and is mainly catalyzed by C. glutamicum PknG10. Considering the proximity of the OdhI phosphorylation site to both the peptide cleaved in the truncated proteoform and the 70 Da Δm, these modifications may affect OdhI inhibition of ODHC, resulting in the activation of ODHC (Fig. 5B, purple box). A possible mechanism is the inactivation of OdhI after N-terminal peptide cleavage or the 70 Da PTM, hampering its interaction with ODHC. Another hypothesis is that these modifications may induce the inhibition of ODHC in a manner similar to that proposed for phosphorylation (Fig. 5B, green box). In either case, L-glutamate production in these bacteria may be influenced as a consequence of ODHC regulation (Fig. 5B). Despite these possibly relevant effects of the putative OdhI proteoforms, further studies must be performed to investigate the importance and validity of these modifications in OdhI function.
Another protein with relevance to the glutamate production process is the heavy-metal-associated domain (HMA) containing protein (Q8NL68, HMADP). Its corresponding transcript was identified as up-regulated in a series of glutamate overproduction conditions; however, its function in this process remains unclear36. Here we identified three different proteoforms of this protein. One of them presented a Δm of -2 Da (supplementary figure S6), suggesting the presence of a disulfide bond (UnimodAC: 2020, Δm = -2.015650 Da), an amino acid residue substitution (UnimodAC: 1217, Δm = -2.015650, Val->Pro or UnimodAC: 1145, Δm = -1.997892, Met->Glu) or didehydro (UnimodAC: 401, Δm = -2.015650) in the modified region. Moreover, several PrSMs were identified with a Δm of approximately 30 Da. Inspection of the spectra confirmed this mass shift, but it more likely results from two PTMs, one of 14 Da and the other of 16 Da (supplementary figure S6), possibly corresponding to methylation and oxidation, respectively. Furthermore, an isotopic envelope with the intact mass of the -2 Da modification of HMADP could be detected near this identified proteoform (Δm = 30 Da) (supplementary figure S6, precursor in blue).
The two putative modifications of HMADP proteoforms are close together in the protein sequence (supplementary figure S6). Moreover, no canonical proteoform could be identified. These observations suggest that the presence of methylation may affect disulfide bond formation. The HMA sequence is found in bacterial proteins conferring resistance to toxic heavy metals37. To clarify C. glutamicum HMADP function, the Q8NL68 protein sequence was searched with BLAST against the UniprotKB reference proteomes plus Swiss-Prot database with default parameters in Uniprot, resulting in similarity to copper chaperone and heavy metal transport/detoxification proteins (supplementary table 3). The transcripts for the copper-responsive two-component system, CopRS, were shown to be up-regulated in C. glutamicum during penicillin induction of glutamic acid production38. Recently, it was demonstrated that copper can induce glutamic acid production by C. glutamicum, although in a lower amount compared to typical treatments, such as penicillin or biotin limitation39. The role of HMADP in L-glutamate production nevertheless remains unclear. It appears to be a stress response related protein, and it may therefore be regulated, or regulate other proteins, through PTMs associated with oxidative stress.
Stress response related proteins
Less common Δm values were observed in a peroxiredoxin (Q8NMS6). One proteoform with a mass shift of 154 Da was identified, and N-terminal acetylation was also detected (supplementary figure S7). Inspection of the MS/MS spectrum allowed us to identify fragments that support the 154 Da mass shift. However, the N-acetylation could not be narrowed down to a shorter sequence region, making it unclear whether it was a series of methylation events or indeed an acetylation (supplementary figure S7). Considering only one PTM, the mass shift of 154 Da may be caused by three events: addition of glycerophosphate (UnimodAC: 419, Δm = 154.003110 Da), decanoyl (UnimodAC: 449, Δm = 154.135765 Da) or 4-oxo-2-nonenal (ONE) (UnimodAC: 721, Δm = 154.099380 Da). The ONE protein modification is a lipid peroxidation product, caused by the interaction of reactive species with membrane lipids. This molecule can be added to nucleophilic amino acid residues40. Based on the proposed region of modification, the ONE modification would likely occur at R54 or D55 of this peroxiredoxin. Moreover, proteins involved in oxidative stress regulation are often susceptible to oxidative modifications as a process to regulate redox activity in the cell41. This evidence supports the 154 Da mass shift as an ONE modification in C. glutamicum peroxiredoxin. However, other possibilities cannot be discarded. Another peroxiredoxin proteoform with a Δm of 186 Da supports the identification of the 154 Da mass shift in this protein and is likely caused by a second PTM of approximately 32 Da in its sequence. In agreement, near the precursor fragmented for the identification of the 186 Da Δm proteoform, there was an isotopic envelope corresponding to the loss of around 32 Da (supplementary figure S7). This suggests the presence of a PTM of 32 Da in addition to the putative ONE modification.
Mass differences of 32 Da are usually due to two oxidations on different methionines, but the methionines in this protein sequence are closer to the C-terminus, where several fragments were identified without this mass shift. Another possibility for this mass shift would be a dihydroxylation of cysteine (UnimodAC: 425, Δm = 31.989829 Da). There are two cysteines near the modified region (supplementary figure S7). Although the pKa of free cysteine is around 8.6, the presence of positively charged residues near the cysteine decreases it by 3–4 units, favoring the oxidation of its thiol group42. In agreement, there is an arginine (R54) and a lysine (K47) near C51. Two oxidations of a cysteine result in the formation of sulfinic acid, which is an irreversible modification and a signal for protein degradation42. Therefore, there is a good chance that the peroxiredoxin protein of C. glutamicum undergoes oxidative modifications which could regulate its activity and integrity.
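The effect of a 3–4 unit pKa decrease on cysteine reactivity can be illustrated with the Henderson-Hasselbalch relation, which gives the fraction of the thiol present as the more reactive thiolate. The sketch below assumes a pH of 7.4 purely for illustration; this value and the function name `thiolate_fraction` are not taken from the study.

```python
def thiolate_fraction(pka: float, ph: float) -> float:
    """Fraction of a thiol group deprotonated at a given pH (Henderson-Hasselbalch)."""
    return 1.0 / (1.0 + 10 ** (pka - ph))


ASSUMED_PH = 7.4      # hypothetical pH, chosen only for this example
FREE_CYS_PKA = 8.6    # pKa of free cysteine quoted in the text

# Free cysteine versus a cysteine whose pKa is lowered by 3 or 4 units.
for pka in (FREE_CYS_PKA, FREE_CYS_PKA - 3, FREE_CYS_PKA - 4):
    fraction = thiolate_fraction(pka, ASSUMED_PH)
    print(f"pKa {pka:.1f}: {100 * fraction:.1f}% thiolate")
```

Under this assumption, only a small fraction of a free cysteine is deprotonated, while a cysteine with a pKa lowered by 3–4 units is almost entirely in the thiolate form.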
A thioredoxin (Q8NLG6, trxB1), annotated as thiol-disulfide isomerase and thioredoxin, was identified by two proteoforms, both with N-terminal cleavage of 26 amino acid residues. Interestingly, one proteoform was identified with a mass shift of approximately -2 Da in a region near two cysteines (supplementary figure S8), suggesting the formation of a disulfide bond. Another proteoform was identified with a Δm of 29.88 Da, which is ambiguous and could correspond to two oxidations plus a disulfide bond, or to methylation followed by oxidation (supplementary figure S8). Moreover, both proteoforms were identified by good quality MS/MS spectra with several fragments representing the two modified forms (supplementary figure S8). This thioredoxin was demonstrated to be responsive to disulfide stress, regulated by the SigM sigma factor in C. glutamicum43. Thioredoxins are known to act as a repair system for oxidized cysteine residues through their CxxC motif. The cysteine residues of this motif undergo oxidation, forming a disulfide bond and producing the reduced form of the cysteine residues in the target protein42. The presence of a -2 Da Δm near this motif of trxB1 reinforces its assignment as a disulfide bond. Moreover, this evidence supports a function for trxB1 in the C. glutamicum disulfide stress response.
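The ambiguity of the 29.88 Da shift can be seen by computing the theoretical mass of each proposed combination and its distance from the observed value. This sketch uses the oxidation and disulfide values given in this study and the standard monoisotopic methylation mass (14.015650 Da); the dictionary and function names are hypothetical.

```python
# Monoisotopic mass shifts in Da.
DELTA = {
    "oxidation": 15.994915,
    "methylation": 14.015650,
    "disulfide": -2.015650,
}

# The two interpretations proposed in the text for the 29.88 Da shift.
HYPOTHESES = {
    "two oxidations + disulfide bond": ("oxidation", "oxidation", "disulfide"),
    "methylation + oxidation": ("methylation", "oxidation"),
}


def hypothesis_mass(components) -> float:
    """Summed theoretical mass shift for a combination of modifications."""
    return sum(DELTA[name] for name in components)


def residuals(observed_da: float) -> dict:
    """Observed minus theoretical shift for each hypothesis, in Da."""
    return {
        label: round(observed_da - hypothesis_mass(parts), 4)
        for label, parts in HYPOTHESES.items()
    }


print(residuals(29.88))
```

Both hypotheses lie within roughly 0.1 Da of the observed shift, so the precursor mass alone does not discriminate between them.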
Top-down proteomic analysis of the industrial workhorse Corynebacterium glutamicum revealed several new putative PTMs related to different biological processes in this bacterium. More precisely, 1125 proteoforms were identified from 273 proteins. Moreover, membrane proteins and proteins involved in translation seem to be particularly susceptible to PTMs. Proteins relevant to biotechnological and metabolic processes, such as OdhI, involved in amino acid production, mepB and SecG, implicated in the protein secretion system, and the thioredoxin and peroxiredoxin participating in stress responses, were identified with new proteoforms, which may imply new regulation strategies. Although these proteoforms have great potential to influence biotechnological processes, further studies should be performed to validate and better understand their functions in C. glutamicum biology.