Homology modelling of the lumenal domain of the human GPAA1 protein
The programs I-TASSER [33,34] and MODELLER [35,36] were used to construct homology-based models of the lumenal segment of human GPAA1 (UniProt: O43292). The X-ray crystal structure of glutaminyl cyclase (PDB ID: 4f9u, chain A [30]; a member of the M28 metallo-peptidases) was applied as the single template. Of note, 4f9u has the best segment coverage according to a previously published multiple sequence alignment (see Figure 1 in reference [23]). This alignment of GPAA1 (residues 66–348) and 4f9u (residues 4–298) was also used as an input to the programs to guide the modeling process.
In both modeling efforts, the GPAA1 structural model consistently exhibits an α/β hydrolase fold [23], comprising 8 strands and 7 helices in a compact structure (Figure 1). The RMSD between the static structures generated by I-TASSER and MODELLER is 1.33 Å when the loops that cause larger deviations are excluded. Significant differences were observed in loop 276–299, which is absent in the M28-type enzymes. Only the I-TASSER model was found to contain the expected additional small helix αx (see Supplementary Figure S1 and Figure 1 in reference [23]). As the MODELLER model lacks the extra helix in this region, we consider the I-TASSER result the more suitable model for further analyses. Subsequently, we therefore refer to the I-TASSER-generated structure unless mentioned otherwise.
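As an illustration of how an RMSD "excluding the loops" can be computed, the following minimal Python sketch compares two sets of per-residue coordinates while skipping a residue range. It assumes that the two models are already superposed in a common frame and that one atom per residue (e.g. Cα) is used; the function name, the data layout and the toy coordinates are hypothetical and are not taken from the models of this work.

```python
import math

def rmsd_excluding(coords_a, coords_b, excluded=()):
    """RMSD over residues present in both models, skipping excluded residue numbers.

    coords_a, coords_b: dict mapping residue number -> (x, y, z) of one atom
    per residue (e.g. C-alpha), assumed to be already superposed.
    """
    skip = set(excluded)
    shared = sorted((set(coords_a) & set(coords_b)) - skip)
    if not shared:
        raise ValueError("no shared residues left after exclusion")
    sq_sum = 0.0
    for res in shared:
        sq_sum += sum((a - b) ** 2 for a, b in zip(coords_a[res], coords_b[res]))
    return math.sqrt(sq_sum / len(shared))

# Toy coordinates (hypothetical, for illustration only)
model_a = {100: (0.0, 0.0, 0.0), 101: (1.0, 0.0, 0.0), 280: (5.0, 5.0, 5.0)}
model_b = {100: (0.0, 0.0, 1.0), 101: (1.0, 0.0, 1.0), 280: (9.0, 9.0, 9.0)}

loop = range(276, 300)  # residues 276-299, the loop discussed in the text
core_rmsd = rmsd_excluding(model_a, model_b, excluded=loop)
all_rmsd = rmsd_excluding(model_a, model_b)
print(f"core RMSD = {core_rmsd:.2f} A, all-residue RMSD = {all_rmsd:.2f} A")
```

In this toy case, the single strongly deviating loop residue dominates the all-residue value, which is why loop regions are commonly excluded when comparing core scaffolds.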
Insertion of Zn ions and of a tentative peptide (PEP) as substrate analogue
Because of the homology between the lumenal domain and the Zn-bound M28 peptidase family [23], we modeled the presence of zinc ion(s) and of a substrate analogue including a peptide bond in the catalytic cleft of the GPAA1 structural model. Sequence alignment of the five possible Zn-binding sites (see Figure 1 in reference [23]) shows that, except for sites 1 and 5, protein sequences of known 3D structures share similar physicochemical properties at sites 2, 3, and 4, e.g. bearing aspartate (D) or glutamate (E), which are canonical Zn-binding residues. While the reference structure 4f9u (and also all other 144 Zn(s)-bound structures of the M28 metallo-peptidase family, extracted from the Protein Data Bank and tabulated in Supplementary Table S1) contains histidine residues at both sites 1 and 5, the human GPAA1 sequence holds proline (P149, a residue unable to interact with a metal ion) and tyrosine (Y328) at sites 1 and 5, respectively. Notably, tyrosine is known to coordinate metal ions [37,38].
Further in this text, GPAA1Zn stands for a single Zn-binding GPAA1 model as in 4f9u/4fwu. We found that it is sterically possible to place yet another zinc ion into the model structure (as in the aminopeptidase structure 1f2o [29,32], see also arguments below); thus, GPAA1Zn1Zn2 denotes the two Zn-binding model. The resulting GPAA1 model from I-TASSER was first submitted to the H++ server [39,40] for hydrogen addition and estimation of protonation states at lumenal pH 7.2 [41]. While maintaining the protonated state of the hydroxyl group of residue Y328 (site 5) [42], we removed hydrogens (if any) that were bound to the OD2 atoms of D153 (site 2) and D188 (site 3) or to OD1 of D153 (site 2) and OE2 of E226 (site 4), respectively, because these atoms coordinate the zinc ion(s). Then, a non-bonded model with weak harmonic restraints (force constants ~3 kcal/molÅ–1) was used to simulate the Zn-coordinated GPAA1 binding sites.
We also simulated the interaction of a tentative substrate peptide (di-alanine ethylamine, named “PEP” from now on) with human GPAA1 at the Zn-binding sites. PEP was designed to mimic the product of the peptide bond formed between the ω-site of the potential substrate protein and the phospho-ethanolamine moiety of the GPI lipid anchor [1,23]. For simplicity, an ethylamine group was attached to the second alanine of PEP, which was then coordinated to one Zn ion via the carbonyl oxygen (Figure 2; compare also with Figure 2 in [23]). PEP was parameterized (assigning partial charges and atom types) using antechamber and the GAFF force field as implemented in the AMBER14 package [43,44].
The resulting two models GPAA1Zn and GPAA1Zn1Zn2 with their atomic coordinates are available as structure files in the supplementary material (files GPAA1_Zn_PEP.pdb and GPAA1_Zn1Zn2_PEP.pdb).
Validation of the molecular dynamics simulation protocol
Before we can apply a molecular dynamics procedure to GPAA1 model structures, it is reasonable to elaborate a protocol that computes plausible values of desired parameters in the case of known M28-type structures. For this purpose, we first performed a molecular dynamics (MD) simulation using the AMBER14 package [45] with the force field ff14SB [46–48] to sample the conformational space of the two crystal structures 4f9u (one Zn-binding and model substrate 1-(3,4-dimethoxyphenyl)–3-[3-(1H-imidazol–1-yl)propyl]-thiourea referred to as SUB1) [30] and 1f2o (two Zn-binding and model substrate L-leucine referred to as SUB2) [29] as references. We chose to use ff14SB since it was shown to improve the helical content and side chain rotamer distributions in simulating biomolecules such as proteins [49].
The system was initially relaxed with a 5000-step minimization (using the steepest descent followed by the conjugate gradient algorithm) and heated gradually in thermal baths from 0 K to 100 K at constant volume and then from 100 K to 300 K at constant pressure. Next, the system was equilibrated and used in production runs with an explicit solvent model, in triplicate trajectories (3 × 300 ns). The simulations were started by assigning random velocities to the atoms, with the Langevin temperature equilibration scheme stabilizing the systems at 300 K, and used a time step of 2 fs. Analyses were performed on the portions of each trajectory in which the five canonical zinc-binding sites showed stable and similar fluctuations across the triplicates. Quantitatively, we required a stable backbone root mean squared deviation (RMSD ≤ 2.5 Å) of the five canonical Zn-binding site residues (Supplementary Figure S2). Specific analyses of the reference structures confirmed that the simulation protocol reproduced the substrate positions (SUB1 and SUB2) in the Zn(s)-binding regions (e.g. within ~4 Å of those in the starting structure) in the resulting ensembles of 4f9u (57%) and 1f2o (100%).
Then, the same protocol was applied to our 3D structural model of the lumenal domain of human GPAA1 and the two substrates SUB1 and SUB2. For the GPAA1Zn and GPAA1Zn1Zn2 models, 54% of the GPAA1Zn-SUB1 ensemble and 93% of the GPAA1Zn1Zn2-SUB2 ensemble showed a similar proximity (~4 Å) of the substrate positions in the Zn(s)-binding regions compared to their respective reference structures.
Therefore, the simulation protocol was applied to sample the conformational space of the GPAA1Zn and GPAA1Zn1Zn2 models with the tentative peptide PEP. Also in these cases, the molecular simulations show compactness of packing as well as maintenance of secondary structure elements and hydrolase fold along the whole trajectory (Supplementary Figure S3).
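The ensemble percentages quoted in this section can be read as the fraction of analysed frames in which the substrate stays close to its starting position. The short Python sketch below shows one plausible way to obtain such a number: frames are first filtered by the RMSD criterion of the Zn-binding site residues (≤ 2.5 Å), and the fraction of the remaining frames with a substrate displacement of at most 4 Å is then reported. The function names and the per-frame values are hypothetical, and the exact definition of the proximity measure used in this work is an assumption here.

```python
def stable_frames(rmsd_series, threshold=2.5):
    """Indices of frames whose site RMSD (in Angstrom) is at most `threshold`."""
    return [i for i, r in enumerate(rmsd_series) if r <= threshold]

def fraction_within(distances, cutoff=4.0):
    """Fraction of frames whose distance (in Angstrom) is at most `cutoff`."""
    distances = list(distances)
    if not distances:
        raise ValueError("no frames to analyse")
    return sum(1 for d in distances if d <= cutoff) / len(distances)

# Toy per-frame values (hypothetical, not simulation output)
site_rmsd = [1.2, 1.8, 3.1, 2.0, 2.4]     # backbone RMSD of the five site residues
displacement = [1.0, 3.5, 2.0, 5.2, 3.9]  # substrate displacement from the start

kept = stable_frames(site_rmsd)
kept_displacement = [displacement[i] for i in kept]
occupancy = fraction_within(kept_displacement)
print(f"{len(kept)} stable frames, {100 * occupancy:.0f}% within 4 A")
```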
Structural models of the lumenal domain of human GPAA1 can accommodate either one or two zinc ions
Geometrically, the five residues of the active sites of both the M28 families, cyclotransferases (binding one zinc ion) and aminopeptidases/carboxypeptidases (binding two zinc ions), are in similar proximities. For instance, the distance between site 1 and Zn (or Zn1 in the case of two zincs) and between site 4 and Zn (or Zn1) are ~4.6 Å and ~5.9 Å, respectively (see the schematic active sites in Figure 1B for the residue locations). With a Zn-Zn distance of 3.5 Å estimated in the aminopeptidase (e.g., in 1amp), this argument alone suggests that both the M28 family proteins should be able to sterically accommodate two zinc atoms in their active sites.
Yet, in the case of one Zn-bound structures (crystal structures of the cyclotransferase family, including the reference 4fwu [31]), it is noticed that site 2 (D99) and site 4 (D186) are bridged by a bifurcated hydrogen bond network between two adjacent residues T98 and L187, respectively (Figure 3A, left). This observation is affirmed by a previous study that strong bifurcated hydrogen bonds can be formed by threonine or serine due to over-coordination between donor and acceptor groups in protein side chains [50]. One of the bifurcated hydrogen bonds occurs directly at the oxygen atom of the site 4 (D186) where, otherwise, it would be coordinated with a zinc atom (Zn2) in the case of aminopeptidases or carboxypeptidases.
However, these bifurcated hydrogen bonds are absent in aminopeptidases and carboxypeptidases that bind two zinc ions (Figure 3A, right). The GPAA1Zn model structure, with a large distance of ~6.3 Å between its site 4 and Zn1, can thus sterically accommodate another zinc atom in the Zn-binding cleft, as in the two-zinc M28-type structures. At the same time, we do not find a pair of residues in the GPAA1Zn model that interact in a similar way as T98/L187 in 4fwu. For example, T155 and L227, a pair of possibly corresponding residues (Figure 3A, bottom), are located far apart; during the simulation, they remain apart and no contacts were found between them (Figure 3B). Thus, space for additional molecular entities is available. We therefore conclude that the GPAA1Zn model is able to spatially accommodate another metal ion, e.g. a second zinc ion that could coordinate and bridge site 2 and site 4 in this region, as in the GPAA1Zn1Zn2 model generated in this work. Independent molecular dynamics simulations of the GPAA1Zn1Zn2 3D model showed that, in the presence of two zinc ions, the distance between the two residues T155 and L227 decreases, but not to the range of T98/L187 in 4fwu (Figure 3B). Notably, the second zinc ion remained stably bound. It can therefore be speculated that the GPAA1 structure might alternatively adopt, or switch between, single and double zinc ion binding modes, models of which are generated in this work (see also the arguments on the energetics of zinc ion binding further down).
Dynamics of GPAA1 structures in the absence and presence of different substrates
For estimating the various sites’ contributions to the Zn-binding energies, we used the mmPBSA method [51] implemented in the Amber package. The energy decomposition analysis was performed using the generalized Born solvent model with ionic strength of 0.1 nM, with the non-polar solvation free energy estimated as proportional to the solvent-accessible surface area. The 1–4 electrostatic and 1–4 van der Waals terms were calculated and included in the internal energies (together with the bond, angle, and dihedral terms).
For several M28-type family proteins, Zn(s)-bound structural forms are available in the absence of substrate, e.g. glutaminyl cyclase (4fwu [31]) with one zinc ion or aminopeptidase (1amp [28]) with two zinc ions. This observation suggests that Zn-binding events happen prior to substrate binding and result in a structural intermediate consisting only of the enzyme protein part with the metal ions. With molecular dynamics simulations without and with substrates, we estimated the energy contributions of zinc-binding residues in one- and two-zinc metallopeptidase structures. We studied the following pairs of structures:
- 4fwu and GPAA1Zn—single zinc case both having no substrate (Figure 4),
- 4f9u (identical sequence with 4fwu) with SUB1 and GPAA1Zn with PEP or SUB1—single zinc case (Figure 4),
- 1amp and GPAA1Zn1Zn2—duo zinc case both having no substrate (Figure 5),
- 1f2o with SUB2 and GPAA1Zn1Zn2 with PEP or SUB2—duo zinc case (Figure 5).
In both figures, panel A shows the per-residue energy contribution to zinc binding and panel B illustrates the position of zinc ions and of the canonical residues potentially interacting with the metal ions.
Throughout all sets of molecular dynamics simulations without model substrate (sets 1 and 3), we find that canonical site 5 tends to contribute the least to the zinc-protein interactions (Figures 4A and 5A). Similarly, site 1 (where applicable) is also not a major contributor. The trends in the energy contributions of canonical sites 2, 3, and/or 4 to zinc ion binding in the GPAA1 models are markedly similar to those in the reference structures (4fwu and 1amp), showing considerably lower energies than site 5 in the interactions with the Zn ion(s). Therefore, the GPAA1 lumenal domain structure could, indeed, accommodate either one or two zinc atoms in its active site.
In the case of the single Zn-bound complex 4fwu/4f9u, the presence of substrate (trajectory set 2) was found to amplify interactions between the zinc ion and the enzyme’s canonical site residues, most notably for site 5 (Figure 4). In the reference structure (4f9u, with the natural substrate SUB1), we observe a ~6-fold energy increase at site 5 (Figure 4A, right panel). Likewise, in the GPAA1Zn model, binding of the more natural model substrate PEP (but not of the substrate SUB1) results in a stronger interaction (about 2x) between Zn and site 5 (Figure 4A, left panel), mostly due to contributions of the van der Waals and electrostatic potential energies (Supplementary Figure S4). One may conclude, therefore, that the strength of the interaction between Zn and site 5 depends on substrate binding in the case of single zinc metallopeptidases (Figure 4A, left panel). Depending on the substrate (the more natural PEP or SUB1 placed for comparison), we observe varying Zn-binding energies at canonical sites 2 and 3 (opposite trends) in the GPAA1Zn model (Figure 4A, left panel).
The energetics of zinc binding in the known duo zinc complexes (as exemplified by 1amp/1f2o, molecular dynamics trajectory sets 3 and 4) upon substrate binding is quite different (Figure 5). Although we see again that canonical sites 1 and 5 are of less importance in the zinc coordination, there is not much change in the energy contributions for all five sites upon substrate SUB2-binding (Figure 5A, right panel). In the case of the GPAA1Zn1Zn2 model (Figure 5A, left panel), we find a trend for strengthened interaction of zinc with sites 2 and 4 (for PEP) and with site 3 (for SUB2). Remarkably, interaction with site 5 is drastically enhanced upon substrate binding (5x for PEP, 3x for SUB2) in complete contrast with computation results for 1amp/1f2o.
Since the GPAA1 structure reveals energetics of zinc binding most similar to those of the reference glutaminyl cyclase structures 4fwu/4f9u (especially with regard to the enhancement of the site 5 interaction upon substrate binding), it is tempting to conclude that the GPAA1 lumenal domain uses just one zinc ion for its catalytic function in GPI-attachment. As site 1 tends not to be of major significance, the change to proline (from histidine in most M28-type enzymes) in human GPAA1 does not appear to have a dramatic functional effect. The results suggest that site 1 (proline), not being essential for catalysis, probably serves another role in the GPAA1 structure and/or function.
Observations of loop dynamics in the molecular simulations of GPAA1 3D models
There are four large loops in the structure of GPAA1 (see Figure 6). GPAA1 sequences contain an insertion between strand β7 and helix α6 (see Figure 1 in reference [23]; residue D276 followed by sequence WTSLDGPLQGLQTLLLMVLRQASG in human GPAA1) that is absent in other sequences of the M28 family. This stretch forms a long loop (including a small additional helix) that is located at the opening of the GPAA1 active site cleft (loop 4 in Figure 6). During the molecular dynamics simulations of the GPAA1 models, in both the one and the two zinc-bound cases, loop 4 preferentially adopts two conformational states, “closed” and “open”, relative to the GPAA1 zinc location site. In particular, in the case of GPAA1Zn, the distances between the centre of mass of the loop and the Zn ion(s) are ~15.52 Å (closed) and ~24.87 Å (open). These distances are ~21.64 Å (closed) and ~44.84 Å (open), respectively, in the case of GPAA1Zn1Zn2. For comparison, the diameter of an α-helix is ~12 Å.
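The open/closed description above relies on the distance between the centre of mass of a loop and the zinc ion. A minimal Python sketch of that measurement is given below; the function names, the toy atoms and the boundary used to label a frame as "closed" or "open" (here simply the midpoint of the two distances reported for GPAA1Zn) are illustrative assumptions rather than the procedure actually used in this work.

```python
import math

def centre_of_mass(atoms):
    """Mass-weighted centre of `atoms`, an iterable of (mass, (x, y, z))."""
    atoms = list(atoms)
    total_mass = sum(m for m, _ in atoms)
    if total_mass <= 0:
        raise ValueError("total mass must be positive")
    return tuple(sum(m * xyz[i] for m, xyz in atoms) / total_mass for i in range(3))

def loop_to_zn_distance(loop_atoms, zn_xyz):
    """Distance between the loop centre of mass and the zinc position."""
    return math.dist(centre_of_mass(loop_atoms), zn_xyz)

def classify(distance, boundary):
    """Label a frame as 'closed' or 'open' relative to a chosen boundary."""
    return "closed" if distance < boundary else "open"

# Hypothetical boundary: midpoint of the GPAA1Zn distances quoted in the text
BOUNDARY = (15.52 + 24.87) / 2.0

# Toy loop of two carbon atoms and a toy zinc position (illustration only)
loop_atoms = [(12.011, (0.0, 0.0, 0.0)), (12.011, (2.0, 0.0, 0.0))]
zn_position = (1.0, 0.0, 16.0)

d = loop_to_zn_distance(loop_atoms, zn_position)
state = classify(d, BOUNDARY)
print(f"loop-Zn distance = {d:.2f} A -> {state}")
```

Applied frame by frame, such a distance series is also the natural input for a correlation analysis between different loops.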
Interestingly, the motion of loop 4 was found to be coupled with the motion of another loop spatially near the GPAA1 zinc location site, the loop between strands β1 and β2 (see Figure 1 in reference [23]; residues 118–130 with the sequence THERYMVSGTNVY in human GPAA1; loop 1 in Figure 6). Notably, this loop is also longer than the homologous segments in other M28-type sequences (see the alignment of reference [23]). The two loop motions were found to be anti-correlated (calculated as the Pearson correlation between the distances of the two loops to the zinc location site, resulting in Rpearson = –0.78, p-value << 0.0001 with 95% confidence interval), forming flaps that move in opposite directions around the GPAA1Zn active site and exhibit breathing-like dynamics (Figure 6). The two flaps move more flexibly in the presence of the substrate PEP (blue curve in Figure 6). In the case of the two Zn-bound GPAA1Zn1Zn2, the motions of the two flaps are much weaker and not anti-correlated (Rpearson = 0.5, p-value << 0.0001 with 95% confidence interval).
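For readers who wish to reproduce this type of analysis, the Pearson correlation between two loop-to-zinc distance series can be sketched in a few lines of Python. The implementation below is generic; the two toy series are hypothetical and were chosen to be perfectly anti-correlated, so they do not reproduce the value of –0.78 reported above.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long series."""
    x, y = list(x), list(y)
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two series of equal length (at least 2 values)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / math.sqrt(sxx * syy)

# Toy distance series in Angstrom (hypothetical): one loop approaches the
# zinc site while the other moves away from it.
loop1_to_zn = [10.0, 11.0, 12.0, 13.0]
loop4_to_zn = [24.0, 22.0, 20.0, 18.0]

r = pearson_r(loop1_to_zn, loop4_to_zn)
print(f"Pearson r = {r:.2f}")
```

A negative coefficient indicates that one flap tends to close while the other opens, which corresponds to the breathing-like motion described in the text.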
In both cases of Zn(s)-bound GPAA1 models, however, the motions of the two flaps do not affect the exposure of zinc ions at the active sites for small substrates such as the model substrate PEP. Therefore, it is inferred that the GPAA1 active site is mostly maintained and supported by the core scaffold of helices and strands also resembled in other M28-type family proteins. Yet, the real substrates of GPAA1 are the GPI lipid anchor precursor and the C-terminus of an eligible substrate protein, both being quite bulky entities. Thus, the movement of flaps consisting of loops 1 and 4 has a likely role in regulating their access to the active site and/or in accompanying the catalytic process of substrate binding and release.
Another characteristic, conserved motif in GPAA1 sequences (but not in other M28-type enzymes), the C-terminal end of strand β6, the N-terminal part of the following helix α5 and the loop in-between (see alignment in Figure 1 of [23]; sequence starting with E240 followed by GNLGXLPNLD in human GPAA1) is located distantly from the GPAA1 active site (actually, at the opposite side of the structure) and exhibits more restrained motions compared to the other loops in the GPAA1 (see loop 3 in Figure 6). Therefore, we think that this segment is not involved in the substrate binding process of GPAA1 but has another significance, most likely in transamidase complex formation. The structural modelling results are coherent with experimental results of Vainauskas et al. [52], who demonstrated that mutation of the GLNG stretch to AAAA drastically weakens but not completely excludes interactions with other transamidase subunits.
Comparison with the structural models of the lumenal domain of yeast GAA1 from Gamage et al. [14]
Our model structure has a radius of gyration Rg ~20.5 Å. This result is similar to the radii of gyration that we computed from the structure files of the two yeast GAA1 models supplied by Gamage et al. [14] in their supplementary material: 20.1 Å (the model derived from 1RTQ [53,54] with Rosetta [55]) and 20.8 Å (the model derived from 4f9u [30] with RaptorX [56,57]), respectively (see Supplementary Figure S5 for structural illustrations). The RMSDs of our structural model with those two from Gamage et al. are 5.28 Å and 4.98 Å, respectively. For comparison, the RMSD between the two Gamage et al. model structures is 2.9 Å. To note, GAA1 is the yeast homologue of human GPAA1 with ~36% sequence similarity in the lumenal domain part. The structural scaffolds of the three models are very similar (especially between their 4f9u-derived model and ours; see Supplementary Figure S5); the variation comes overwhelmingly from the differently arranged loops. Only in our model is the presence of the Zn ion at the catalytic cleft modelled, and only there is the additional small helix αx found in the flexible loop (loop 4 in Figure 6).
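The radius of gyration used for this comparison is the mass-weighted root mean square distance of the atoms from their centre of mass. A minimal Python sketch of this standard definition follows; the function name and the toy atoms are hypothetical, and parsing of actual structure files is deliberately left out.

```python
import math

def radius_of_gyration(atoms):
    """Mass-weighted radius of gyration for an iterable of (mass, (x, y, z))."""
    atoms = list(atoms)
    total_mass = sum(m for m, _ in atoms)
    if total_mass <= 0:
        raise ValueError("total mass must be positive")
    # Centre of mass of the atom set
    com = [sum(m * xyz[i] for m, xyz in atoms) / total_mass for i in range(3)]
    # Mass-weighted mean squared distance from the centre of mass
    msd = sum(
        m * sum((xyz[i] - com[i]) ** 2 for i in range(3)) for m, xyz in atoms
    ) / total_mass
    return math.sqrt(msd)

# Toy system (hypothetical): two equal masses placed 2 Angstrom apart
toy_atoms = [(1.0, (-1.0, 0.0, 0.0)), (1.0, (1.0, 0.0, 0.0))]
rg = radius_of_gyration(toy_atoms)
print(f"Rg = {rg:.2f} A")
```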
Structural alignment revealed that, instead of the loop 4 expected from secondary structure predictions (Figure 1 in [23]), the Rosetta model (black in Supplementary Figure S5) contains an additional long helix. This difference has likely functional consequences. In our GPAA1 model, loop 4 surrounds the Zn-binding cleft. The large surplus helix in the Rosetta model apparently diminishes polypeptide chain flexibility and changes modulation of substrate accessibility, which is, on the other hand, accommodated by the more flexible loop in our model.
Structural analysis of the human equivalent of the yeast deletion mutant protein yGAA1(70–247)
Using SAXS, Saw et al. [16] studied the structure of a truncated version of GAA1, the yeast homologue of GPAA1. The measured radius of gyration Rg ~27 Å is significantly larger than that of the full-length model of the lumenal domain of human GPAA1 generated in this work (~20 Å). Analyzing our GPAA1 3D model, we wished to rationalize (i) why the truncated mutant forms a stable structure in solution and (ii) why the gyration radius of the mutant is so much larger.
The mutant, most likely non-functional protein yGAA1(70–247) lacks all secondary structural elements beyond the C-terminal side of helix α5 as well as the canonical zinc-binding site 5. Saw et al. [16] found yGAA1(70–247) to be a monomer in solution, and its shape was best approximated by a large elliptical volume (71 Å × 48 Å) that is connected via a short stalk (length 8 Å, average diameter 8 Å) to a smaller hook-like domain (8 Å × 35 Å). We observed a similar shape in our truncated atomistic model (Figure 7), in which the N-terminal core scaffold of helices and strands is connected to the C-terminal helix α5 (with a length of ca. 20 residues and >30 Å) via a flexible linker (residues 226–239 in human GPAA1). Indeed, the hydrophobic core was found to be buried in the truncated model and the fold remained stable in a comparative molecular dynamics study with the full-length model (see Figure 7), demonstrating that the truncated GPAA1 could avoid aggregation, as was found in the experiments described by Saw et al. [16].
Due to the lack of structural constraints by the rest of the scaffold, i.e. residues 262–348 present in the full length GPAA1, the linker can exhibit more flexible motions (Figure 7) as compared to that of the full length model, resulting in larger fluctuation of the C-terminal helix α5 in the truncated version. We suppose that this flexible linker and the dangling helix α5 are responsible for the more extended conformation of the truncated protein.
Comparative phylogenetic analysis of the GPAA1 family and of M28-type sequences with known structures
An HHpred [58] search of the M28 peptidase (Pfam: PF04389) seed alignment was performed against the PDB sequence database (PDB_mmCIF70_27_Apr). In addition, 3D-structures belonging to M28 family were retrieved from the PDB database using searches for annotation/description line items. By combining the resulting hits from both these approaches and manually filtering to include only the proteins having an M28 domain, a comprehensive set of 145 PDB 3D-structures belonging to the M28 family was created (annotated as 42 aminopeptidases (AM), 66 carboxypeptidases (CP) and 37 cyclases/cyclotransferases (CT); Supplementary Table S1). The selected 3D-structures were grouped into classes of the corresponding UniProt protein sequences disregarding minor mutations, variations in substrates, etc. We found that all these structures map to just 20 unique UniProt sequences (see Table 2 for a set of representative structures annotated for zinc binding). To note, the example 3gux, although without description of its Zn ion binding status, was added as it is part of the alignment in Figure 1 of ref. [23].
| Structure | Site 1 | Site 2 | Site 3 | Site 4 | Site 5 | Coordinated metal ions | Substrate | Ref. |
|---|---|---|---|---|---|---|---|---|
| 4f9u | - | Asp99 | Glu139 | - | His265 | Zn | SUB1 | [30] |
| | His82 | - | - | Asp186 | | no 2nd ion | | |
| 4fwu | - | Asp99 | Glu139 | - | His265 | Zn | no substrate | [31] |
| | His82 | - | - | Asp186 | - | no 2nd ion | | |
| 1f2o | | Asp97 | Glu132 | - | His247 | Zn1 | SUB2 | [32] |
| | His85 | Asp97 | - | Asp160 | - | Zn2 | | |
| 1amp | | Asp117 | Glu152 | - | His256 | Zn1 | no substrate | [28] |
| | His97 | Asp117 | - | Asp179 | - | Zn2 | | |
| GPAA1Zn | - | Asp153 | Asp188 | - | Tyr328 | Zn | PEP or SUB1 | (model / this work) |
| | *Pro149 | | - | Glu226 | - | no 2nd ion | | |
| GPAA1Zn1Zn2 | - | Asp153 | Asp188 | - | Tyr328 | Zn1 | PEP or SUB2 | (model / this work) |
| | *Pro149 | Asp153 | - | Glu226 | - | Zn2 | | |

*Proline 149 (site 1) does not bind to a Zn ion.
Table 1
Summary of Zn-binding sites of structures used in the 3D structure modelling in this study.
This table lists all the amino acid residues involved in the metal ion binding as well as the type of model substrates in the X-ray crystallographic 3D (including references) and model structures used in this work. Residue numbering follows the nomenclature in the published crystal structures and, for GPAA1, in UniProt sequence entry O43292. SUB1 stands for 1-(3,4-dimethoxyphenyl)-3-[3-(1H-imidazol-1-yl)propyl]-thiourea [30] and SUB2 is the label for L-leucine [32]. PEP is described in Figure 1.
| PDB ID | UniProt ID | Annotation | Organism | Number of bound Zn ions | Ref. |
|---|---|---|---|---|---|
| 1AMP | Q01693 | Leucyl aminopeptidase | Vibrio proteolyticus (Vp) | 2 | [28] |
| 1TKJ | P80561 | Aminopeptidase | Streptomyces griseus (Sg) | 2 | n/a |
| 1Z8L | Q04609 | Carboxypeptidase | Homo sapiens (Hs) | 2 | [64] |
| 2AFO | Q16769 | Cyclotransferase | Homo sapiens (Hs) | 1 | [65] |
| 3FEC | Q9Y3Q0 | Carboxypeptidase | Homo sapiens (Hs) | 2 | [66] |
| 3GUX | A6KZZ2 | Leucine aminopeptidase | Bacteroides vulgatus (Bv) | 1 | n/a |
| 3IIB | A1S420 | Aminopeptidase | Shewanella amazonensis (Sa) | 2 | n/a |
| 3PB4 | Q9NXS2 | Cyclotransferase | Homo sapiens (Hs) | 1 | [67] |
| 3SI1 | Q9CYK2 | Cyclotransferase | Mus musculus (Mm) | 1 | [68] |
| 3TC8 | A6LHT4 | Leucine aminopeptidase | Parabacteroides distasonis (Pd) | 1 | n/a |
| 4F9U | Q9VRQ9 | Cyclotransferase | Drosophila melanogaster (Dm) | 1 | [30] |
| 4FAI | Q86PD7 | Isoglutaminyl cyclase | Drosophila melanogaster (Dm) | 1 | [30] |
| 4FUU | Q8A4P9 | Leucine aminopeptidase | Bacteroides thetaiotaomicron (Bt) | 1 | n/a |
| 4MHN | B7QK46 | Cyclotransferase | Ixodes scapularis (Is) | 1 | n/a |
| 4TWE | Q9UQQ1 | Aminopeptidase | Homo sapiens (Hs) | 2 | [69] |
| 5GNE | Q5ZZH8 | Leucine aminopeptidase | Legionella pneumophila (Lp) | 2 | [70] |
| 5IB9 | A2V759 | Aminopeptidase | Aneurinibacillus sp. AM-1 (Ab) | 2 | n/a |
| 6ESL | Q5ZRR6 | Aminopeptidase | Legionella pneumophila (Lp) | 2 | [71] |
| 6HC6 | P25152 | Aminopeptidase | Bacillus subtilis (Bs) | 2 | n/a |
| 6QQL | Q7MT37 | Glutamine cyclotransferase-related protein | Porphyromonas gingivalis (Pg) | 1 | n/a |
Table 2
Number of experimentally proven zinc ions per protein molecule located in structures used in the phylogenetic tree analysis in this study.
This table lists all the PDB structure codes with annotations, literature references (n/a: not applicable, as none is available) and the number of zinc ions bound per protein molecule.
A multiple sequence alignment of the selected 20 M28-type sequences with known structures, along with six GPAA1/GAA1 sequences, was created using inputs from HHpred [58] and MUSCLE [59,60] and was subsequently adjusted manually. MEGA X [61] was used to infer the evolutionary history by using the Maximum Likelihood method and JTT matrix-based model with 1000 bootstrap replicates [62].
The phylogenetic tree in Figure 8 shows a clear dichotomy between M28-type peptidase family members whose structures are reported to bind one (1Zn) and two (2Zn) Zn ions. The branch consisting of cyclotransferases/cyclases (marked with green circles) that bind a single zinc ion contains sequences from arthropods (4mhn/B7QK46, 4f9u/Q9VRQ9, and 4fai/Q86PD7), bacteria (6qql/Q7MT37; cyclotransferase type I, CTI) and mammals (3pb4/Q9NXS2, 2afo/Q16769, and 3si1/Q9CYK2; cyclotransferases type II, CTII).
In addition, three examples annotated as leucine aminopeptidases (3tc8/A6LHT4, 3gux/A6KZZ2, and 4fuu/Q8A4P9) also cluster in the same clade. There is no publication available for any of these three structures. In order to verify whether this clustering is correct and to exclude possible annotation errors in the sequence database, we did a BLAST search for all the proteins without verifiable molecular function/enzymatic activity (via experimental evidence reported in the scientific literature) against the non-redundant protein sequence database. For example, 3tc8/A6LHT4 (leucine aminopeptidase) shows high identity to both arginyl aminopeptidase (WP_057326189.1, E-value = 0 and PI = 99.7%), and glutamine cyclotransferase (OKY96427.1, E-value = 0 and PI = 99.7%). Similarly, 3gux/A6KZZ2 (leucine aminopeptidase) hits the glutamine cyclotransferase (RHJ01572.1, E-value = 0 and PI = 99.7%) as the best hit. Furthermore, 4fuu/Q8A4P9 (leucine aminopeptidase) shares high sequence identity (>99%) with both leucine aminopeptidase (CUP85417.1, E-value = 0, PI = 99.7%) and glutamine cyclotransferase-related protein (EFI02281.1, E-value = 0 and PI = 99.7%). Similar results were observed for 6qql/Q7MT37 (glutamine cyclotransferase). This suggests that some of the available annotations in the databases are dubious and, therefore, need experimental validation for better function assignment. This will add more information to the current pattern of leucine aminopeptidases clustering with cyclotransferases, as observed in the tree.
The other branch consists of members that bind two Zn ions (2Zn), marked with red circles, which include two carboxypeptidases (3fec/Q9Y3Q0 and 1z8l/Q04609) and eight aminopeptidases (5ib9/A2V759, 6hc6/P25152, 4twe/Q9UQQ1, 3iib/A1S420, 1tkj/P80561, 1amp/Q01693, 5gne/Q5ZZH8, and 6esl/Q5ZRR6). At the same time, GPAA1/GAA1 family members, marked with blue circles, cluster as an outgroup and, consequently, are directly related to the common ancestor of both 1Zn- and 2Zn-binding M28 structures. The protein 4f9u/Q9VRQ9 is closest to the tree origin among M28-type structures and is thus, not surprisingly, the best/closest homologue. We also show the position of 3gux/A6KZZ2 in the tree because it was part of the alignment in Figure 1 of reference [23], although it is not known how many zinc ions it binds in the catalytic cleft (we predict one).