Construction and production
We previously produced large and complex disulfide-bond-containing ECM proteins using CyDisCo (cytoplasmic disulfide bond formation in E. coli) technology [25]. This encouraged us to attempt to produce full-length mature fibulin-2 (V27-P1221). Unfortunately, soluble production of the full-length construct was not successful in this system, and smaller constructs were therefore made (Table 1) to try to identify the limiting factor(s) for production.
Table 1
Plasmids expressing constructs of mouse fibulin-2 used in this study. Numbering is according to the full-length protein. Fibulin-2 constructs which were solubly produced, purified and characterized are marked (*).
Domain Boundaries | Plasmids | Details |
V27-G545 | pAS85 | N-domain and three anaphylatoxin-like modules |
V27-D593 | pAS87 | N-domain and three anaphylatoxin-like modules with extension |
V27-P1221 | pAS86 | Full-length mature fibulin-2 protein |
S427-G545* | pAS79 | Three anaphylatoxin-like modules (wild-type) |
S427-G545* | pAS84 | Three anaphylatoxin-like modules (mutant C500L) |
L590-P1221 | pAS88 | Eleven EGF-like domains and Domain III |
G592-Q710* | pAS71 | EGF-like domains; 1 and 2, and unassigned sequence |
D709-V802* | pAS69 | EGF-like domains; 3 and 4 |
T800-V896* | pAS70 | EGF-like domains; 5 and 6 |
V894-V981* | pAS75 | EGF-like domains; 7 and 8 |
E979-L1063* | pAS76 | EGF-like domains; 9 and 10 |
K1061-P1221* | pAS77 | EGF-like domain 11 and Domain III |
None of the constructs that included the N-domain of fibulin-2 (V27-G545, V27-D593 and V27-P1221) yielded soluble protein. AlphaFold prediction [26, 27] suggests that the Nb sub-domain (H177-T434) of fibulin-2 is disordered (supplementary Fig. S1a). From this we hypothesize that this region may interact with some other ECM protein(s) and that such an interaction may be required to stabilize the structure of the N-domain and hence allow soluble production. Constructs containing EGF-like domains were partially solubly expressed, resulting in purified yields in the range 0.1-1.0 mg/L. These production levels are substantially lower than the levels we have observed for other EGF-like domain-containing ECM proteins such as region 3 of perlecan [25]. These differences in protein yield support the idea that protein expression in E. coli may be highly dependent on the exact nature of the protein of interest and/or on the nature of inter-domain packing in the native protein. In contrast to the low yields of the other constructs, the fibulin-2 construct S427-G545 (wild-type and C500L mutant), which has three anaphylatoxin-like modules, was fully solubly produced and was purified in good yields (>10 mg/L). This construct is predicted to be a disulfide-linked homodimer [10] with a molecular weight of ~27.6 kDa and a total of 34 cysteines forming 17 disulfide bonds. Overall, the seven constructs (shown in Table 1) that could be produced as (partly or wholly) soluble proteins covered 62% of fibulin-2 and included 79.3% of the total disulfide bonds.
Biophysical studies
The apparent molecular weights of all purified fibulin-2 constructs were analyzed using sodium dodecyl sulfate-polyacrylamide gel electrophoresis (SDS-PAGE). Both reduced and N-ethylmaleimide (NEM) treated non-reduced purified samples were run on 15% SDS-PAGE gels (Fig. 2). All the EGF-like domain-containing constructs, except for E979-L1063, showed a single band under both reducing and non-reducing conditions (Fig. 2a). The presence of multiple bands for E979-L1063 suggests that this construct is prone to degradation and/or modification upon storage in SDS loading buffer, as samples run immediately after purification showed a single band. All the bands (except the degradation products of E979-L1063) ran at their expected molecular size in reducing SDS-PAGE, and the single band observed in the non-reduced NEM-treated samples implies that each construct has a single redox state. For the anaphylatoxin-like module-containing construct (S427-G545), a single band near 16 kDa could be seen for the wild-type protein in the reduced state, with a shift in mobility in non-reducing SDS-PAGE indicative of inter-molecular disulfide-based dimerization (Fig. 2b). The apparent molecular weight of the dimer is not twice that of the monomer, probably due to the predicted intra-molecular disulfides in each subunit. To confirm the disulfide-based dimerization, the C500L mutant was made. The mutant ran at the same position as the wild-type in reducing SDS-PAGE, but at a lower molecular weight in non-reducing SDS-PAGE, consistent with it having only intra-molecular disulfides (Fig. 2b). This implies that the C500L mutation disrupts the formation of the inter-molecular disulfide bond in the dimer and that all of the wild-type protein is in a disulfide-linked homodimer state.
To validate the proteins made and to examine their redox states, the exact molecular weights of the fibulin-2 constructs were determined by mass spectrometry (MS) (Table 2). The MS results confirmed that the purified fibulin-2 constructs have the expected molecular weights, with all the cysteines present in the constructs being in disulfide bonds. No significant protein adducts with an additional 125 Da molecular weight (or multiples thereof) were seen for any of the NEM-treated samples. This further implies that all the cysteines in the constructs are involved in disulfide bonds. The molecular weights of the C500L mutant and the wild-type of the fibulin-2 construct containing three anaphylatoxin-like modules (S427-G545) indicated monomeric and disulfide-linked dimeric states, respectively.
Table 2
Molecular weight analysis by mass spectrometry for purified fibulin-2 constructs. The theoretical average molecular weight (MWtheor) was calculated using the ExPASy ProtParam tool [28] and compared with the experimentally determined average molecular weight (MWexp) of the purified NEM-treated fibulin-2 constructs. A reduction of 2 daltons (Da) relative to MWtheor is expected for the formation of each disulfide bond. MWΔ is the difference between MWtheor and MWexp.
Construct | No. of cys | Disulfide bonds | MWtheor (Da) | MWexp (Da) | MWΔ (Da) |
MH6M-S427-G545 (monomer) | 17 | - | 13816 | - | - |
MH6M-S427-G545 (dimer) | 34 | 17 | 27632 | 27598 | 34 |
MH6M-S427-G545 (C500L) | 16 | 8 | 13826 | 13810 | 16 |
MH6M-G592-Q710 | 12 | 6 | 13827 | 13815 | 12 |
MH6M-D709-V802 | 12 | 6 | 11575 | 11563 | 12 |
MH6M-T800-V896 | 12 | 6 | 11660 | 11649 | 12 |
MH6M-V894-V981 | 12 | 6 | 10710 | 10697 | 12 |
MH6M-E979-L1063 | 12 | 6 | 10515 | 10503 | 12 |
MH6M-K1061-P1221 | 8 | 4 | 19254 | 19246 | 8 |
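The mass bookkeeping behind Table 2 can be sketched in a few lines of Python. This is an illustrative sketch rather than the analysis pipeline used in the study: the function names are hypothetical, the hydrogen and NEM masses are standard average values (about 1.008 Da and 125.13 Da, respectively), and the MWΔ values are the rounded ones reported in the table.

```python
# Illustrative sketch only: names are hypothetical, not taken from the study.
H_AVG_MASS = 1.008      # average mass of one hydrogen atom (Da)
NEM_AVG_MASS = 125.13   # average mass added per NEM-alkylated free thiol (Da)


def expected_mass(mw_reduced, n_disulfides, n_free_thiols=0):
    """Expected average mass of an NEM-treated sample.

    Each disulfide removes two hydrogens from the fully reduced mass;
    each free thiol would instead gain one NEM adduct.
    """
    return mw_reduced - 2 * H_AVG_MASS * n_disulfides + NEM_AVG_MASS * n_free_thiols


def disulfides_from_delta(mw_delta):
    """Number of disulfides implied by MWdelta = MWtheor - MWexp."""
    return round(mw_delta / (2 * H_AVG_MASS))


# (construct, number of cysteines, reported MWdelta in Da), taken from Table 2.
TABLE_2 = [
    ("S427-G545 (dimer)", 34, 34),
    ("S427-G545 (C500L)", 16, 16),
    ("G592-Q710", 12, 12),
    ("K1061-P1221", 8, 8),
]

for name, n_cys, delta in TABLE_2:
    n_ss = disulfides_from_delta(delta)
    # All cysteines are paired when the disulfide count is half the cysteine count.
    print(f"{name}: {n_ss} disulfides, all cysteines paired: {2 * n_ss == n_cys}")
```

A single free thiol would add roughly 125 Da to the measured mass, which is far larger than the 2 Da loss per disulfide, so the absence of +125 Da species is a sensitive check that no cysteine is left unpaired.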
SEC-MALS analysis was then performed to further investigate the oligomeric states of the S427-G545 constructs (wild-type and C500L mutant). This analysis showed that the wild-type and the C500L mutant eluted at the same volume in SEC. They had apparent molecular weights of 26.3 kDa and 25.7 kDa (wild-type and C500L mutant, respectively) according to MALS, indicating a dimeric state for both proteins (supplementary Fig. S2). This agrees with the study by Sasaki and co-workers [10] and suggests that the inter-molecular disulfide bond via C500 is not critical for dimer formation.
Thermal stability was then examined for all constructs (Fig. 3). There was no significant change in signal over the temperature range 20–90 °C for any of the constructs except G592-Q710 and K1061-P1221. This suggests the constructs are extremely thermostable, which is consistent with them having multiple disulfide bonds (Table 2). The lower thermal transition observed for G592-Q710 could be due to the presence of an unstructured region between the two EGF folds (supplementary Fig. S1c). The thermal stability of the K1061-P1221 construct is lower than that of the other constructs, with a single thermal transition at 60 °C. This could be due to the presence of Domain III, which comprises 70% of the construct. Domain III has only a single disulfide bond (C1110-C1116) and hence might be expected to be less thermally stable.
The secondary structure of the protein constructs was examined using far-ultraviolet circular dichroism (CD) spectroscopy. The construct S427-G545, containing only the anaphylatoxin-like modules, is predicted to be predominantly α-helical (supplementary Fig. S1b). The CD spectra of S427-G545, for both the wild-type and the C500L mutant, were consistent with the predicted structural information, showing a positive peak near 193 nm and negative peaks near 208 nm and 222 nm (Fig. 4a-b). The CD spectra of the fibulin-2 constructs containing two EGF-like domains (D709-V802, T800-V896, V894-V981 and E979-L1063), except G592-Q710, showed a sharp negative peak near 195 nm. This indicates that these constructs lack significant regular secondary structure components (Fig. 4c-g), which agrees with the AlphaFold predicted structures (supplementary Fig. S1c). In contrast, the CD spectrum of construct K1061-P1221, which has a single EGF-like domain and Domain III, showed a positive peak near 190–195 nm and a negative peak near 210–220 nm (Fig. 4h), which suggests the presence of significant amounts of antiparallel β-pleated sheet. This agrees with the predicted structure of the C-terminal Domain III (supplementary Fig. S1d). Hence all fibulin-2 constructs exhibited CD spectra consistent with their predicted structures, which, when combined with the MS data showing that all cysteines are in disulfides, suggests that all are natively folded.
To further investigate the thermal stability of these constructs, changes in secondary structural elements were examined using CD spectroscopy. The anaphylatoxin-like module-containing construct S427-G545, wild-type and C500L mutant, showed no or very minor change in CD spectra (Fig. 4a-b), which indicates that they are highly thermostable. In contrast, most of the EGF-like domain-containing constructs showed apparent conformational changes, with a shift in the position of the negative peak to higher wavelengths at higher temperatures. Most of these constructs showed a single thermal transition, with a transition temperature above 60 °C (Fig. 4c-g). A more significant change in CD spectra can be seen for K1061-P1221 between room temperature and high temperature, which is consistent with the thermofluor data (Fig. 3 and Fig. 4h). When the pre-heated samples (at 90 °C) were cooled to room temperature, the CD spectra of all constructs except K1061-P1221 shifted back to their native room-temperature state (supplementary Fig. S3). This efficient “refolding” could either be due to this being a conformational change rather than denaturation (consistent with the thermofluor data) or could be due to the presence of multiple disulfide bonds allowing the denatured protein to rapidly and efficiently readopt its native state.
Structural studies
As no crystal structures had previously been reported for any fibulin, we attempted to crystallize the S427-G545 construct. We were able to obtain diffracting crystals and solved the structure in the P21 space group at 2.2 Å resolution. The final model included 8 protein copies (A-H) and 108 water molecules in the asymmetric unit. Pseudo-translational symmetry was detected in the crystal form, leading to relatively high R factors at the end of the refinement: 24.28% (Rwork) and 29.74% (Rfree) (supplementary Table S1). However, the electron density maps were well defined for most of the protein chains (supplementary Fig. S4a). The structure included four dimers (AB, CD, EF and GH) in the asymmetric unit, with each dimer having local two-fold symmetry. Pseudo-translational non-crystallographic symmetry (NCS) was found between dimers AB and CD and between dimers EF and GH. The Cα traces of all 8 copies in the asymmetric unit were very similar and superimposed with each other with r.m.s.d. values of less than 1 Å. The overall structure of each chain is predominantly α-helical, which is consistent with the CD data (Fig. 4a). There are four α-helices (T428-D445 for α1, D460-E488 for α2, L504-A520 for α3, Y533-E544 for α4), which all run in the same direction (Fig. 5). The loop structures between α1 and α2 (N446-S459) and between α2 and α3 (G499-S503) were incompletely modelled in all 8 chains, indicating the flexible nature of these regions. The complete N-terminal His-tag with the initiating methionine was visible in chains A and E. The r.m.s.d. values for Cα atoms and for all atoms were 1.7 Å and 2.9 Å, respectively, when chain A of the crystal structure was compared with the corresponding region of the AlphaFold2 (alphafold.ebi.ac.uk) model of mouse fibulin-2.
The crystal structure shows a compact one-domain α-helical structure rather than the previously suggested three separate domains each having an anaphylatoxin-like motif [4]. The structure has a disulfide bond architecture which stabilizes the three anaphylatoxin modules (Fig. 5b and Fig. 6b). The first anaphylatoxin-like fold includes the α1-helix (including C435 and C436), the α1-α2 loop (including C449) and the N-terminal half of the long α2-helix (including C462, C469 and C470), and the fold is stabilized by three disulfides: C435-C462, C436-C469 and C449-C470. These disulfide bonds are referred to as 1-SS1, 1-SS2 and 1-SS3, respectively, from now on. The second anaphylatoxin-like fold includes the C-terminal half of the α2-helix (including C479), the α2-α3 loop (including C492) and the N-terminal half of the α3-helix (including C508 and C509). This fold is stabilized by two disulfides (C479-C508 and C492-C509, referred to as 2-SS2 and 2-SS3, respectively). The third anaphylatoxin-like fold consists of the C-terminal half of the α3-helix (including C511 and C512), the α3-α4 loop (including C525) and the complete α4-helix (including C535, C542 and C543), and is again stabilized by three disulfides, namely C511-C535 (3-SS1), C512-C542 (3-SS2) and C525-C543 (3-SS3). In total, 8 intra-molecular disulfide bonds are found in the domain. Most of them are clearly defined by the electron density (such as 1-SS3 (C436-C469) shown in supplementary Fig. S4a) and they have the same conformation in all 8 copies of the protein in the asymmetric unit.
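The disulfide pairing described above can be written down as a small lookup table, which makes the bookkeeping easy to check: eight bonds, sixteen distinct paired cysteines, and C500 as the only cysteine left unpaired within a monomer. The following is a minimal Python sketch under stated assumptions: the helix boundaries are those reported for the crystal structure, the loops are simply taken to be the residues lying between consecutive helices, and all identifiers are hypothetical.

```python
# Illustrative sketch only: identifiers are hypothetical.
# Intra-molecular disulfides of S427-G545, labelled as in the text.
DISULFIDES = {
    "1-SS1": (435, 462), "1-SS2": (436, 469), "1-SS3": (449, 470),
    "2-SS2": (479, 508), "2-SS3": (492, 509),
    "3-SS1": (511, 535), "3-SS2": (512, 542), "3-SS3": (525, 543),
}

# Helix boundaries (first and last residue) as reported for the structure.
HELICES = {
    "alpha1": (428, 445),
    "alpha2": (460, 488),
    "alpha3": (504, 520),
    "alpha4": (533, 544),
}


def locate(residue):
    """Return the helix containing a residue, or the loop between two helices."""
    names = list(HELICES)
    for name in names:
        start, end = HELICES[name]
        if start <= residue <= end:
            return name
    # Assumption: residues between consecutive helices are treated as loop.
    for left, right in zip(names, names[1:]):
        if HELICES[left][1] < residue < HELICES[right][0]:
            return f"{left}-{right} loop"
    return "outside helices"


# Flatten the pairs and check that no cysteine is used in two bonds.
paired = [cys for pair in DISULFIDES.values() for cys in pair]
assert len(paired) == len(set(paired)) == 16

for label, (a, b) in DISULFIDES.items():
    print(f"{label}: C{a} ({locate(a)}) - C{b} ({locate(b)})")
print(f"C500 ({locate(500)}) is not part of any intra-molecular disulfide")
```

Listing the bonds this way also shows that α2 and α3 each contribute cysteines to two different modules, which is the basis for describing the three modules as embedded in a single four-helix domain.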
Structural alignment analysis using the DALI server [31] did not identify any homologous structure with a significant Z-score. The solved crystal structure was compared with previously solved structures of wild-type anaphylatoxin domains, human C3a (PDB code 4HW5) [29] and murine C5a (PDB code 4P3A) [30]. Both of these structures have a similar fold, with four α-helices and three conserved disulfide bonds (Fig. 6a and supplementary Fig. S5a) which help to stabilize the protein. These crystal structures, 4HW5 and 4P3A, also have a similar disulfide bond arrangement, with the 1st and 2nd cysteine residues located on the α2-helix, the 3rd cysteine on the α3-helix and the cysteine residues which form disulfide bonds with them all located on the α4-helix, as shown in Fig. 6a and supplementary Fig. S5. In contrast, the fibulin-2 anaphylatoxin-like modules lack the α1-helix which is found in the anaphylatoxin domain structures (Fig. 6a). Additionally, two α-helices in fibulin-2, namely α2 and α3, are each part of two different anaphylatoxin modules (Fig. 6b). All of this suggests that in fibulin-2 the three anaphylatoxin modules are embedded in a single-domain four-helix structure, which forms a dimer with covalent bonding via C500.
The inter-molecular disulfide bond between the C500 residues of the two chains cannot be confirmed reliably with this crystal structure. Cys500 is located in the flexible α2-α3 loop, which is only partly visible in the electron density maps. In the current crystal structure, Cys500 has been modelled in chains B and F. In these chains, the electron density is clear for the C500-L504 part of the α2-α3 loop. However, the N-terminal region of this loop, A494-T499, has only weak density in those two chains and it was not possible to build this region reliably (supplementary Fig. S4b). The presence of an inter-molecular disulfide within the dimer is still possible, because the electron density maps show that something is covalently attached to C500 in chains B and F (supplementary Fig. S4b). When combined with the non-reducing SDS-PAGE (Fig. 2b) and MS (Table 2) data, which both indicate that all of the protein is in a disulfide-linked homodimer, the crystal structure data imply that the inter-subunit disulfide is formed but that it does not stabilize the flexible α2-α3 loop.
To examine potential dimerization of the full-length protein, we calculated dimeric models of the complete fibulin-2 molecule using AlphaFold-Multimer [32]. This modelling resulted in five different proposed dimeric structures. The one with the highest probability score is shown in supplementary Fig. S6. Interestingly, all five models show a common dimerization mode with the anaphylatoxin-like modules at the interface between monomers. Furthermore, the dimer formed by the anaphylatoxin-like region is very similar to the crystal structure presented in this study. This further suggests that the anaphylatoxin-like module region is a key player in the dimerization of the fibulin-2 protein. Notably, none of the models showed head-to-tail dimerization as predicted previously [10].