3.1 Hotspot mutation analysis
The mutant library designed using Hotspot Wizard 3.0 is presented in Table 1. M40 and G44, and K45, Q48, V89 and V91 were found to be functional hotspots present in the heavy (H) and light (L) chains of the variable domain, respectively. The functional hotspots were not observed to overlap with the stability and correlated hotspots also identified by the server.
All the identified mutation hotspots lie in the structurally necessary FR of the antibody that lines the more variable CDRs. H-M40 and H-G44 lie in the H-FR2, L-K45 and L-Q48 lie in L-FR2 and, L-V89 and L-V91 reside in L-FR3 (Table 1).
Upon annotation of the crystal (PDB ID: 6W41) derived CR3022 antibody structure using PDBePISA; H-M40, L-K45, L-V89 and L-V91 were found to be solvent accessible residues, while H-G44 and L-Q48 were found to be present as partially buried residues at the interface between H and L chains with L-Q48 residue particularly forming an H-bond at the interface (Supplementary File S1) [16].
After generation of a mutant library based on the mutation landscape specified above (section 2.1) and screening of thermodynamically stabilizing and destabilizing mutants in Rosetta module of Hotspot Wizard 3.0, substituent residues; A, H, M, S, W were selected for L-K45; F, W, Y were selected for L-Q48 and I for L-V91 positions, respectively. No stabilizing substituent was identified for the hotspots residing in the H chain within the mutation landscape.
3.2 Antibody modeling and prediction of interacting residues
The Repertoire Builder generates atomic-resolution, three-dimensional models of antibodies from their amino acid sequence by applying a unique multiple sequence alignment extension technique [4]. It is available as an accessible web server and performs better in terms of ease, speed and accuracy of processing and generation compared with its contemporaries such as LYRA, ABodyBuilder, and PigsPro. It also includes scoring of generated CDR loops followed by refinement with Rosetta Antibody. It is an integrated modeling solution specific to antibodies.
Parapred is a freely accessible webserver utilizing a probabilistic sequence-based machine learning algorithm for paratope prediction. Using deep learning architecture, the algorithm leverages features from local residue neighborhoods and across the entire sequence to identify stretches of hypervariable regions in an antibody sequence input. It specifies CDRs as per the Chothia definition for variable heavy and light chains. It requires only the antibody sequence as input, making it independent of antigen information. It has been used to improve the accuracy and speed of rigid docking algorithms and binding pose sampling [5]. The results from Parapred are highlighted in Figure 2 and Supplementary File S2, Appendix 1.
Information on antibody-specific epitopes of the antigen was derived from data generated by Yuan and colleagues and EpiPred [17,18]. EpiPred combines a specific antibody-antigen score with antibody-antigen structures to predict antigen residues involved at the binding interface (epitopes). It utilizes structures of antigen and the antibody as input. It has shown to improve the number of near-native poses upon rigid docking. EpiPred generates three summaries of predicted epitopes. All three summaries along with experimentally determined epitopes of CR3022 were used as inputs for docking to increase sampling of binding poses for the mutant antibodies (Supplementary File S2, Appendix 1).
The choice of these algorithms was employed to affirm optimal integration of both sequence and structure features. This aided greatly in generating information for improving the docking protocol's accuracy and efficiency for finding near-native binding conformations of the designed antibodies.
3.3.1 Structure-guided antigen-antibody docking, refinement and pose generation using HADDOCK 2.4
HADDOCK integrates global and local docking with refinement to generate binding poses. Prior to docking, HADDOCK repairs the input antibody and antigen structures and automatically assigns protonation states to histidine residues using MolProbity. The binding poses generated after global and local docking require refinement to assume near-native conformations.
Short molecular dynamics in explicit solvent (water) are performed for this purpose. It comprises building a solvent shell around the complex, restraining all atoms at the interface to their original position except the side-chain atoms, followed by 1250 MD steps at 300K with position restrained heavy atoms not involved in intermolecular contacts within 5 Å. Following this, the system is cooled down (1000 MD steps at 300, 200 and 100 K) with position restraints on the backbone atoms of the protein complex, excluding the interface atoms. This achieves relaxation of the complex and the comprising structures along with energy refinement for nearing native conformations.
The protocol mentioned in section 2.3 is tailored for antibody-antigen docking. It involves RMSD-based clustering following rigid docking, local docking, and MD refinement to group distinct resultant representative conformations of docked complexes for each system. HADDOCK 2.4 forms clusters by assigning HADDOCK scores to each cluster.
Van der Waals, electrostatic, desolvation, interaction energy components, and buried surface area are considered while computing the HADDOCK scores for each cluster. However, this score doesn’t serve to select a single representative docked complex from the four representatives generated per cluster. The cluster analysis requires closer inspection concerning binding pose and interface.
3.3.2 Clustering analysis and ranking
A unique method was employed to rank complexes among the docking results for each system to prioritize maximal interface between antibody and antigen among the docking results.
For each system, the top four clusters were chosen. The best four representatives of each cluster were selected, i.e., 16 representative complexes for each system were selected for input in protein binding energy (PRODIGY) prediction. PRODIGY is a predictor of binding energy and affinity of protein-protein complexes from their 3D structure. It is based on structural determinants, namely the number of intermolecular contacts at the interface. This parameter, combined with information on non-interacting surfaces’s properties, is used for predicting binding free energy (ΔG) and affinity (Kd). It has demonstrated a Pearson’s Correlation coefficient of 0.73 between the predicted and measured binding affinity on the benchmark (P-value <0.0001) as reported by Xue et al. [19].
ΔG and Kd were predicted at 37°C for each of the 16 representative structures corresponding to every system through PRODIGY. We utilized PRODIGY for relative estimations. Both parameters (ΔG and Kd) were combined into a single entity log (|(ΔG|/Kd) for easier comparison across the 16 structures (Figure 3). This entity combines specificity, affinity and stability displayed by each representative binding pose within a system and can be reliably used for ranking top poses within a system, i.e., for intra-system comparison. The pose with the highest value of log (|(ΔG|/Kd) from each system was selected as a representative conformation (Figure 3).
3.4 Molecular dynamics simulations and MM-GBSA
To analyze docked RBD-antibody complexes’ temporal behaviour and conformations, molecular dynamics simulations were carried out for 100 ns using GROMACS 2020.4 software package. The temperature and enthalpy measures were monitored throughout the production to ensure the maintenance of uniform conditions. MD trajectories for each simulation were analysed using RMSD and root-mean-square fluctuation (RMSF) plots of backbone carbon-alpha (Cα) atoms about the initial conformation of the production run. These frames were analyzed using the GROMOS clustering method based on RMSD distributions across neighbors (with a cut-off set to 2 nm) [20]. For each system, it contained snapshots of centroids of all the clusters. Centroids representing the top 5 clusters in each system were taken for further analysis. They were ranked based on their interface using the method utilizing PRODIGY described in section 3.3. Out of three set of runs, second set of runs were chosen as representative for further analysis (highlighted as red in Figure 6). The top-ranked conformation from among the five models for every system in second set of runs was selected for MM-GBSA analysis.
The HawkDock server performs MM-GBSA calculations based on the GBOBC1 model and the ff02 force field for protein-protein binding free energies [15,21,22]. The result presents five components that compose the total binding energy between the receptor and ligand, namely (i) Van-der Waals energy, (ii) Electrostatic energy, (iii) Generalised Bonn desolvation energy, and (iv) Surface Area (Figure 4). It is worth noting that, the electrostatic energy component is a significant contributor to the total energy for the wild-type complex. Simultaneously, the lead demonstrates a comparatively higher contribution of the Van der Waals energy and a slight increase in the SA contribution to the total energy. This increased dependence of binding upon Van der Waals energy greatly implicates the designed antibody's increased flexibility. Analyzing the binding affinities of all wild-type and mutant antibody candidates set the seal on K45S to be the lead antibody (Figure 5).
The RMSD across the trajectory for WT-PR, V-6W41 and the lead mutant L-K45S indicates convergence of the simulations in the last 20 ns (Figure 6). The RMSF plots for per residue fluctuations in antibody across the lead mutant L-K45S and control WT-PR indicate the presence of more conformationally flexible residues in the mutant (Figure 7). The distinctly higher peak in the mutant RMSF plot (compared to WT-PR) belongs to the second FR of the light chain. This observation implicates the contribution of the single substitution mutation to the flexibility of the designed antibody. The RMSD histogram distributions for the lead mutant (L-K45S) and the two controls (WT-PR and crystal V-6W41) depict a single prevalent conformation representing their respective trajectories (Figure 8).
For affirming effective binding and neutralizing potential of the lead mutant, its end point free energy was also compared with that of SARS-CoV RBD-CR3022 complex (PDB ID: 7JN5). Upon comparison, the binding energy of the mutant (L-K45S) to SARS-CoV-2 was found to be better than that of 7JN5 (Table 2). This emphasises the success of the mutation in supplementing the cross-reactive CR3022 antibody (originally anti-SARS-CoV) in achieving improved binding energy, affinity and possible neutralization potential against the evolved SARS-CoV-2. It is also imperative to note that compared to the binding energy between SARS-CoV-2 S protein RBD and its human target receptor ACE-2 (PDB ID: 6MOJ), the binding energy between the SARS-CoV-2 S protein RBD and the lead mutant antibody (L-K45S) has increased two-fold.
The key residues contributing significantly to the total interaction energy upon binding with SARS-CoV-2 RBD for CR3022 include H-Y27, H-T31, H-Y52, H-Q57, H-G101, H-I102, H-T104, H-D107, L-I39, H-W61 and for K45S include H-M2, H-S100, H-D107, H-I102, and L-N37 (Figure 9). All the key residues reside within the predicted paratopes. Furthermore, it is interesting to note that while K45S has fewer key residues contributing to energy concerning the wild type CR3022, they contribute significant proportions of the energy landscape calculated by MM-GBSA.
The trends of binding affinities computed are in agreement with previously reported studies on the interaction between SARS-CoV-2 RBD and CR3022 [23,24]. The computed binding affinity between SARS-CoV RBD and CR3022 (PDB ID: 7JN5) is higher than that found between SARS-CoV-2 RBD and CR3022. It is imperative to note that the binding affinity between the lead mutant and SARS-CoV-2 RBD is higher than both affirming its status as an affinity matured lead antibody candidate. It is also interesting to note that three other mutant candidates expressed binding affinities greater than that found between ACE2 and SARS-CoV-2 RBD.
3.5 Antigen-antibody interaction analysis
The interfacing residues were analyzed for complexes of SARS-CoV-2 S-protein RBD with lead mutant (L-K45S) and mutants expressing binding energies higher than that of SARS-CoV-2 S-protein RBD and hACE2 complex against the control of wild type CR3022.
3.5.1 CR3022 and K45S mutant
As is evident from the alignment in Figure 2, the epitopes for CR3022 on the SARS-CoV-2 S-protein RBD are distinct from its residues that bind to the human ACE-2 receptor implying a non-competitive neutralization mechanism involving allosteric structural blocking, driven by hydrophobic interactions as also discussed by M. Yuan and colleagues [2]. Interestingly, the lead mutant L-K45S gains additional epitopes on the SARS-CoV-2 S-protein RBD (A411, P412, G413, Q414, F464) (Figure 2). Resembling wild-type, the lead mutant also does not boast of any epitopes overlapping with the human ACE-2 binding residues implying a similar neutralization mechanism and binding based on allosteric blocking through conformational change that does not include competitive inhibition. Moreover, PRODIGY results indicate that the influence of hydrophobic interactions with SARS-CoV-2 reduces upon introduction of the L-K45S mutation in the wild type CR3022.
3.5.2 K45W, K45M and Q48W Mutants
It is important to note how some of the epitopes across all three (L-K45M, L-K45W, and L-Q48W) overlap with SARS-CoV-2 residues binding to human ACE-2. This suggests the possibility of a partial blocking mechanism for neutralization. Moreover, these mutants bind to epitopes on the RBD that coincide with mutant positions in emerging variants of concern, namely, B.1.1.7, P.1, B.1.351, B.1.427, and B.1.429 [25].
The K417 mutation position found in variants P.1 and B.1.351 overlaps with epitopes of mutants L-K45W, L-K45M, and L-Q48W. The L452R mutant position present in variants B.1.427 and B.1.429 also lines the epitopes of all the three aforementioned mutants. Similarly, E484 position in variants B.1.1.7, P.1 and B.1.351 lines the epitopes of all the three concerned mutants. The S494 mutant position unique to the B.1.1.7 variant lies between 2 epitopes of L-K45M and coincides with the epitope of the L-Q48W mutant. The N501 mutant position present in B.1.1.7, P.1, and B.1.351 coincides with the epitope position of L-Q48W and resides in the close vicinity of L-K45M and L-K45W epitopes.
Despite the direct overlap of the epitopes with residues binding to the human ACE-2 receptor and higher binding affinity than SARS-CoV-2 S protein RBD and hACE-2, these three mutants display average interaction energies for the wild type CR3022 and the lead L-K45S. Wu et al. demonstrate that the P384 residue is responsible for determining the affinity between the cross-reactive antibody and SARS-CoV-2 RBD [26]. In SARS-CoV, this position consists of alanine instead of proline and is the reason behind successful neutralization by CR3022, unlike the case with binding of the same antibody with SARS-CoV-2 RBD. Interestingly, this position does not coincide or fall near any of the epitopes of the L-K45M, L-K45W, L-Q48W mutants and can form a plausible explanation for their reduced interaction energy in the context of SARS-CoV-2. Given the observation regarding their epitopes, these three mutants show promise to be developed as neutralizing antibodies against emerging variants of concerns.
The interface and the binding poses of all the discussed mutants compared to wild-type upon interaction with SARS-CoV-2 S-protein RBD are shown in Figure 10.