The pH Effects on SARS-CoV and SARS-CoV-2 Spike Proteins in the Process of Binding to hACE2

COVID-19 has been threatening human health since the late 2019, which has significant impact n human health and economy. Understanding the SARS-CoV-2 and other coronaviruses is important to develop effective treatments for COVID-19 and other coronaviruses-caused diseases. In this work, we applied multi-scale computational approaches to study the electrostatic features of spike (S) proteins for SARS-CoV and SARS-CoV-2. From our results, we found that SARS-CoV and SARS-CoV-2 have similar charge distributions and electrostatic features when binding with the human angiotensin-converting enzyme 2 (hACE2). The energy pH-dependence calculations revealed that the complex structures of hACE2 and the S proteins of SARS-CoV/SARS-CoV-2 are stable at pH values ranging from 7.5 to 9. Molecular dynamics simulations were performed using NAMD to investigate the hydrogen bonds between S proteins and hACE2. From the MD simulations it was found that SARS-CoV-2 has four pairs of essential hydrogen bonds (high occupancy, >80%), while SARS-CoV has three pairs, which indicates the SARS-CoV-2 S protein has relatively more robust binding strategy than SARS-CoV S protein. Four key residues forming essential hydrogen bonds from SARS-CoV-2 are identified, which are potential drug targets for COVID-19 treatments. The findings in this study shed lights on the current and future treatments for COVID-19 and other coronaviruses-caused diseases.


Introduction 35
The ongoing COVID-19 pandemic is changing human society significantly and causing both 36 economic and social consequences all over the world [1]. Coronaviruses are named for their 37 crown-like spikes on their surface, and they are commonly found in many mammal species [2]. 38 Human coronaviruses were firstly identified in the mid-1960s. There are four main sub-39 groupings of coronaviruses, known as alpha, beta, gamma, and delta [3]. Among all the 40 coronaviruses, there are seven known types of coronaviruses that can infect human beings. 41 People around the world commonly get infected by human coronaviruses 229E, NL63, OC43,42 and HKU1 [4,5]. And some coronaviruses that infect animals are able to evolve and infect 43 humans, among which the three recent cases are SARS-CoV-2, . 44 The SARS-CoV-2 virus is the novel coronavirus that causes coronavirus disease 2019, or 45 COVID-19. Other than COVID-19, coronaviruses have caused several pandemics before, 46 including severe acute respiratory syndrome (SARS) which was caused by SARS-CoV and the 47 Middle East respiratory syndrome (MERS) which was caused by MERS-CoV. To end the 48 current pandemic soon and be prepared for the future similar challenges for human society, it is 49 essential to understand the binding mechanisms of SARS-CoV-2 infecting human cells. This is 50 achievable by studying the stabilities of SARS-CoV-2 at different pH conditions, and identify the 51 key residues that play significant roles in the binding processes. 52 53 Coronaviruses contain membrane glycoprotein (M), nucleocapsid protein (N), spike protein (S), 54 envelope protein (E) and an RNA single chain [7]. For all enveloped viruses, one of the most 55 important steps during the binding process is membrane fusion, which allows viruses to get into 56 host cells [8]. For coronaviruses, the fusion protein is the S protein that leads the binding process 57 to attack human cells through the host cell receptor angiotensin-converting enzyme 2 (hACE2) 58 [9]. Human hACE2 (hACE2) is an enzyme located widely in the human body, including the 59 lungs, kidneys, adipose tissue, central nervous system and cardiovascular system [9][10][11] and it 60 has multiple essential functions such as the regulation of amino acid transport in the kidney 61 controlling the blood pressure, and viral receptors including both SARS-CoV-2 and SARS-CoV 62 [11]. Since it is of extreme importance to human health, there are numerous research groups have 63 been or are currently working on S proteins and hACE2 using various approaches. 64 65 The traditional process of the de novo drug design is a challenging task which consumes 66 resources and time significantly. In this work, we first calculated the electrostatic potentials on the surface of S proteins from both 76 SARS-CoV and SARS-CoV-2, followed by the electric field line comparison between SARS-77 CoV and SARS-CoV-2 when they bind to hACE2. We found that the two viruses have similar 78 pH responses: The pH-dependence of folding energies for S protein receptor binding domains 79 (RBDs) demonstrated that both the S protein RBDs of these two viruses are at the most stable 80 status when pH values ranging from 6 to 9. Also, the pH-dependence of binding energies for S 81 protein RBDs and hACE2 RBD showed that the complex structures of the two viruses are at the 82 most stable status at pH values ranging from 7.5 to 10.5. Therefore, SARS-CoV and SARS-CoV-83 2 survive in a similar pH environment. The pH 7.5 to 9 is the best condition for both SARS-CoV 84 and SARS-CoV-2 to best perform their functions to bind with hACE2. Also, we analyzed the 85 trajectories from 100ns MD simulations using NAMD [30] and identified hydrogen bonds with 86 the involved key residues using VMD [31]. It is shown that for the high-frequency (>80%) 87 hydrogen bonds, SARS-CoV-2 has four pairs while SARS-CoV has three pairs, which indicates 88 that the S protein of SARS-CoV-2 uses more residues to form strong hydrogen bonds. The key 89 residues forming essential hydrogen bonds from SARS- where ϕ(r) is the electrostatic potential, ϵ(r) is the dielectric distribution, ρ(r) is the charge 115 density based on the atomic structures, κ is the Debye-Huckel parameter, k B is the Boltzmann 116 constant, and T is the temperature. Due to the irregular shape of macromolecules, DelPhi uses a 117 finite difference (FD) method to solve the PBE. 118 119 Before the DelPhi calculations, the PQR file of each trimer was generated by PDB2PQR [37]. 120 We used AMBER [38]force field for PDB2PQR calculation, and removed water molecules. For 121 the better results, we ensured the new atoms are not rebuilt too close to existing atoms and 122 optimized the hydrogen bonding network. 123 124 During DelPhi calculations, the resolution was set as 0.5 grids/Å. The dielectric constants were 125 set as 2.0 for protein and 80.0 for the water environment, respectively. The pH value for the 126 solvent environment was set to be 7.0. The probe radius for generating the molecular surface was 127 1.4 Å. Salt concentration was set as 0.15 M. The boundary condition for the Poisson Boltzmann 128 equation was set as a dipolar boundary condition. The calculated electrostatic potential on the 129 surface was visualized with Chimera (figure 2). VMD was used to illustrate electric field lines 130 between S protein and hACE2 (figure 3). Finally, the color scale range was set to be from -1.0 to 131 1.0 kT/e for the best visual presentation. Besides the calculations of electrostatic potentials, we 132 also used DelphiForce [39] to calculate the electrostatic binding forces between each S protein 133 and hACE2 while separating them in the direction of the mass center connection line (figure S2). 134 Besides the net forces between each S protein and hACE2, the X, Y, Z components of the net 135 forces are also calculated and shown in figure S2. 136 137

Relative Folding Energy Calculation 138
We used DelPhiPKa [40,41] to calculate pKa values of DNA and UDG, given the pH ranging 139 from 0 to 14 with the pH interval of 0.5. During the calculations, we used AMBER force field, 140 and removed water molecules and HETATM. For the hydrogen of ASP and GLU attached atom, 141 we used OD1 and OE1, respectively. Variance of Gaussian Distribution was set to be 0.7, salt 142 concentration was 0.15, reference dielectric was 8.0, and external dielectric was 80.0. 143

144
The net charges of proteins at the unfolded state were calculated using this equation: 145 where the summation is of all the titratable groups, y(i) value is -1 for acidic groups and +1 for 147 basic groups, respectively. As for the folding free energy, we used this equation: 148 where ( ) and ( ) stand for the net charge of folded and unfolded state, respectively. 150 R is the universal gas constant taken as 1.9872 × 10 −3 * . T is the temperature with the 151 value of 300 K. 152 Please note that the algorithms we applied to calculate the folding energies are for the relative 153 values, that is, at pH=0 the folding energy is 0 and at any other pH values the folding energies 154 are the relative values to the pH=0 condition. 155 156

Relative Binding Energy Calculation 157
For the binding energy calculation, we involved two methods, which are DelPhiPKa and 158 MM/PBSA [42]. To calculate binding energy using DelPhiPKa, the following equation was used: 159 minimization was performed for each simulation, followed by a 100 million steps, during which 173 20,000 frames were saved from two 100ns simulations of both SARS-CoV and SARS-CoV-2 174 separately (1.0 fs per step, 1 frame at each 5000 steps, 100 million steps in total). The RMSDs of 175 the SARS-CoV and SARS-CoV-2 trajectories are about 3.4Å and 1.1 Å, respectively (figure S1). 176 During the MD simulations, we used CHARMM [43] force field, the temperature was set to be 177 300 K, and the pressure was set to be standard using the Langevin dynamics. For PME, which is 178 set for full-system periodic electrostatics, with the grid size (86, 88, 132) as (x, y, z) value 179 respectively. In those two simulations, atoms that are not located in binding domains were 180 constrained within a margin of 10.0 Å of their natural movement maximum length values. In 181 order to get a more accurate result of the simulation, data of the last 50 ns of simulations were 182 selected and used for data analysis, since the structure of the first 50 ns is not as stable as the last 183 50 ns of simulations. The simulation processes are visualized in movies 4 and 5, generated by 184 VMD. 185 186 To analyze the interaction between S proteins and hACE2, the hydrogen bonds that formed 187 within the distance of 4 Å were extracted from the last 10,000 frames (50 ns

Electrostatic Potential on Surfaces 219
To study the electrostatic features, DelPhi was utilized to calculate the electrostatic potential on 220 surfaces of the S protein trimer (full structure) and hACE2 RBD. The electrostatic potential 221 distribution on SARS-CoV S protein trimer structure is showed in figure 2BEH and movie 1, 222 which were rendered by Chimera with a color scale from -1.0 to 1.0 kT/e. The charge 223 distribution on SARS-CoV-2 S protein trimer structure is shown in figure 2CFI and movie 2, 224 which were rendered by Chimera with a color scale from -1.0 to 1.0 kT/e as well, for the 225 comparison. Negatively and positively charged areas are colored in red and blue, respectively. 226 227 By comparing the electrostatic potential on surfaces of two trimer structures, it is obvious that The electric field lines demonstrate that when hACE2 is away from S protein, all the three S 259 protein monomers provide attractive interactions to the hACE2. This is expected because the S 260 protein RBDs are positively charged while the hACE2 is negatively charged, as shown in figure  261 2 and movie 3, respectively. When hACE2 binds to S proteins (as shown in figure 1), the hACE2 262 only binds with one S protein RBD, which is in open state. Combining the information from 263 figure 1 and 3, it demonstrates that all the three S protein RBDs generate attractive forces to 264 hACE2. However, when hACE2 gets closer to S protein, one S protein RBD flips out and binds 265 to the hACE2 tightly, while the other two S protein RBDs stay in closed state. Even though the 266 monomer with flipped-out S protein RBD is the closest to hACE2 and forms most of the salt 267 bridges and hydrogen bonds, the other two monomers also provide dense field lines and show 268 strong attractive interactions between S proteins and hACE2.

pH-Dependence of Relative Folding Energies 281
The folding energy of SARS-CoV and SARS-CoV-2 complexes were calculated using 282 DelPhiPKa at different pH values ranging from 0 to 14 with an interval of 0.5 (Figure 4). We 283 observed that SARS-CoV and SARS-CoV-2 have the same trend of folding energy with the 284 change of pH values, which is decreasing from 0 to 6, then becoming stable from 6 to 9, and 285 increasing from 10 to 14. Other than the trend, the optimal values locate between 6 to 9 for both 286 of the viruses. 287 Please note that the folding energies in figure 4 are relative values because we set the reference 288 energy to be 0 kcal/mol when pH is equal to 0. We did not calculate the absolute values of 289 folding energies since we focused on the pH dependency of the folding energies. 290 291 292 Figure 4. pH-dependence of the relative folding energy of S protein RBDs of SARS-CoV and 293 SARS-CoV-2. 294

pH-Dependence of Relative Binding Energies 296
DelPhiPKa was implemented to calculate the binding energies of two complex structures at 297 different pH values. The results are presented in figure 5, where we noticed that the binding free 298 energies of both SARS-CoV and SARS-CoV-2 complexes are stable at the pH values ranging from 299 7.5 to 10.5, which indicates that both SARS-CoV and SARS-CoV-2 have a slight preference of 300 weakly basic environment. Note that the method implementing DelPhiPKa calculates the relative 301 folding and binding energies rather than absolute energies. The folding/binding energy at pH 0 is 302 set as reference, which is 0 kcal/mol. The relative energy profile is used to study the 303 folding/binding energy dependence on pHs. The absolute binding energies was calculated in later 304 section using MM/PBSA method. Combine the folding and binding energy profiles, it is concluded 305 that the best pH environment for both the SARS-CoV and SARS-CoV-2 is from pH 7.5 to 9. Please 306 note that the binding energies in figure 5 are relative values because we set the reference energy 307 to be 0 kJ/mol when pH is equal to 0. We did not calculate the absolute values of binding energies 308 since we focused on the pH dependency of the binding stability. 309 310 311 Figure 5. The relative binding energies of complexes at different pH values. 312

Hydrogen Bonds Analysis 314
To analyze the hydrogen bonds distributions on both S proteins RBDs and hACE2 RBD, we 315 colored the residues forming hydrogen bonds which are over 50% frequency during the MD 316 simulations in figure 6. It's obvious that the SARS-CoV S protein has more residues involved in 317 the hydrogen bonds which are over 50%. Accordingly, the hACE also has more residues forming 318 hydrogen bonds (over 50% frequency) with SARS-CoV S protein. 319 320 In order to consider the most essential hydrogen bonds, which are the hydrogen bonds with 321 relatively high frequencies, we took 80% as a cutoff, which means those hydrogen bonds with 322 80% or higher frequency are considered as the relatively more essential ones. By comparing the 323 figure 7A and 7B, SARS-CoV-2 RBD forms one more essential hydrogen bonds than SARS-324 CoV RBD when binding to hACE2. The residues involved in forming hydrogen bonds over 50% 325 frequency were colored with their side chains, in which the residues with over 80% frequency 326 hydrogen bonds were labeled and highlighted in grey squares ( figure 8CF). From the analyses of 327 figure 6-8, it is revealed that SARS-CoV uses more hydrogen bonds to bind with hACE2. 328 However, more high frequency hydrogen bonds are formed in the SARS-CoV-2/hACE2 329 complex. The key residues forming essential hydrogen bonds from SARS-CoV-2 are: ARG-121, 330 TYR103, THR182 and TYR171. Such residues have significant contributions to the binding of 331 SARS-CoV-2 and hACE2. Therefore, these residues have higher potential to be targets for future 332 drug design. 333 334 335 336 Labelled key residues that form essential hydrogen bonds (frequency over 80%) at the interface; 356 (D) SARS-CoV-2 S protein single chain binds to hACE2; (E) A closeup view of (D) at the 357 binding interface; (F) Labelled key residues that form essential hydrogen bonds (frequency over 358 80%) at the interface. 359 360

4
Limitation 361 The limitation for this work is that we used relative folding energy and binding energy to analyze 362 rather than the absolute values. Since our work is focused on the relative stability under the pH 363 effects, the relative energy calculations do not affect our conclusions. 364 365

5
Conclusion 366 In this work, we applied several computational methods, including MD simulations, DelPhi, 367 DelPhiForce and DelPhiPKa to study the electrostatic features of S proteins for SARS-CoV and 368 SARS-CoV-2. From our results, SARS-CoV and SARS-CoV-2 S protein RBDs both have 369 positively charged interfaces, which provides attractive interactions to hACE2 as hACE2 has 370 negatively charged surface. 371 372 Also, we revealed the pH-dependence calculations of relative folding energy for SARS-CoV and 373 SARS-CoV-2 S protein RBDs. The best pH to stabilize SARS-CoV and SARS-CoV-2 S protein 374 RBDs is in the range of 6 to 9. The study on pH dependence of binding energies revealed that the 375 complex structures of hACE2 and S proteins of SARS-CoV/ SARS-CoV-2 are stable from pH 376 7.5 to 10.5. Therefore, SARS-CoV and SARS-CoV-2 survive in a similar pH environment. The 377 pH 7.5 to 9 is the best condition for both SARS-CoV and SARS-CoV-2 to best perform their 378 functions to bind with hACE2. 379 380 Besides, based on 100ns MD simulations, we found that for the essential hydrogen bonds (>80% 381 frequency), SARS-CoV-2 has four pairs while SARS-CoV has three pairs, which indicates the 382 relatively more robust binding strategy of SARS-CoV-2 compared to SARS-CoV. The key 383 residues forming essential hydrogen bonds from SARS-CoV-2 are ARG-121, TYR103, THR182 384 and TYR171, which are potential drug targets for COVID-19 treatments. By using multiple 385 computational approaches, the findings in this work shed light on the current and future 386 treatments of COVID-19 and other coronaviruses-caused diseases. 387 388 389