Structural insights into spike protein and its natural variants of SARS-CoV-2 found on Mexican population

Yudibeth Sixto-López Laboratorio de Diseño y Desarrollo de Nuevos Fármacos e Innovación Biotecnológica (Laboratory for the Design and Development of New Drugs and Biotechnological Innovation), Sección de Estudios de Posgrado e Investigación, Escuela Superior de Medicina, Instituto Politécnico Nacional, Plan de San Luis y Salvador Díaz Mirón s/n, Casco de Santo Tomás, Ciudad de México 11340, México https://orcid.org/0000-0001-8903-3834 Martiniano Bello Laboratorio de Diseño y Desarrollo de Nuevos Fármacos e Innovación Biotecnológica (Laboratory for the Design and Development of New Drugs and Biotechnological Innovation), Sección de Estudios de Posgrado e Investigación, Escuela Superior de Medicina, Instituto Politécnico Nacional, Plan de San Luis y Salvador Díaz Mirón s/n, Casco de Santo Tomás, Ciudad de México 11340, México https://orcid.org/0000-0002-9686-0755 José Correa-Basurto Laboratorio de Diseño y Desarrollo de Nuevos Fármacos e Innovación Biotecnológica (Laboratory for the Design and Development of New Drugs and Biotechnological Innovation), Sección de Estudios de Posgrado e Investigación, Escuela Superior de Medicina, Instituto Politécnico Nacional, Plan de San Luis y Salvador Díaz Mirón s/n, Casco de Santo Tomás, Ciudad de México 11340, México https://orcid.org/0000-0002-4973-5265 Jose Antonio Garzón-Tiznado 2Laboratorio de Bioinformática y simulación molecular, Facultad de Ciencias Químico Biológicas, Universidad Autónoma de Sinaloa, Culiacán Sinaloa México https://orcid.org/0000-0003-0423-3349 Sarita Montaño (  mmontano@uas.edu.mx ) Laboratorio de Bioinformática y simulación molecular, Facultad de Ciencias Químico Biológicas, Universidad Autónoma de Sinaloa, Culiacán Sinaloa México https://orcid.org/0000-0002-0420-4268


Introduction
The severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is a newly emerged coronavirus responsible for COVID-19; it becomes a pandemic since March 2020. To date, more than 18,902,735 million people have been infected and killed more than 709,511 as of August 6, 2020. The coronavirus entry into the host cell is mediated by the transmembrane spike (S) glycoprotein. Homotrimers of S proteins are surface-exposed and are responsible for the attachment to the host receptor, which turns it into the main target of neutralizing antibodies 1,2 . The fusion capacity of the S protein is a leading indicator of the infectivity of the virus. S protein consists of two functional subunits, receptor-binding (S 1 ) and fusion the viral and cellular membrane (S 2 ) 2,3 . S 1 comprises the receptor-binding domain (RBD) that recognizes angiotensin-converting enzyme-2 (ACE2) as its receptor 4,5 , S 1 contributes to the stabilization of the prefusion state of the membrane-anchored S 2 subunit, that contains the fusion machinery. The S1/S2 processing sites exhibit different motifs among coronaviruses 6 . The priming process is ensured by different host cell proteases depending on the S1/S2 sequence cleavage site. In the case of the SARS-CoV-2, S protein contains a canonical furin-like cleavage site: 681-PRRAR↓SV-688, which is absent in SARS-CoV and others SARS-related coronaviruses 2,3,6,7 , and it may provide a gain of function for e cient spreading in the human population compared to other lineages of beta-coronaviruses 6 .
Recent studies show SARS-CoV-2 is mutating during the continuous transmissions among the population 8,9 . To date, the Mexican population is circulating two of three viral lineages reported 10 . The principal mutations detected worldwide in the SARS-CoV-2 are on the viral S protein 9,[11][12][13][14] , which is critical for the virus attachment to the host cell 11 . However, the mutation effects on the S protein are poorly understood and possible impact at 3D structural or functional level. These variations can be used to identi es epitopes as a target for vaccine development, as well as antibody and antiviral drugs design, to nd treatments against SARS-COV-2 9,14 .
Insights into the sequences mutations of S protein among available genomes on the GISAID database, we found three principal variations, H49Y, D614G, and T573I in the Mexican population. The 3D crystal structures of S protein are available at PDB database 2,[15][16][17] . However, to capture conformational changes in the 3D atomistic model, Molecular Dynamic simulation (MD) needs to be performed 15 . To determine if the variants H49Y, D614G, and T573I on S protein founded on the Mexican population, could affect the 3D structure, here we employed molecular dynamics simulations (MD) for a full-length atomistic 3D structure of S protein and its variants. Molecular docking calculations with the most populated cluster conformations of the variants were done with Cepharanthine, Hydroxychloroquine, and Nel navir, reported as S protein inhibitors 27,28,29 , in order to explore how the binding mode or free energy of binding was affected due to possible structural modi cations because of punctual mutations on S protein. The results provide evidence of affectations of both binding free energy and binding mode of the compounds with the S protein WT and its variants (H49Y, D614G, and T573I) supported by in silico studies that can be further investigated with experiments.

Results And Discussion
Multiple sequence alignments of the SARS-CoV-2 S proteins show three variants present on Mexican population (H49Y, T573I, and D614G), the rest of the residues are conserved (Fig. 1). These variants above mentioned are far from the RBD domain of S protein.
D614G change is caused by an A/G nucleotide mutation at position 23,403 in the reference strain 32 , this variant is being associated with higher viral loads and enhances viral effectivity in patients 8,32,33 despite the higher viral loads, G614 is neutralizing similarly as D614 by polyclonal antibody 32 . To date, this variant becomes the dominant form displacing the WT, according to the levels of mutation in the world, presented on the nextstrain database (www.nextstrain.org). H49Y variant is produced by C/T change at position 21,707 10 . The properties of residues H/Y changes from positive to neutral charge 9 , this variation causes a reduction in the total free energy, while the mutants with D614G substitutions showed stabilizing structure, suggesting a prevalence role in S protein evolution 13 . Finally, regarding T573I mutants, until now, there is no information available, but the change by T/I implies a modi cation from polar residue to non-polar hydrophobic residue changing the chemical environment to more hydrophobic site. Even though these are punctual modi cations due to the chemical nature of the substitution, modi cation at the structural level is expected.

Molecular dynamics simulations
The mutations on the 3D structures of S were done using PDB: 6VSB crystal on PyMol (Fig. 2), and proceed to develop the Molecular dynamic simulation with Amber. RMSD was calculated to determine the average deviation in the atomic stability under MD simulations. All RMSD trajectories reached the equilibrium after 30 ns of MD simulation (Fig. 3A). The Rg values for WT showed expansion at the last 30 ns of MD with values from 47.5 to 50 Å. D614G and T573I show compactness since 20 ns of trajectory with values from 49.5 to 44.8 Å and 50.4 to 43.79 Å, respectively. H49Y presents compactness from 48.47 to 43.9 Å in the whole simulation (Fig. 3B). Rg values, suggests that variants have a similar degree of compactness while the WT shows an opposite behavior. We explored protein exibility by RMSF values of the Cα through MD simulations of the S proteins and its variants (Fig. 3C)

Clustering analysis
Representative ensembles were calculated from the last 30 ns of the MD simulation. 70% of the most populated conformations were grouped into the rst 10, 13, 15, and 13 clusters for WT, D614G, H49Y, and T573I, respectively (Table 1), this clustering dispersion indicates that proteins possess a complex structural behavior. Although, it was observed that WT protein depicted lesser clustering dispersion than mutated protein, which pointed out that mutated spike is more complexed in their structural behavior.
Only the most populated cluster conformation was retrieved from each one of the systems, and it is represented in Figure 4. The most populated cluster conformation of the WT showed more difference regarding the native structure (RMSD = 9.625 Å, Fig. 4A), D614G (RMSD = 9.998 Å, Fig.4B) showed a higher RMSD than WT while, the other mutated proteins depicted lesser difference in decreasing order as follow, H49Y (RMSD = 6.965 Å, Fig. 4C) >T573I (RMSD = 6.858 Å, Fig. 4D). Speci cally, WT structural difference was observed in most of the structure, and little differences were observed at 270-318 that belongs to N-terminal domain (NTD) and 531-700, that belongs to S1; however, D614G mutant depicted difference in this region which indicate that a single residue mutation affected the stability of this "central portion of the WT protein. In both cases, WT and D614G difference into secondary structures were observed as well as displacements in loops, α-helices, and β-sheets structures. While in H49Y variant, not greater structural changes were observed in comparison to WT, thus it seems that H49Y induces lesser structural changes that are in agreement with RMSD and RMSF observations, where punctual uctuations were observed. These observations are in agreement with in silico predictions of energetic stability induce by mainly by H49Y mutations and in lesser grade by D614G 13 . In T573I variant, uctuation from 535-580 was observed reduced, therefore T573I mutation seems to induce structural stability into WT protein as well as in the zone around where the mutation is found. conformational mobility for D614G and T563I than WT (Fig. 5 C, D, and F). In comparison, H49Y showed a more compact cluster distribution than the other systems, which suggest a reduction in conformational mobility due to single residue mutation (Fig 5D). T563I mutant showed dissimilarly conformed distribution along the subspace in comparison to the others systems, this behavior suggest that the trajectories sampled different regions of the phase space with different minima but small energy barrier, in this sense T563I mutant affects the structural behavior of the protein with is also re ected into the conformations observed during MD simulation (Fig 5F). These results can be corroborated from the analysis of the graphical representation of the full mobility along PC1 and PC2 ( Figure 6), which allows us to study the direction and magnitude of the motions that contribute to the total system's mobility. According to this projection, the movement on WT was uniformed, and by regions, they showed the same direction, i.e., RBD region and S1 and S2 (Fig. 6 A). However, for D614G the direction of the movements change mainly on S2 region and in NTD segment (Fig. 6 B), which is in agreement with the cluster distribution of PC1 vs PC2 projection (Fig. 5D), this modi cation not only increases the exibility of the hinge but also it changes the direction which might contribute to the increase virulency of this mutant because of might alter the binding to ACE protein 13 . Previously, it was found that D614 formed a hydrogen bond with T859 and a salt bridge with K854 placed on the subunit two of the other protomer, thus the change of D by G might provide a exible space between the monomers and more conformational exibility as our MD simulations suggest which might lead to improved accessibility to ACE2, which are consistent with more cell entry of the virus that carries this mutation 8,34 . This motion perturbation is explained because D614 residue is located on a loop exposed side chain, has a negative charge, and that will be a loss on the variant G614, this causes loss of interactions with other molecules or residues. The G614 has a neutral charge, is smaller, and introduces a more hydrophobic residue at this position, and this can result in loss of hydrogen bonds and/or disturb correct folding. Glycines are very exible and can disturb the required rigidity of the protein at this position 13 . In the case of H49Y mutant, the whole mobility decreased, which was re ected in the magnitude of the porcupine representation in Figure 6C, besides the direction of the movement was completely altered in comparison to WT. It was signi cantly decreased on S2 region (816-1106), suggesting that this mutant not only confers more structural stability as we found but also energetic stability as it was suggested by other authors 13 . Aside, it was found that the H49Y mutant virus has increased cell entry in comparison to WT S protein 8 .
Finally, on the T563I mutant at S1 region, an increment on the magnitude of the movements was observed, while on S2 region, the direction of the movements was on the contrary direction to that observed on WT (Fig 6D). So, this mutation altered both magnitudes a direction of the mobility of the protein. In this case, until now, there is not much more evidence about the behavior of the virus that possesses this mutation, so it should be explored to correlate with our ndings.

Molecular docking with known ligands
Finally, to explore how these punctual mutations affect the protein-ligand binding, a molecular docking with Cepharenthine, Hydroxychloroquine, and Nel navir which are experimental reported spike ligands were performed [29][30][31] Cepharanthine showed potent antiviral activity In vitro against SARS-CoV-2, and by molecular docking, spike protein was suggested as its target. It is suggested that cepharanthine is bound to RBD, and might interfere with human ACE2 binding avoiding the anchorage of the virus to the cell 30,35 . In wild type complex, cepharanthine formed a hydrogen bond with E484 and S494 and hydrophobic interactions with L455, F456, Y489, F490, L492, and Q493 which are residues involved in the Interactions between viral spike and ACE2 (7A & D) 36 . Cepharanthine was bound to WT spike protein with the highest binding free energy (-6.57 kcal/mol), followed by H49Y ( -6.42 kcal/mol) which also interacted with residues belonging to RBD by only hydrophobic interactions with Y449, N450, Y451, L452, F490, L492, S494 and Q493 ( Fig.   7A and F). Cepharanthine was a little bit displaced regarding wild type-cepharanthine complex, which might be responsible for the binding energy detriment. While D614G and T5631 showed lesser binding free energy (-5.95 and -5.39, respectively). Cepharanthine also interacted with residues of the RBD domain of D614G mutant, by hydrophobic interactions with Y449, N450, Y451, L452, E484, F490, L492, Q493, and S496 (Fig. 7E). While within the T563I mutant, Cepharanthine was moved away from the typical site where the other complexes interacted, and formed hydrogen bond with R509 and hydrophobic interactions with F342, N343, A344, W436, N437, S438, N439, and N440 none of these residues belong to ACE2-RBD interacting residues (Fig. 7G). It seems that mutation on T563I might affect the binding and maybe the biological effect of cepharanthine as anti-SARS-COV-2 agent 37 .
Hydroxychloroquine is also a compound with effective in vitro inhibition activity against SARS-CoV-2 infection, besides it elevates the endosomal pH and affects the ACE-2 terminal glycosylation avoiding the virus binding 38 . Two independent computational work suggested that hydroxychloroquine might interact with the RBD domain of spike protein, through it can disrupt the viral-host recognition to avoid viral infection 29,30 . Therefore, in this study, we selected hydroxychloroquine to performed docking with WT spike protein and with mutants. Similar behavior with cepharanthine was observed, hydroxychloroquine reaches the binding site on RBD WT spike protein as well as D614G and H49Y, while in T563I mutant hydroxychloroquine was moved away from this site, and it was re ected on the detriment of the binding free energy ( Table 2, Fig. 7B). Hydroxychloroquine in wild type formed a hydrogen bond with Y489 and L492, and hydrophobic interactions with Y449, L452, F456, E484, G485, C488, F490, E493 and S494 (Fig.   7H). In the interaction with D614G, hydroxychloroquine formed hydrogens bond with E484, F490, L492, and Q493 and hydrophobic interactions with S494, Y449, L452, L455, and P491 (Fig. 7I). And with H49Y formed hydrogen bonds with S349, Y449, N450, and S494 and hydrophobic interactions with R346, F347, A348, Y351, A352, Y451 and L452 (Fig. 7J). in wild type spike protein and both D614 and H49Y mutant Hydroxichloroquine interacted with residues of the RBD that participate in ACE-RBD recognition 36,37 . At the same time, Hydroxychloroquine in T563I mutant interacted with F342, W436, and R509, and hydrophobic interactions with S438, L441, N343, A344, S373, F374, N437, S438, which are residues that are not involved in ACE2-RBD interaction 36 (Fig. 7K).
Nel navir is currently marketed as an anti-HIV drug, and recently it was demonstrated that inhibit the cell fusion mediated by spike protein of SARS-CoV-2, by molecular docking, it was identi ed that nel navir is placed between fusion peptide and HR1 helix near the S1/S2 cleavage site 31 . Therefore, nel navir was assayed by molecular docking against WT and mutants forms of the spike protein. Nel navir was bound to wild type spike protein with the highest binding free energy (-7.88 kcal/mol) followed by H49Y (-7.52 kcal/mol), D614G (-6.10 kcal/mol) and nally, T563I (-5.38 kcal/mol). Regarding the binding site, all ligands were allocated around HR1, FP region, and S1/S2 cleavage site, but Nel navir reaches different binding sites in every protein studied (Fig. 7C). Nel navir in WT spike protein was placed in the HR1 region and formed hydrogens bond with K310, I312, and Q314, and hydrophobic interactions with L303, E309, G311, Y313, I664, P665 and I666 as previously reported (Fig. 7L) 31 . While Nel navir with D614 mutant was also placed on the HR1 region but interacted with some different residues than in the wild type, formed a hydrogen bond with N317, S596, K947 and Q1010 and hydrophobic interactions with Q314, F592, G593, I1013 (Fig. 7M). Whereas Nel navir with H49Y only interacts with a couple of residues of HR1 region by hydrophobic interactions (A942 and S943), and with residues of the N terminal domain (S13-L303) by hydrogens bond T302, L303, K304 or hydrophobic interactions V47, L48, Y49, S305, in this case, the displacement outer from the HR domain was more evident than with D614G. Still, these interactions were energetically favorable (Table 2, Fig. 7N). In the case of Nel navir-T563I mutant complex, Nel navir interacted mainly with residues of the HR1 region by hydrogen bonds S943, D950 and D954, and hydrophobic interactions with K947, V951,R1014, it also formed hydrogen bond with K310 and hydrophobic ineraction with P665 (Fig. 7O), eventhoug this binding mode was energetically lesser favorable than those observed with mutants and WT protein.
Therefore, the ligand-protein interaction depicted a similar energy behavior than the observed with cepharanthine and hydroxychloroquine, where WT protein-ligand complex has the highest energy, while T563I-ligand complex has the lowest energy of binding, which could indicate that mutation on T563I position could be relevant for drug-spike protein interaction affecting not only the binding mode but also the binding free energy. were downloaded from the Global Initiative for Sharing All In uenza Data (GISAID) database 16 (www.gisaid.org). The sequence from Wuhan was used as wild type (WT), and it was aligned against twelve S sequences from the Mexican population (Table 3). Multiple alignments were carried out in ClustalX using default parameters, further,it was edited on Jalview 17 . The whole sequence of the spike S protein has 1282 residues, to show only the residues where the protein exhibited the variants on the Mexican population, the sequence without changes were hidden. To obtained the 3D variants of S protein, mutations were done on PyMol 18 , using as template the crystal structure of SARS-CoV-2 spike glycoprotein (PDB: 6VSB). Visualization of wild type and the H49Y, T573I, and D614G S variants were performed on VMD (Fig. 2). Zoom of the structural 3D alignments were done to visualize the region which contains the mutation (Fig. 3). MD Simulation and analysis.
MD simulations were carried out with AMBER 16 software package 19 using ff14SB force eld 20 . The systems were solvated using an explicit TIP3P water model and centered into a rectangular box of 12.0 Å; after that, all the systems were neutralized by adding 6 Na + counterions. Each one of the systems was minimized through 2500 steps of steepest descent, and 2500 steps of conjugate gradients. Then, they were equilibrated through 500 picoseconds (ps) of heating and 500 ps of density equilibration with weak restraints followed by 2 nanoseconds (ns) of constant pressure equilibration at 310 K. 50 ns of MD simulations were performed under periodic boundary conditions and using an NPT ensemble at 310 K.
The electrostatic term was described through the particle mesh Ewald method 21 ; using a 10.0 Å cut-off, similar radio was chosen for Van der Waals interactions. The SHAKE algorithm 22 was used to constrain bond lengths at their equilibrium values, and a time step was set to 2.0 fs. Temperature and pressure were maintained with the weak coupling algorithm 23 using coupling constants τT and τp of 1.0 and 0.2 ps, respectively (310 K, 1 atm). Trajectories were analyzed using cpptraj module for root mean squared deviation (RMSD), root mean squared uctuations (RMSF), the radius of gyration (R g ). Based on the knowledge of equilibration time, clustering analysis (using a 3.0 Å cut-off), was done using a hierarchical agglomerative (bottom-up) approach employing AmberTools 16 Principal component analysis (PCA), also known as essential dynamic (ED) was performed. PCA analysis is a statistical technique that allows to obtain the large scale collective motions of the atoms on the simulations, which are frequently correlated to the biological function and biophysical properties 24 . The performed method is described in detail elsewhere 25   next, the zoom of the aminoacid G614 is shown in pink while the WT residue is indicated in blue. B) 3D structure of H49Y (purple), the zoom of the amino acid variant Y49 is shown in purple, the WT residue is shown in the sticks in blue. C) 3D structure of T573I (cyan) the zoom of the residue variant I573 is shown cyan while the WT residue T573 is indicated in blue sticks. D) 3D structure of wild type S protein is shown in blue. Axes: X, red; Y, green; Z, blue.

Figure 3
Page 16/19 The trajectory of MD Simulations. A)RMSD B) Radius of gyration C) RMSF. The trajectories of the WT are shown in navy blue, D614G are shown in purple, H49Y are shown in blue and T573I are shown in yellow.    (cyan ribbon), H49Y (magenta ribbon) and T573I (yellow ribbon). Also, the interactions are between Cepharanthine (wheat sticks) with D) wild type spike protein, E) D614G, F) H49Y and G) T573I; and Hydroxychloroquine (orange sticks) with H) wild type spike protein, I) D614G, J) H49Y and K) T573I; and Nel navir (raspberry sticks) with L) wild type spike protein, M) D614G, N) H49Y and O) T573I.