Weak interactions of COVID-19 virus M pro and peptide-like inhibitor N3: revealing the role of non-conventional hydrogen bonds by theoretical methods

Two models consisting of 469 atoms each one, for the interaction between the N3 inhibitor and the main protease of SARS-CoV (SARS-CoV M pro ) and SARS-CoV-2 (SARS-CoV-2 M pro ) viruses, were used to reveal by quantum chemistry methods the non-covalent interactions involved in these systems. Through the Density Functional Theory and the Quantum Theory of Atoms in Molecules the main conclusion reached by this study indicates that C-H ··· O and C-H ··· H hydrogen bonds are crucial to describe attractive interactions in these complexes. In general, these contacts are overlooked in many studies. However, these non-conventional hydrogen bonds represent more than a half of the energy interaction estimated for non-covalent contacts. These results are quite important for the design of new drugs since these interactions could drive the action mechanisms against these viruses. Hydrogen bonds are crucial to describe correctly the interactions between inhibitors and the main proteases.


Introduction
The beginning of a new decade, in many cultures, represents a positive and extremely exciting event. The year 2020 should have been a very stimulating year for our diverse and multicultural society, around the world, due to a wide range of extremely anticipated and prestigious activities (e.g., the Olympics in Tokyo 2020). However, 4 months into 2020 and our planet is sadly facing the greatest test since World War II: an existential global health crisis through the outbreak of a novel coronavirus respiratory disease 2019 (COVID-19) caused by SARS-CoV-2 virus. An interesting contribution by Menachery et al. 1 published in 2015, almost prophetically, warned us about "bat coronaviruses shows potential for human emergence". The suggestion that COVID-19 may have originated in Chinese bat populations will lead to antagonic, scientific and political, conclusions. However, regardless where COVID-19 is actually coming from, this pandemic has been taking human lives worldwide. In addition, our global economy is transiting into a huge recession as businesses close and millions of jobs are lost (as a direct result of globalization and interconnected international economies). Governments around the world have reacted fragmentary and in unambiguously different ways. Extensive spread of COVID-19 in each continent, this virus has a higher transmission rate than SARS-CoV which emerged in 2002, 2 is now demonstrating how serious this treat is to life. Thus, believing that this pandemic can be solved by national governments only operating individually, it is frankly a daring unreality. Global leaders, not only G20 leaders, must rise a strong and common battle-front to defeat this pandemic. This of course, is not an exclusive task for the international governments. Among the different strategic actions (coming from different sector of society) to achieve such interactive collaborative role, the scientific community must rapidly cooperate across borders to try to understand the virus and develop new methods to combat it. Therefore, it is of vital relevance that clinicians, epidemiologist and basic researchers (i.e., via the implementation of computational resources) all work together to obtain, analyze and interpret meaningful data related to COVID-19. Most importantly, these scientific contributions must be available as soon as possible to provide valuable insights on the collective fight against COVID-19, perhaps the greatest threat in human history.
Family Coronaviridae 3 is constructed from enveloped viruses with a positive-strand RNA genome. These coronaviruses mainly contain four structural proteins: spike protein, envelope protein, membrane protein, and nucleocapsid protein 4 . Spike proteins are relevant during the virus infection since they promote host attachment and virus-cell membrane fusion. SARS-CoV-2 is a coronavirus that belongs to this family. Several compounds have been investigated in order to find antiviral drugs or a vaccine against human COVID-19 infection 5 , but an effective and unquestionable antiviral strategy against this emergency is not yet available. One possible target is to find an inhibitor of the SARS-CoV-2 M pro and for this reason there are several investigations in this direction. SARS-CoV-2 is a new coronavirus but there are others. In particular SARS-CoV is a well-studied coronavirus of which have been reported inhibitors that might help us to control the infections. In 2016, Wang et al. reported the crystal structure of a synthetic peptidomimetic inhibitor named N3 forming a complex with SARS-CoV M pro3 . From the comparison with different proteases of other coronaviruses, they demonstrated that the M pro is a conserved drug target in coronaviruses 3 . Yang et al. 6 reported that all CoV main proteases have a highly conserved substrate-recognition pocket. The X-ray crystal structure of SARS-CoV-2 M pro was published in February 2020 5 . It has a dimer-like structure that present 96% sequence homology with that of SARS-CoV M pro7, 8 . This is certainly important to the design of novel inhibitors since this similarity allows us to think that it is possible to develop inhibitors against SARS-CoV-2 M pro using the information that we already have for SARS-CoV M pro . Peptide-like inhibitor N3 ( Figure 1) has been investigated with SARS-CoV M pro and also with SARS-CoV-2 M pro . The crystal structure of N3 with SARS-CoV-2 M pro is already published 5 . Authors report with detail the specific interactions of N3 with the main protease and they compared with the N3-SARS-CoV M pro complex. Authors conclude that inhibitors bind to the main proteases in a similar mode and therefore, the action mechanisms might be similar. However, these are X-Ray structural data and with these results it is not possible to give a full characterization of molecular interactions. To understand the action mechanism it is important to have accurate information on the inhibitor-protein interactions. One of the efforts to have this information is the fragment molecular orbital-based interaction analysis that has been reported previously; the authors analyzed the complex formed with the SARS-CoV-2 M pro and the peptide-like inhibitor N3. This description is partial as they used fragment molecular orbitals 9 . It is crucial to have accurate information of the interaction of SARS-CoV-2 M pro and the N3 inhibitor and for this reason, in this investigation we determine and compare all the interactions, including weak interactions, of N3-SARS-CoV M pro and N3-SARS-CoV-2 M pro complexes. Hydrogen bonds, that are crucial for the description of the interactions and activity of these compounds, are well characterized in this work using the Quantum Theory of Atoms in Molecules (QTAIM) 10 approach. From this theoretical analysis, we determine the differences between N3-SARS-CoV M pro and N3-SARS-CoV-2 M pro complexes. These results considerably will contribute to the development of effective drugs against COVID-19.

Results and discussion
In this article, two models were used to describe the interaction between N3 and the main proteases of SARS-CoV and SARS-CoV-2. Both models were obtained from the X-ray structures (PDB ID's 2AMQ and 6Lu7, respectively) 5, 6 and contain 469 atoms, 97 related to the inhibitor and the remained related to the corresponding main protease. More details of the used methodology are explained in the methods section .

2/7
The N3 inhibitor is a molecule that exhibits different rotamers. Figure 1 reports three of them. Rotamer (I) corresponds to the structure observed in the crystallographic structure of the N3-SARS-CoV-2 M pro complex; for this case, only hydrogen atoms positions were relaxed. The other two structures in Figure 1 (II and III) were obtained with our computational methods by relaxing completely all atoms positions. These three structures are reported in the electronic supporting information (ESI). We note that structures II and III are quite different from each other, but the energy difference between them is quite small (2.4 kcal/mol) and therefore it is possible to say that this inhibitor has associated conformers with similar energy. One important characteristic of this molecule is the number of lone pairs given by oxygen and nitrogen atoms involved in its structure, which give a particular feature of the Molecular Electrostatic Potential (MEP), as can be seen on the right side of Figure 1. Therefore, this molecule provides several positive and negative regions of the MEP that induce possible contacts with coronaviruses M pro that are driven by electrostatic effects. By analyzing both complexes, N3-SARS-CoV M pro and N3-COVID-19 M pro , the non-covalent interaction index (NCI) 11 reveals weak non-covalent interactions for both molecular systems, as we can see from Figure 2. In this figure, the NCI is represented in terms of an isosurface colored mainly by green color, which indicates weak interactions (van der Waals and hydrogen bonds) between N3 and the main protease, but also intramolecular interactions. Such an isosurface forms a boundary between the electron densities delivered by our protease models and the N3 inhibitor. Without a doubt there are many non-covalent interactions between these systems. From results delivered by the NCI, we do search all possible critical points of the electron density to characterize these 3/7 contacts through the QTAIM approach. Thus, we performed this search of both complexes and we counted each interaction between N3 and both proteases models by looking for all bond critical points (BCP) defined within the QTAIM. From these points we found the corresponding bond paths to establish the atoms involved in each contact and with this information we determine the molecular graph of each modeled complex. We would like to emphasize that this task for systems with the number of atoms considered in this article is a remarkable challenge for common software used currently. For this reason we used the implementations proposed in the GPUAM 12 project to obtain responses in reasonable times. In Figures 2b and 2d we present the bond paths (in pink color) for both complexes considered in this article. From here, we characterize each contact between the N3 and SARS-CoV-2 M pro , and SARS-CoV M pro . Table 1 is a summary of the QTAIM analysis performed to obtain the hydrogen bonds related to the interaction between N3 and M pro from SARS-CoV and SARS-CoV-2. Table 1. Hydrogen bonds obtained for N3-SARS-CoV M pro and N3-SARS-CoV-2 M pro models. d(A· · ·H) represents the distance between a hydrogen atom (H) and its acceptor (A), d(A· · ·D) the distance between acceptor and donor (D), and A· · ·H-D the angle involved in a hydrogen bond. The electron density evaluated at a critical point corresponding to a hydrogen bond is represented by ρ(BCP).

Hydrogen bond N3-SARS-CoV M pro N3-SARS-CoV-2 M pro # d(A-H) d(A-D) A-H-D ρ(BCP) # d(A-H) d(A-D)
A-H-D ρ(BCP) The QTAIM approach delivers an impressive number of hydrogen bonds in N3-SARS-CoV M pro and N3-SARS-CoV-2 M pro models, which are recorder in Table 1. In this table there are several properties that characterize the hydrogen bonds: i) distance between hydrogen atom (H) and its acceptor (A), ii) distance between acceptor and donor (D), iii) angle A· · ·H-D defined by the three atoms involved in a hydrogen bond, iv) electron density evaluated at the BCP, ρ(BCP), corresponding to the hydrogen bond. For each property we are reporting the lowest value (top), the average (middle and bold font) and the highest value (bottom); more specific information of hydrogen bonds of these systems is given in ESI. It is important to mention that we have considered only those contacts that exhibit ρ(BCP) ≥ 0.001 atomic units (au). From the data reported in this table, clearly N-H· · ·O, C-H· · ·O and C-H· · ·H hydrogen bonds have an important presence in these complexes. N-H· · ·O hydrogen bonds exhibit geometrical parameters which induce appropriately this interaction, with relatively small A· · ·H and A· · ·D distances, and high A· · ·H-D angle. It is worth to mention that this hydrogen bond seems stronger in the N3-SARS-CoV-2 M pro than that observed for N3-SARS-CoV M pro , by the values presented in Table 1. It is also important to mention that there are eight of these contacts in the N3-SARS-CoV-2 M pro complex and seven in the N3-SARS-CoV M pro .
The C-H· · ·O contact has been recognized as a weak hydrogen bond, with high relevance to stabilize some systems 13 . In the two complexes considered in this article, it is clear the importance of these interactions since there are 25 contacts in N3-SARS-CoV M pro and 27 in N3-SARS-CoV-2 M pro . However, we cannot overlook the role of the C-H· · ·H contact (dihydrogen bond 14 ) since it appears several times in each complex; in fact, from Table 1 we observe more favorable geometrical parameters in this bond than those found for C-H· · ·O interactions. C-H· · ·N and C-H· · ·S hydrogen bonds exhibit similar electron density at the BCP than those observed for C-H· · ·O and C-H· · ·H interactions, although they have less occurrence in both complexes. C-H· · ·C, N-H· · ·H, N-H· · ·N appear only in the N3-SARS-CoV M pro complex, but they are irrelevant in the stabilization of this complex. For non-covalent interactions we use the Espinosa-Molins-Lecomte (EML) approach 15 to estimate the energy involved in each hydrogen bond. This approximation is quite useful to compare binding energies between two similar systems. From this analysis we found a total hydrogen bond energy of -162.1 kcal/mol for the N3-SARS-CoV M pro complex and -120.9 kcal/mol for the N3-SARS-CoV-2 M pro . Thus, the energetic contribution of these interactions is 34.1% higher in the complex of N3 with M pro from SARS-CoV than that observed with SARS-CoV-2. It is important to mention that this estimation of the hydrogen bond energy is useful only to compare both systems and it is not intended to estimate thermodynamic binding properties.
In terms of this energy analysis, the contribution (in percent) of each hydrogen bond is reported in Table 2. From the EML approximation, N-H· · ·O, C-H· · ·O and C-H· · ·H hydrogen bonds contain the major percent of the total energy. For the N3-SARS-CoV M pro complex, the C-H· · ·O contact contributes with more than a half of the total energy; the C-H· · ·H contact has also an important contribution to the total energy. From this analysis we conclude that C-H· · ·O and C-H· · ·H hydrogen bonds contribute with an important percent to the energy contribution of non-covalent interactions. If we consider only the N-H· · ·O hydrogen bond as main contribution to the stabilization of these systems by the side of non-covalent interactions then we are making a big mistake, since most of the energy stabilization is coming from the other hydrogen bonds. Table 2. Energy contribution, in percent, of each hydrogen bond type found in N3-SARS-CoV M pro and N3-SARS-CoV-2 M pro complexes.

Modeled complex
Hydrogen bond type N-H· · ·O C-H· · ·O C-H· · ·H C-H· · ·C C-H· · ·N C-H· · ·S N-H· · ·H N-H· · ·N The positions of hydrogen bonds predicted by our computational methodology is an important contribution since we comprehensively described all the hydrogen bonds involved in the inhibitor-protease interaction. Other methodologies to describe inhibitor-protease interactions are based on molecular mechanics that used parametrizations for the description of the hydrogen bonds. We are stressing that in N3-SARS-CoV M pro and N3-SARS-CoV-2 M pro complexes, there are many non-conventional hydrogen bonds that must be considered in such parametrization; as a matter of fact, C-H· · ·O unconventional hydrogen bonds contribute as much or more than conventional N-H· · ·O hydrogen bonds, as can be seen in Table 2. How important are the hydrogen bonds found in this study? We have designed two models to study the relevance of hydrogen bonds for the N3-SARS-CoV-2 M pro complex. First, we built two adducts: 1) N3-CYS144, and 2) A model with 230 atoms from the N3-SARS-CoV-2 M pro complex. In both cases there is the C-S covalent bond involved in the N3 and the main protease. Both structures were fully optimized (reported in ESI) and measured the two C-S bond lengths where the sulfur atom is involved. For the N3-CYS144 adduct we obtained 1.832 Å and 1.818 Å, and for the second model 1.835 Å and 1.809 Å. For the N3-SARS-CoV-2 M pro complex, from the single crystal X-ray diffraction information, 1.765 Å and 1.802 Å are the reported bond lengths. Thus, non-covalent interactions have a crucial impact on bond length of covalent bonds; in particular, we have focused on the C-S bond, which links N3 with the main protease that is an important bond in these systems. Naturally, there are other effects that affect C-S bond lengths (protein packing, for example). However, from here it is clear that non-covalent interactions impact also in the description of covalent bonds. Therefore, it is important to conclude that the description of all hydrogen bonds involved in the inhibitor-protease interaction (conventional and non-conventional) is necessary to fully understand the possible action mechanism of these drugs. the systems are the contacts residues of the SARS-CoV M pro or SARS-CoV-2 M pro respect to N3 molecule. Then, for the N3-SARS-CoV M pro and N3-SARS-CoV-2 M pro models, a cutoff radius of 4.0 Å was applied. That is, we consider any M pro residue with at least one atom up to a distance of 4.0 Å respect to any atom of N3. Each of these two systems contains 469 atoms. The cavities provided by the M pro models are depicted through the molecular electrostatic potential in Figure 3. In addition, we have considered one system containing the N3 inhibitor covalently bounded to the CYS144 (111 atoms), and other system considering a cutoff radius of 4.0 Å but using the PJE residue of the N3 inhibitor as a reference (230 atoms). These two systems were also built from the N3-SARS-CoV-2 M pro complex. In all generated systems, the different peptide fragments were capped at the N-and C-termini in the neutral states or methylated. All these structures are included in the ESI. The electronic structure of all systems was obtained by using the PBE0-D3/6-311G(d,p) 16,17 method with the Terachem code 18 . The molecular electrostatic potential 19 , NCI and QTAIM analysis were obtained by the graphics processing units for atoms and molecules (GPUAM) project 12,20 , developed in our group.