Dimer-monomer equilibrium of SARS-CoV-2 main protease as affected by small molecule inhibitors: a biophysical investigation

The maturation of the coronavirus SARS-CoV-2, the etiological agent of the COVID-19 pandemic, requires a main protease, M pro, to cleave the virus-encoded polyproteins. Despite the wealth of experimental information already available, there is wide disagreement about the M pro monomer-dimer equilibrium dissociation constant. Since the functional unit of M pro is a homodimer, detailed knowledge of the thermodynamics of this equilibrium is key for possible therapeutic intervention, as small molecules interfering with dimerization are potential broad-spectrum antiviral drug leads. In the present study, we exploit small-angle X-ray scattering (SAXS) to investigate the structural features of the SARS-CoV-2 M pro monomer-dimer equilibrium and to determine the corresponding equilibrium dissociation constant and the associated thermodynamic parameters. SAXS is also used to study how the M pro dissociation process is affected by small inhibitors selected through combinatorial design. Our results show that no clear-cut relationship can be established between the ability of inhibitors to disrupt M pro dimerization and the loss of catalytic activity, thus highlighting the possible role of allosteric effects in the regulation of M pro functionality. Our experimental


Introduction
The COVID-19 pandemic is the ongoing worldwide health emergency caused by the coronavirus SARS-CoV-2 (severe acute respiratory syndrome coronavirus 2). 1,2 Coronaviruses (CoVs) are enveloped positive-stranded RNA viruses; once the virion gets into the cell, the single-stranded RNA is translated into two overlapping polyproteins, termed pp1a and pp1ab, which mediate viral replication and proliferation. Virus maturation involves a highly complex cascade of proteolytic processing events on these polyproteins: most cleavage events are carried out by a nonstructural protein, the CoV main protease (M pro, also known as 3CL pro), a three-domain (domains I to III) protein. 3 The enzyme first releases itself from pp1a and pp1ab by autolytic cleavage, then processes the two polyproteins at no fewer than 11 conserved sites. 3 Because of this mechanism of action, inhibiting M pro might attenuate the viral infection. Indeed, this enzyme is a very attractive target for anti-CoV drug design: the M pro sequence is highly conserved among various CoVs, 4 as mutations of M pro often turn out to be fatal for the virus. 5 Thus, the risk of mutation-mediated drug resistance is expected to be very low and inhibitors are expected to display broad-spectrum antiviral activity. In addition, M pro inhibitors are unlikely to be toxic because human proteases have different cleavage specificity. A second point should however be considered: the published X-ray structures of SARS-CoV-2 M pro, obtained both in the presence and in the absence of inhibitors, 6,7 revealed that two M pro monomers form a functionally active homodimer, as already detected in other coronaviruses, 3 which share with SARS-CoV M pro almost all the amino acids involved in the dimerization. In this homodimer, the two monomers are arranged almost perpendicular to each other 7 and each monomer comprises the catalytic dyad His41-Cys145 and the substrate-binding site, located in a cleft between domains I and II.
Domain III, which contains five α-helices arranged into a globular cluster, is directly involved in

Results
As dimerization is a key step in the biological activation of SARS-CoV-2 M pro, several therapeutic strategies against COVID-19 are based on inhibitors acting also (or only) at the M pro dimerization interface. 6,12,13 In order to evaluate the effects of potential inhibitors targeting both catalysis and dimerization, we have first derived the thermodynamic parameters controlling the M pro dimer-monomer equilibrium in solution by SAXS and CD spectroscopy. Subsequently, we have investigated by SAXS the M pro dimerization in the presence of a series of potential inhibitors, selected from an in-house database containing commercial and synthetic compounds. Activity assays were also performed to correlate the M pro activity with the inhibition of dimerization.

M pro dimerization and thermal stability
The dimer-monomer equilibrium of SARS-CoV-2 M pro has been investigated at different protein concentrations by performing in-solution SAXS experiments in the temperature range between 15 °C and 45 °C and far-UV CD measurements at room temperature. Far-UV CD spectroscopy was also used to study the M pro thermal stability, monitoring the unfolding transition between 10 °C and 80 °C.

SAXS
SAXS data of SARS-CoV-2 M pro recorded at the B21 beamline of the Diamond Synchrotron (Didcot, UK) at different protein concentrations and temperatures are shown as log-log plots in Fig. 1, top panels. We have assumed that the SAXS curves arise from a system of interacting M pro monomers and dimers, according to the thermodynamic equilibrium dissociation process given by the relationship:

$$(\mathrm{M^{pro}})_2 \rightleftharpoons 2\,\mathrm{M^{pro}} \tag{1}$$

The corresponding equilibrium dissociation constant is

$$K_D = \frac{2 C x_1^2}{1 - x_1} = e^{-\Delta G_D/(RT)} \tag{2}$$

where $C$ is the total molar concentration of monomers, $x_1$ is the molar fraction of proteins that remain in the monomeric state, $\Delta G_D$ is the dissociation Gibbs free energy change, $R$ is the universal gas constant and $T$ the absolute temperature. To note, equation 2 can be solved in terms of $x_1$,

$$x_1 = \frac{K_D}{4C}\left(\sqrt{1 + \frac{8C}{K_D}} - 1\right) \tag{3}$$

Figure 1. SAXS data and fits. Top panels: SAXS experimental data of SARS-CoV-2 M pro without inhibitors and best theoretical fits obtained by GENFIT software 14 (solid black and white lines). Each panel reports a dataset obtained at the same temperature. Bottom panels: SAXS data of M pro with inhibitors at different concentrations and temperatures. Each panel reports curves at the same temperature. Red, green, blue, orange, dark-green, cyan and magenta refer to inhibitors 1, 2, 3, 4, 5, 6 and 7, respectively. Thin and thick lines refer to inhibitor concentrations of 30 and 60 µM, respectively. Subsequent curves are multiplied by a factor of 3.0 for clarity. Solid black and white lines are the best fits obtained by GENFIT.
According to classical thermodynamics, the temperature dependence of $\Delta G_D$ is

$$\Delta G_D = \Delta G_D^\circ - (T - T^\circ)\,\Delta S_D^\circ + \Delta C_{pD}\left[T - T^\circ - T \ln\frac{T}{T^\circ}\right] \tag{4}$$

where $\Delta G_D^\circ = -RT^\circ \ln K_D^\circ$ is the dissociation Gibbs free energy at the reference temperature $T^\circ = 298.15$ K ($K_D^\circ$ being the associated equilibrium constant), $\Delta C_{pD}$ is the change of the constant-pressure heat capacity upon dissociation (here assumed to be independent of $T$) and $\Delta S_D^\circ$ is the dissociation entropy at $T^\circ$. The macroscopic differential scattering cross section, which is the experimental information provided by a SAXS curve, for a system of interacting monomers and dimers can be written as

$$\frac{d\Sigma}{d\Omega}(q) = N_A\,\kappa\,C_N\,P(q)\,S_M(q) + B \tag{5}$$

$N_A$ being Avogadro's number, $\kappa$ an unknown fraction of the nominal protein molar concentration $C_N$ ($C = \kappa C_N$), and $B$ an arbitrary flat background that takes into account possible uncertainties in the determination of the transmissions of protein and buffer samples. $P(q)$ represents the average form factor of the system

$$P(q) = x_1 P_1(q) + \frac{1 - x_1}{2}\,P_2(q) \tag{6}$$

where $P_j(q)$ stands for the form factor (which is the orientational average of the excess squared X-ray scattering amplitude) of the M pro monomer ($j = 1$) or dimer ($j = 2$). We have calculated $P_j(q)$ from the recently determined crystal structure of the SARS-CoV-2 M pro dimer 7 (PDB code 6y2e), considering one chain ($j = 1$) or both chains ($j = 2$), by means of the SASMOL method. 15 This method takes into account the contribution to the scattering due to the hydration water molecules around the protein, whose positions are found by embedding the atomic structure in a tetrahedral close-packed lattice. For the SARS-CoV-2 M pro monomer and dimer, 726 and 1243 hydration water molecules have been calculated, respectively, suggesting that upon dimer formation about 200 water molecules in total (2 × 726 − 1243 = 209) are removed from the hydration shells of the two monomers. Hence, the water molecules attributed to each monomer decrease from 726 to 621 upon M pro dimerization. This suggests that the dimerization process is accompanied by slight structural changes that reduce the average solvent-accessible area.
The $S_M(q)$ term in equation 5 is the so-called "measured" structure factor, which describes the long-range intermolecular interactions among all the particles in solution. For the sake of simplicity, here we consider a single effective structure factor that takes into account monomer-monomer, monomer-dimer and dimer-dimer interactions. Considering that at low $q$ all the experimental scattering curves (Fig. 1, top panels) show a positive deviation from the Guinier trend, indicative of the prevalence of protein-protein attraction over repulsion, we have approximated the structure factor by that of a fractal distribution of inhomogeneities developed by Teixeira, 16 whose main parameters are $D$, the fractal dimension of the aggregates, $r_0$, the effective radius of the aggregating protein molecule, and $\xi$, the correlation length, which can be interpreted as the average size of the aggregates (see equations 10, 11 and 12). The model described above, which combines the thermodynamic and structural features of SARS-CoV-2 M pro, has been adopted to simultaneously analyze the whole set of SAXS curves shown in Fig. 1, top panels. The fitting parameters shared by all the curves are $K_D^\circ$, the dissociation equilibrium constant at $T^\circ$, $\Delta C_{pD}$, the change of the constant-pressure heat capacity upon dissociation, and $\Delta S_D^\circ$, the dissociation entropy at $T^\circ$. Another parameter common to all the curves is the relative mass density of the hydration water, $d_h$ (in general higher than 1), which is taken into account in the SASMOL method. 15 The common fitted parameters are shown in Table 1, while all the other fitted parameters are reported in Supplementary Table S1.

Table 1. Thermodynamic parameters resulting from the global fit of SAXS data for SARS-CoV-2 M pro without inhibitors at different temperatures and concentrations.
The most important parameter obtained by the simultaneous fit of SAXS data is the dissociation constant $K_D^\circ$, which is found to be 7 ± 1 µM, in good agreement with the value obtained by Graziano et al. 17 on the very similar main protease from SARS-CoV. The corresponding dissociation Gibbs free energy is $\Delta G_D^\circ \approx 30$ kJ mol⁻¹, a value quite similar to the one observed for the β-lactoglobulin dimer dissociation at neutral pH. 18 Regarding the dissociation entropy, we have obtained a positive value, 50 ± 20 J mol⁻¹ K⁻¹, which is significantly smaller than the one observed in the above-mentioned β-lactoglobulin case. 18 It should be noted that in a dissociation process many factors other than translational and rotational motions contribute to a positive dissociation entropy, and it is difficult to separate them. One such factor is, without doubt, the removal of about 200 hydration water molecules from the monomer-monomer interface when the dimer is formed. The change of the heat capacity at constant pressure upon dissociation turned out to be positive and large. This parameter indirectly describes the monomer-monomer interface, as it can be attributed to the hydration and correlates with the interface size. 19 When dissociation heat capacities are positive and large, temperature significantly increases monomer-monomer affinities. This is our case: the large M pro dissociation heat capacity might be directly correlated with the SARS-CoV-2 infective efficiency as a function of temperature. However, investigating the monomer-monomer interface area and its relationship with the dissociation heat capacity 20 requires additional calorimetric experiments in order to obtain lower estimation errors. Finally, the relative density of the hydration shell is slightly larger than one, in agreement with previous literature results on globular proteins. 21-23
The determination of the thermodynamic features of the dimer-monomer equilibrium of M pro in conditions quite similar to those found in vivo is a fundamental step to investigate the effects of drugs aimed at inhibiting dimerization, and underlines the importance of further investigating the M pro monomer-monomer interface by in-solution techniques.
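The temperature dependence of the dissociation equilibrium discussed in this section can be illustrated numerically. The following is a minimal sketch, not the GENFIT implementation: it assumes a temperature-independent heat capacity change and a 1 M standard state, takes $K_D^\circ$ = 7 µM and $\Delta S_D^\circ$ = 50 J mol⁻¹ K⁻¹ from the text, and uses a purely hypothetical placeholder for $\Delta C_{pD}$, whose fitted value belongs to Table 1 and is not reproduced here.

```python
import math

R = 8.314462618   # universal gas constant, J mol^-1 K^-1
T_REF = 298.15    # reference temperature T° (K), as stated in the text


def delta_g_dissociation(t, k_d_ref, delta_s_ref, delta_cp):
    """Dissociation Gibbs free energy (J/mol) at temperature t (K).

    k_d_ref is K_D at T_REF in mol/L (1 M standard state assumed);
    delta_cp is assumed to be independent of temperature.
    """
    dg_ref = -R * T_REF * math.log(k_d_ref)
    return (dg_ref
            - (t - T_REF) * delta_s_ref
            + delta_cp * (t - T_REF - t * math.log(t / T_REF)))


def dissociation_constant(t, k_d_ref, delta_s_ref, delta_cp):
    """Equilibrium dissociation constant (mol/L) at temperature t (K)."""
    dg = delta_g_dissociation(t, k_d_ref, delta_s_ref, delta_cp)
    return math.exp(-dg / (R * t))


K_D_REF = 7e-6        # mol/L, value reported in the text
DELTA_S_REF = 50.0    # J mol^-1 K^-1, value reported in the text
DELTA_CP = 5000.0     # J mol^-1 K^-1, HYPOTHETICAL placeholder, not a fitted value

for celsius in (15, 25, 35, 45):
    t = celsius + 273.15
    k_d = dissociation_constant(t, K_D_REF, DELTA_S_REF, DELTA_CP)
    print(f"{celsius} C: K_D = {k_d * 1e6:.2f} uM")
```

At the reference temperature the function returns the input constant by construction, and the free energy evaluates to about 29.4 kJ mol⁻¹, in line with the value quoted above.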

Far-UV CD
To provide further insights on the dimer-monomer equilibrium, we have measured the far-UV CD spectra of M pro at three different concentrations, as shown in Fig. 2, left panel.
At the highest concentration of 16 µM, the ellipticity shows a minimum at a wavelength $\lambda_{\min}$ of about 221 nm and a shoulder centered at about 208 nm, which are typical of proteins with α-helical and β-sheet content, 24,25 fully consistent with the structural features of SARS-CoV-2 M pro 7 and in agreement with CD measurements of the same enzyme 26 and of the very similar SARS-CoV M pro. 27 As the concentration decreases, $\lambda_{\min}$ shifts towards lower values, indicating an increase of the β-sheet component at the expense of the α-helical content. 24 Such an effect is related to the fact that the M pro monomer and dimer have different secondary structure components. The $\lambda_{\min}$ trend can be described in terms of the dimer-monomer equilibrium through the following expression,

$$\lambda_{\min} = x_1\,\lambda_{\min}^{\mathrm{mon}} + (1 - x_1)\,\lambda_{\min}^{\mathrm{dim}} \tag{7}$$

where $x_1$ is the monomer fraction derived from equation 2 with $K_D$ fixed to 7 ± 1 µM, as estimated by SAXS, while $\lambda_{\min}^{\mathrm{mon}}$ and $\lambda_{\min}^{\mathrm{dim}}$ are the minimum wavelength parameters corresponding to the monomer and the dimer spectra, respectively. As shown in the inset of Fig. 2 (left panel), the trend of the $\lambda_{\min}$ values is very well fitted by equation 7.
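As a rough numerical illustration of the concentration dependence described above, the sketch below computes the monomer fraction at the three CD concentrations (16, 6 and 2.5 µM) with $K_D$ = 7 µM and mixes two limiting wavelengths linearly. The linear mixing rule and the two limiting wavelengths are assumptions made only for this example; the fitted values of the monomer and dimer parameters are not given in the text.

```python
import math


def monomer_fraction(c_total, k_d):
    """Fraction of chains in the monomeric state.

    Solves K_D = 2 C x^2 / (1 - x) for x; c_total and k_d must share units.
    """
    return k_d / (4.0 * c_total) * (math.sqrt(1.0 + 8.0 * c_total / k_d) - 1.0)


def lambda_min(c_total, k_d, lam_mon, lam_dim):
    """Population-weighted position of the CD minimum (assumed linear mixing)."""
    x1 = monomer_fraction(c_total, k_d)
    return x1 * lam_mon + (1.0 - x1) * lam_dim


K_D = 7.0          # uM, from the SAXS analysis
LAM_MON = 215.0    # nm, HYPOTHETICAL monomer limit
LAM_DIM = 224.0    # nm, HYPOTHETICAL dimer limit

for conc in (16.0, 6.0, 2.5):
    x1 = monomer_fraction(conc, K_D)
    lam = lambda_min(conc, K_D, LAM_MON, LAM_DIM)
    print(f"C = {conc} uM: x1 = {x1:.2f}, lambda_min = {lam:.1f} nm")
```

With these inputs the monomer fraction at 16 µM is below one half, consistent with the statement that this sample consists mainly of dimers.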
The thermal stability of M pro has been characterized by monitoring the signal at 221 nm of the M pro sample at 16 µM concentration, which mainly consists of dimers. The rather sharp transition we have obtained is shown in Fig. 2 (right panel) and suggests a two-state model, where the dimer unfolds and yields two random-coil monomeric chains:

$$\mathrm{N}_2 \rightleftharpoons 2\,\mathrm{U} \tag{8}$$

According to scheme 8, we obtain an apparent melting temperature of 323 K, with a melting van't Hoff enthalpy of $\Delta H_v = 810 \pm 60$ kJ/mol. This value is in good agreement with the van't Hoff enthalpy $\Delta H_v \sim 880$ kJ/mol estimated through the equation $\Delta H_v = 4RT_m^2 C_{P,\max}/\Delta H_{cal}$ from DSC measurements. 26 It is also worth noting that, taking $\Delta H_{cal} = 443$ kJ/mol, 26 the ratio $\Delta H_v/\Delta H_{cal}$ is ∼ 1.8: a value larger than 1 is fully consistent with an unfolding transition coupled to dimer dissociation.
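The two-state picture above can be sketched as follows. This is only an illustration under stated assumptions, not the fitting procedure used for Fig. 2: the van't Hoff enthalpy is taken as temperature independent, the melting temperature is defined as the point where half of the chains are unfolded, and the equilibrium constant of the dimer-to-unfolded-monomers reaction is written in terms of the total chain concentration, which then cancels out.

```python
import math

R = 8.314462618      # J mol^-1 K^-1
T_MELT = 323.0       # K, apparent melting temperature from the text
DH_VANT_HOFF = 810e3 # J/mol, van't Hoff enthalpy from the text
DH_CAL = 443e3       # J/mol, calorimetric enthalpy quoted from ref. 26


def unfolded_fraction(t, t_melt=T_MELT, dh=DH_VANT_HOFF):
    """Fraction of chains unfolded at temperature t (K) for N2 <-> 2U.

    With K = 2 C f^2 / (1 - f) and K(t_melt) = C, the ratio K/C follows the
    van't Hoff law; the numerically stable root is f = 2 / (1 + sqrt(1 + 8 C/K)).
    """
    c_over_k = math.exp(dh / R * (1.0 / t - 1.0 / t_melt))
    return 2.0 / (1.0 + math.sqrt(1.0 + 8.0 * c_over_k))


for t in (303.0, 313.0, 323.0, 333.0, 343.0):
    print(f"T = {t:.0f} K: unfolded fraction = {unfolded_fraction(t):.3f}")

print(f"dH_vH / dH_cal = {DH_VANT_HOFF / DH_CAL:.2f}")
```

The large enthalpy makes the computed transition span only a few kelvin around 323 K, in line with the sharp transition described in the text.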

In-silico inhibitor selection
To identify new inhibitors of the SARS-CoV-2 main protease from a large in-house database, we applied the in silico protocol recently proposed by some of us. 28 The flowchart of the adopted protocol is depicted in Supplementary Fig. S1. As a first step, we performed molecular docking studies on the compounds present in the database to analyze their binding capability in the catalytic active site of SARS-CoV-2 M pro (PDB code 6y2f), 7 as detailed in the Materials and Methods section. Supplementary Fig. S2 shows the 3D binding active site of SARS-CoV-2 M pro co-crystallized with the native inhibitor 13b 7 covalently bonded to Cys145. The ligand binds to the catalytic cleft of the protease, located between domains I and II. The 3D binding site representation (Supplementary Fig. S2) highlights the interactions with the amino acid residues involved in the inhibition mechanism, such as Met49, Met165, Glu166, His164, Phe140, Gly143 and the catalytic Cys145. Noteworthy is the presence of hydrogen bonds between the pyridone moiety of the ligand and Glu166, which rules the catalytic activity, driving SARS-CoV-2 M pro to adopt an inactive conformation. The best docked molecules have been selected based on a docking score cut-off of −6.5 kcal/mol and submitted to ligand-based approaches, by taking advantage of the web service DRUDIT (DRUgs Discovery Tools), a recently developed open-access virtual screening platform, 29 which represents the evolution of previous well-established protocols based on molecular descriptors. 30,31 DRUDIT implements the ligand-based template of SARS-CoV-2 M pro, available in the Biotarget Finder tool, which has been recently proposed as a useful means for the identification of new SARS-CoV-2 M pro modulators. Subsequently, the ligands selected by molecular docking were submitted to DRUDIT, as reported elsewhere, 28 allowing the evaluation of their affinity to SARS-CoV-2 M pro by the values of the Drudit Affinity Score (DAS).
The features of the ligand-based approaches based on molecular descriptors allowed the evaluation of topological, thermodynamic and charge-related characteristics of the ligands. Thus, two complementary standpoints in the evaluation of the binding capability (ligand-based and structure-based) covered all the interaction aspects in the ligand-target complex. The top-scored molecules (selected based on a DAS cut-off of 0.65) were processed by Induced Fit Docking (IFD) calculations to further screen the hits to be submitted to wet-lab tests. The seven best-scored structures are reported in Supplementary Fig. S3 and in Table 2. Fig. 3 reports the two best-scored molecules, 3 and 7.

Table 2 (columns: Inhibitor, Prime_Energy, XPG_score, IFD_score).

The selected inhibitors have been tested for their efficacy in reducing the M pro activity. As reported in Fig. 4, the time dependence of the substrate fluorescence after hydrolysis indicates that the catalytic activity of M pro changes in the presence of the selected compounds. In particular, compounds 2, 4, 5 and 7 induced an irreversible inactivation of the enzyme, while compounds 1 and 6 turned out to be rather inactive. For two of the most effective compounds (2 and 7), inhibition tests have been carried out as a function of the concentration. Unfortunately, we have not been able to perform this test for compound 4, which shows the best inhibition efficacy, as it produces a fluorescence signal that partially obscures that of the substrate. Results are shown in Fig. 5 (left panel). Percent inhibition data have been fitted with the Hill equation, $p(C_I) = 100/(1 + (\mathrm{IC}_{50}/C_I)^n)$, to get the half maximal inhibitory concentration, IC50, and the Hill slope $n$. We obtained IC50 = 10.3 ± 0.2 µM for 2 and 15 ± 2 µM for 7, with $n$ = 5 ± 1 and 3 ± 1, respectively. These values of $n$ larger than one indicate that the binding is positively cooperative, in agreement with other recent experimental results. 32
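The Hill equation quoted above is straightforward to evaluate. The short sketch below uses the best-fit IC50 and Hill slope values reported in the text for compounds 2 and 7; the concentrations at which the curve is printed are arbitrary choices made for this illustration.

```python
def percent_inhibition(c_i, ic50, n):
    """Hill equation: percent inhibition at inhibitor concentration c_i.

    c_i and ic50 must be expressed in the same units (here micromolar).
    """
    return 100.0 / (1.0 + (ic50 / c_i) ** n)


# Best-fit values reported in the text: (IC50 in uM, Hill slope n)
FITTED = {2: (10.3, 5.0), 7: (15.0, 3.0)}

for compound, (ic50, n) in FITTED.items():
    for c_i in (5.0, 10.0, 30.0, 60.0):  # uM, arbitrary illustration points
        p = percent_inhibition(c_i, ic50, n)
        print(f"compound {compound}, C_I = {c_i} uM: {p:.1f} % inhibition")
```

A Hill slope well above one makes the curve switch from weak to nearly complete inhibition over a narrow concentration range around IC50, which is the signature of the positive cooperativity mentioned above.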

SAXS
SAXS curves of SARS-CoV-2 M pro samples obtained in the presence of the seven selected potential inhibitors at two concentrations and at different temperatures are reported as log-log plots in Fig. 1, bottom panels. SAXS data have been analysed with the same approach adopted for the data without inhibitors, with the further assumption that, for each compound, the thermodynamic parameters are linear functions of its concentration $C_I$, namely

$$\Delta G_D^\circ(C_I) = \Delta G_D^\circ + \alpha_G C_I, \qquad \Delta C_{pD}(C_I) = \Delta C_{pD} + \alpha_{C_p} C_I, \qquad \Delta S_D^\circ(C_I) = \Delta S_D^\circ + \alpha_S C_I \tag{9}$$

where $\Delta G_D^\circ$, $\Delta C_{pD}$ and $\Delta S_D^\circ$ are the values obtained in the absence of inhibitors (Table 1), and the three corresponding constant rates $\alpha_G$, $\alpha_{C_p}$ and $\alpha_S$ are fitting parameters common to all the SAXS curves corresponding to the same inhibitor. The high quality of the fits can be appreciated in Fig. 1 (bottom panels), where the calculated SAXS curves are superposed on the experimental curves; the resulting common thermodynamic fitting parameters are shown in Table 3, first panel. The inhibitors with the lowest values of $\alpha_G$ (Table 3, first panel) are those that most favour dimer dissociation. The results reported in Table 3 suggest that compounds 1, 6 and 7 are, within the experimental error, the most effective in increasing the dissociation equilibrium constant, which at $C_I$ = 30 µM becomes as large as ≈ 15 µM and at $C_I$ = 60 µM almost doubles its value, reaching ≈ 30 µM. Inhibitor 5 is slightly less active: at $C_I$ = 60 µM we found a dissociation equilibrium constant of ≈ 20 µM. The other three compounds, 2, 3 and 4, do not show any statistically significant difference with respect to the results in the absence of inhibitors. Despite the high uncertainties on $\alpha_{C_p}$ and $\alpha_S$, their negative values suggest that the changes of heat capacity and of entropy upon dissociation are smaller than those observed without inhibitors, indicating that inhibitors increase the monomer order.
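A linear dependence of the dissociation free energy on inhibitor concentration implies an exponential dependence of the dissociation constant. The sketch below illustrates this at the reference temperature only: it back-calculates an effective slope from the approximate constants quoted in the text (7 µM without inhibitor, ≈ 30 µM at 60 µM inhibitor) and then predicts the constant at 30 µM. The slope obtained this way is an illustrative estimate, not the fitted value of Table 3.

```python
import math

R = 8.314462618e-3   # kJ mol^-1 K^-1
T_REF = 298.15       # K


def slope_from_constants(k_d0, k_d_ci, c_i):
    """Effective alpha_G (kJ mol^-1 uM^-1) implied by K_D at 0 and at c_i (uM)."""
    return -R * T_REF * math.log(k_d_ci / k_d0) / c_i


def k_d_with_inhibitor(c_i, k_d0, alpha_g):
    """K_D at T_REF when the free energy changes by alpha_g * c_i."""
    return k_d0 * math.exp(-alpha_g * c_i / (R * T_REF))


K_D0 = 7.0   # uM, without inhibitors (from the text)
alpha_g = slope_from_constants(K_D0, 30.0, 60.0)  # illustrative estimate
print(f"alpha_G ~ {alpha_g:.3f} kJ mol^-1 uM^-1")
print(f"K_D at 30 uM ~ {k_d_with_inhibitor(30.0, K_D0, alpha_g):.1f} uM")
```

The predicted constant at 30 µM is close to 15 µM, so the two values quoted in the text for the most effective compounds are mutually consistent with the linear free-energy assumption.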
Looking at the single-curve parameters, reported in Supplementary Table S1 and Supplementary Table S2, we observe that in almost all cases the values of the correlation length $\xi$ are similar for samples with and without inhibitors. The fractal dimension is ≈ 2, suggesting a two-dimensional fractal growth of the protein clusters in the presence of inhibitors.

Discussion
It is widely known that the active site of the M pro monomer, which is highly conserved among coronaviruses, is typically composed of four subsites, referred to as S1′, S1, S2, and S4. 33-35 They accommodate the corresponding domains P1′, P1, P2, and P4 of the substrate or those of an inhibiting compound that mimics the substrate. 36 The S1′ subsite is constituted by the two residues Thr24 and Thr25. The S1 subsite (also referred to as the S1 pocket 35) is formed by the side chains of residues Phe140, Asn142, Glu166, His163 and His172 and by the main chains of Phe140 and Leu141. 36 As discussed by Sacco et al., 35 there is evidence that S1 can interact with both hydrophobic and hydrophilic groups: hence it is considered a promising target for an inhibiting compound. S2 is a hydrophobic subsite formed by the side chains of His41, Met49, Tyr54, Met165 and Asp187. S4 is a small hydrophobic pocket that involves the side chains of Met165, Leu167, Phe185, Gln192 and Gln189. 36 An unusual catalytic dyad, His41-Cys145, acts in the active site, where His41 is a proton acceptor whereas Cys145 is the nucleophile that attacks the carbonyl carbon of the substrate. Hence, a signature of the inhibiting power of a compound is its capability to form a covalent bond with Cys145, 33 as very recently confirmed by Dai et al., 34 who have found two promising inhibitors, 11a and 11b. The importance of the protonation state of Cys145, as well as of the network of hydrogen bonds between the catalytic site of M pro and inhibiting compounds, has also been recently discussed by Kneller et al. 37 by combining X-ray and neutron results.
On these grounds, the experimental results obtained by the present study, together with the structure of the seven inhibitors within the M pro active site determined by the refined molecular docking, can be discussed.
The interaction map of inhibitor 1 is shown in Supplementary Fig. S4. There are a total of 11 contacts with amino acids of the M pro monomer (Ser1, Thr25, Thr26, Ser46, Asn119, Leu141, Asn142, Cys145, Pro168, Arg188, Gln192), 2 of which (Ser46, Asn119) are hydrogen bonds. The residues of the catalytic dyad and of the four subsites in contact with 1 are: Cys145 (dyad, 1 of 2 (50%)); Thr25 (S1′, 1 of 2 (50%)); Leu141 and Asn142 (S1, 2 of 6 (33%)); Gln192 (S4, 1 of 5 (20%)). To note, these contacts involve only one of the residues of the catalytic dyad, Cys145, without a hydrogen bond, whereas for inhibitor 13b there is a hydrogen bond with Cys145 (Supplementary Fig. S2). It is also worth noting that Ser1 is among the residues in contact with inhibitor 1: since the mutual interaction of Glu166 of one monomer and Ser1 (the N-finger) of the other monomer has been proven to shape the catalytic cleft, 7 we argue that this compound could destabilize the dimer, as suggested by the high value of $K_D^\circ$ = 26 ± 4 µM at $C_I$ = 60 µM. However, its enzymatic inhibition is very poor, as shown by the high similarity of the RFU slope with the one in the absence of inhibitors (Fig. 4). A possible explanation of this result could be the absence of any hydrogen bond with Cys145 as well as the absence of any contact with the residues of subsite S2.
Compound 6 forms 11 contacts with the amino acids of the monomer (His41, Leu141, Ser144, Cys145, Met165, Glu166, Leu167, Arg188, Gln189, Ala191, Gln192, Supplementary Fig. S8), including two hydrogen bonds (His41, Gln192). In particular, the residues of the catalytic dyad and of the four subsites in contact with 6 are: His41 and Cys145 (dyad, 2 of 2 (100%)); Leu141 and Glu166 (S1, 2 of 6 (33%)); His41 and Met165 (S2, 2 of 5 (40%)); Met165, Leu167, Gln192 and Gln189 (S4, 4 of 5 (80%)). An almost absent inhibition effect is seen by fluorescence, the slope of RFU (Fig. 4) being very similar to the one determined in the absence of inhibitors. On the other hand, compound 6 is able to modify the dimer-monomer dissociation, with one of the highest values of $K_D^\circ$ = 30 ± 10 µM at $C_I$ = 60 µM (Table 3). To note, only one of the 6 amino acids that stabilize the S1 site is included in the list of residues interacting with compound 6. Hence, the absence of inhibition activity could be explained by the small size of its molecular structure, which might not be able to induce important modifications of the S1 pocket and hence to modify the catalytic features of M pro.
We finally turn to compound 7, which shows the best value of IFD_score (Table 2). It behaves in the opposite way with respect to compound 6: it is capable of changing the dimer-monomer equilibrium to the same extent ($K_D^\circ$ = 30 ± 10 µM at $C_I$ = 60 µM, Table 3) and displays a promising inhibition effect, with m ≈ 1. For this compound, the map of contacts shows 12 interactions (Thr24, Thr25, Thr26, His41, Thr45, Met49, Leu141, His164, Met165, Glu166, Asp187, Arg188, Fig. 3) with a large number of hydrogen bonds (Thr24, Thr26, His41, His164, Glu166, Arg188). The residues of the catalytic dyad and of the four subsites in contact with 7 are: His41 (dyad, 1 of 2 (50%)); Thr24 and Thr25 (S1′, 2 of 2 (100%)); Leu141 and Glu166 (S1, 2 of 6 (33%)); His41, Met49, Met165 and Asp187 (S2, 4 of 5 (80%)); Met165 (S4, 1 of 5 (20%)). Only one of the interacting residues (Glu166) is involved in the stabilization of the S1 pocket. We can argue that the high inhibition effect could be due to the high number of contacts with S2 and to the presence of the 6 hydrogen bonds. Another hypothesis, which needs further investigation to be confirmed, is that the fluorinated groups, which are present in a high number in compound 7, may give rise to a new reactive warhead able to form a covalent bond with Cys145. We may also consider that one of the hydrogen bonds involves a residue of the catalytic dyad, His41, suggesting a possible important modification of the enzymatic activity.
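The per-subsite contact counts quoted for compounds 1, 6 and 7 can be reproduced by intersecting each contact list with the residue sets that define the catalytic dyad and the four subsites. The sketch below does exactly that, using only the residue lists given in the text; the dictionary layout and function name are choices made for this illustration.

```python
# Residues defining the catalytic dyad and the four subsites (from the text).
SUBSITES = {
    "dyad": {"His41", "Cys145"},
    "S1'": {"Thr24", "Thr25"},
    "S1": {"Phe140", "Leu141", "Asn142", "His163", "Glu166", "His172"},
    "S2": {"His41", "Met49", "Tyr54", "Met165", "Asp187"},
    "S4": {"Met165", "Leu167", "Phe185", "Gln189", "Gln192"},
}

# Residues in contact with each compound according to the docking maps.
CONTACTS = {
    1: ["Ser1", "Thr25", "Thr26", "Ser46", "Asn119", "Leu141", "Asn142",
        "Cys145", "Pro168", "Arg188", "Gln192"],
    6: ["His41", "Leu141", "Ser144", "Cys145", "Met165", "Glu166", "Leu167",
        "Arg188", "Gln189", "Ala191", "Gln192"],
    7: ["Thr24", "Thr25", "Thr26", "His41", "Thr45", "Met49", "Leu141",
        "His164", "Met165", "Glu166", "Asp187", "Arg188"],
}


def subsite_coverage(contacts):
    """Return {subsite: (residues contacted, residues in subsite)}."""
    contacted = set(contacts)
    return {name: (len(contacted & residues), len(residues))
            for name, residues in SUBSITES.items()}


for compound, residues in CONTACTS.items():
    summary = ", ".join(f"{name} {hits}/{total}"
                        for name, (hits, total) in subsite_coverage(residues).items())
    print(f"compound {compound}: {summary}")
```

The output matches the fractions listed above, including the absence of S2 contacts for compound 1 and the 4 of 5 S2 contacts for compound 7.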
In summary, our results show that compounds designed to bind the catalytic site of SARS-CoV-2 M pro can inhibit the enzyme without necessarily modifying the dimer-monomer equilibrium. On the other hand, we have also observed that compounds able to induce dissociation do not always show inhibition effects. To better visualize the scenario presented by our results, we report in Fig. 5 (right panel) the slope m of the fluorescence inhibition curve as a function of the dimer-monomer equilibrium constant $K_D^\circ$ obtained at $C_I$ = 60 µM for the seven compounds. The points in this map can be organized in two groups, represented in blue and in red. The red points refer to compounds that show the expected behaviour: the stronger their capability to induce the dissociation of the M pro dimer, the larger their inhibition effect. For the blue compounds, the behaviour is opposite: an increase of the dissociation does not determine an increase of the inhibition effect. This apparently contradictory result can be in part explained by considering that, in all cases, the shift of the dissociation equilibrium is weak. As clearly shown in Supplementary Table S2, the fraction of monomers $x_1$ is never larger than ≈ 0.6, even in the case of the most effective among our compounds. This means that, in the presence of a compound that alters the dimer-monomer equilibrium but does not hamper the interaction with the substrate, there are always dimeric M pro molecules that can exert their enzymatic activity when a substrate is available.
Very recently Suárez et al., 38 through a 2 µs Molecular Dynamics simulation of M pro with and without a model peptide mimicking the enzyme substrate, have shown the importance of dimerization in stabilizing the catalytic dyad and the overall contribution of protein flexibility to the binding of the protein with the substrate. The experimental work presented here provides further evidence of the complex interplay between enzymatic activity inhibition and dimer dissociation. Furthermore, to the best of our knowledge, it shows for the first time the contribution of the SAXS technique, combined with advanced data analysis, in obtaining structural information about SARS-CoV-2 M pro in solution and in the presence of promising inhibitors. Our results suggest that more experimental evidence about the impairment of monomeric and dimeric M pro in the presence of inhibitors, corroborated by computational results, will be necessary for a deeper understanding of the M pro allosteric mechanism.

Materials and Methods
M pro expression and western blot analysis
The pGEX-6P-1 vector harboring the full-length cDNA sequence encoding the SARS-CoV-2 main protease (M pro, NC_045512) was purchased from GenScript (clone ID_M16788F). The expression vector was transformed into BL21DE3pLys Escherichia coli cells and the obtained clones were assayed both on a small scale (5 mL) and on a medium scale (500 mL and 1 L) for the production of SARS-CoV-2 M pro. Transformants were grown in LB medium containing 100 µg/mL ampicillin and 34 µg/mL chloramphenicol as selective antibiotics. Cultures were grown up to an OD600 of 0.6-0.8 at 37 °C, 200 rpm, and then M pro expression was induced by the addition of 0.5 mM isopropyl-1-thio-β-D-galactopyranoside (IPTG). Growth under induction was carried out both for 3 h at 37 °C and for 10 h at 16 °C in order to identify the best expression condition. Cells were harvested by centrifugation at 6000 g. Cell pellets were resuspended in lysis buffer (20 mM Tris-HCl pH 8.0, 300 mM NaCl, 2 mM β-mercaptoethanol), and cell rupture was achieved by sonication (Sonics Vibra Cell sonicator) at 4 °C. Cell debris was separated from the total protein extract by centrifugation at 6500 g for 1 h. Supernatant aliquots were resuspended in Laemmli sample buffer, run on 12% SDS-PAGE and transferred onto a PVDF membrane for Western blot analysis. M pro was probed with a 6×-His tag monoclonal primary antibody (Invitrogen) and an anti-mouse secondary antibody and detected by chemiluminescence (Clarity TM Western ECL Substrate, Bio-Rad, Supplementary Fig. S10, panels A and B).

M pro purification and His-tag cleavage
The total cell extract was loaded onto a Ni-NTA affinity column (G-Biosciences) and washed with washing buffer (Tris-HCl 20 mM pH 7.6, NaCl 100 mM). M pro was eluted with elution buffer (Tris-HCl 20 mM pH 7.6, NaCl 100 mM, 300 mM imidazole) in 5 fractions of 1 mL each. Aliquots of the elution fractions were loaded onto a 12% SDS-PAGE acrylamide gel, and imidazole was removed by dialysis against Prescission cleavage buffer (Tris-HCl 50 mM pH 8.0, NaCl 150 mM, DTT 1 mM, EDTA 1 mM) through Amicon Ultra-4 centrifugal filters 30K (Merck Millipore). For M pro C-terminal His-tag removal, the Prescission (1 U per 100 µg of protein) cleavage reaction was performed at 4 °C for 4 h and the Prescission protease was then removed by a GSTrap FF column (GE-Healthcare). The M pro solution was further purified by FPLC size-exclusion chromatography on a Superdex TM 75 10/300 GL column (Supplementary Fig. S9, panels A and B). 26,36

M pro activity assay and inhibition
The fluorescently labelled auto-cleavage sequence of SARS-CoV-2 M pro, Mca-AVLQ↓SGFRK(Dnp)K (purchased from GenScript), was used to monitor the kinetics of recombinant M pro (excitation 320 nm, emission 405 nm). The assay was started by mixing ≈ 0.2 µM SARS-CoV-2 M pro with different amounts of substrate (10, 20, 40 µM) in order to set the best protein-substrate concentration to detect M pro activity. 33 Fluorescence intensity was measured with a DeNovix DS-11 FX+ fluorometer. The M pro activity used as reference for the inhibition tests was obtained by a linear fit of the fluorescence curve in the presence of 40 µM substrate. 26,33 Seven inhibitors dissolved in DMSO were tested at a final concentration of 30 µM 34 (Fig. 4). Each reaction, in a final volume of 200 µL, was first incubated for 20 min at 30 °C without substrate. After substrate addition, fluorescence intensities were reported as relative fluorescence units (RFU) and monitored every minute for 30 min at 30 °C.
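The linear fit of a fluorescence time course mentioned above can be sketched with an ordinary least-squares slope. The RFU values below are synthetic numbers invented for this illustration, and expressing the effect of an inhibitor as the ratio between its slope and the reference slope is an assumption of this sketch rather than the definition used for the figures.

```python
def fit_slope(times, values):
    """Ordinary least-squares slope of values versus times."""
    n = len(times)
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    num = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    den = sum((t - mean_t) ** 2 for t in times)
    return num / den


# One reading per minute for 30 min, as in the assay protocol.
minutes = list(range(31))

# SYNTHETIC RFU traces (hypothetical): reference and one inhibited sample.
rfu_reference = [100.0 + 12.0 * t for t in minutes]
rfu_inhibited = [100.0 + 3.0 * t for t in minutes]

slope_ref = fit_slope(minutes, rfu_reference)
slope_inh = fit_slope(minutes, rfu_inhibited)

# Assumed summary quantity: residual activity relative to the reference.
residual_activity = slope_inh / slope_ref
print(f"reference slope = {slope_ref:.1f} RFU/min")
print(f"inhibited slope = {slope_inh:.1f} RFU/min")
print(f"residual activity = {100.0 * residual_activity:.0f} %")
```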

Circular Dichroism
Circular dichroism measurements were performed using a Jasco J810 spectropolarimeter. Quartz cuvettes with a path length of 1 mm were used in order to obtain the optimum signal-to-noise ratio for the M pro samples, at concentrations of 16, 6 and 2.5 µM. Each spectrum was collected in the range from 220 to 325 nm, with a scan speed of 50 nm/min. The thermal stability was studied at 16 µM M pro concentration by varying the temperature from 300 to 350 K with a thermal bath.

Small Angle X-ray Scattering
The SAXS data analysis approach has been described in the main text, with the exception of some minor points. Since in all conditions the nominal molar protein concentration is lower than 1 mM, its temperature variation can be considered to be determined only by the temperature dependence of the relative mass density of water, $d_w$, which, according to literature results, 39 is written as

where, in our investigated range, 15-45 °C, the optimum value of the thermal expansivity at $T^\circ$ is $\alpha_w = 2.5 \cdot 10^{-4}$ K$^{-1}$ and that of its first derivative is $\beta_w = 9.8 \cdot 10^{-6}$ K$^{-2}$. Accordingly, $C_N = C^\circ d_w$, $C^\circ$ being the nominal protein concentration at $T^\circ$.
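The expression for the relative mass density of water is not reproduced in the text above. As an illustration only, the following Python sketch assumes that the thermal expansivity varies linearly with temperature around $T^\circ$, which upon integration gives $d_w = \exp[-\alpha_w (T - T^\circ) - \beta_w (T - T^\circ)^2 / 2]$. Both this functional form and the reference temperature of 25 °C are assumptions, not statements from the source; only the values of $\alpha_w$ and $\beta_w$ are taken from the text.

```python
import math

ALPHA_W = 2.5e-4  # K^-1, thermal expansivity of water at T° (from the text)
BETA_W = 9.8e-6   # K^-2, its first temperature derivative (from the text)


def relative_water_density(t_celsius, t_ref_celsius=25.0):
    """Relative mass density of water d_w, assumed exponential form.

    t_ref_celsius = 25.0 is a hypothetical reference temperature.
    """
    dt = t_celsius - t_ref_celsius
    return math.exp(-ALPHA_W * dt - 0.5 * BETA_W * dt * dt)


def protein_concentration(c_nominal, t_celsius, t_ref_celsius=25.0):
    """C_N = C° d_w, with C° the nominal concentration at T°."""
    return c_nominal * relative_water_density(t_celsius, t_ref_celsius)


for t in (15.0, 25.0, 35.0, 45.0):
    print(t, round(protein_concentration(16.0, t), 4))  # 16 µM nominal
```

Over the 15-45 °C range the correction stays below 1%, so it affects the fitted concentrations only marginally.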

The measured structure factor $S_M(q)$ has been obtained in relation to the protein-protein structure factor $S(q)$ by:

where $\beta(q)$ is the coupling function and $P^{(1)}(q)$ is the average of the protein excess scattering amplitude, a function provided, together with $P(q)$, by the SASMOL method. According to Ref. 16, $S(q)$ has been written as

where $\Gamma(x)$ is the gamma function, $D$ is the fractal dimension (between 1 and 3) of the aggregates, $r_0$ is the effective radius of the aggregating protein and $\xi$ is the correlation length.
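The two expressions referred to above are not reproduced in the text. The Python sketch below shows forms that are commonly used in the SAXS literature and that involve the same symbols: the decoupling approximation, $S_M(q) = 1 + \beta(q)[S(q) - 1]$ with $\beta(q) = [P^{(1)}(q)]^2 / P(q)$, and a mass-fractal structure factor of the Teixeira type. Whether these are exactly the expressions used in this work is an assumption, and the numerical inputs are illustrative.

```python
import math


def fractal_structure_factor(q, D, r0, xi):
    """Mass-fractal S(q) (assumed Teixeira-type form), valid for 1 < D <= 3.

    q in 1/Å, r0 and xi in Å; D is the fractal dimension.
    """
    if q <= 0.0 or not (1.0 < D <= 3.0):
        raise ValueError("require q > 0 and 1 < D <= 3")
    qxi = q * xi
    amplitude = D * math.gamma(D - 1.0) / (q * r0) ** D
    damping = (1.0 + 1.0 / qxi ** 2) ** (-(D - 1.0) / 2.0)
    return 1.0 + amplitude * damping * math.sin((D - 1.0) * math.atan(qxi))


def measured_structure_factor(s_q, p1_q, p_q):
    """Decoupling approximation (assumed): S_M = 1 + beta (S - 1).

    p1_q is the averaged scattering amplitude and p_q the form factor,
    so that beta = p1_q**2 / p_q.
    """
    beta = p1_q ** 2 / p_q
    return 1.0 + beta * (s_q - 1.0)


# Illustrative values: D = 2, r0 = 20 Å, xi = 100 Å, q = 0.01 1/Å.
s = fractal_structure_factor(0.01, 2.0, 20.0, 100.0)
print(s, measured_structure_factor(s, 0.9, 1.0))
```

At large $q$ the fractal term vanishes and both structure factors tend to 1, as expected for uncorrelated particles.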

Ligand Preparation
The default settings of the LigPrep tool implemented in Schrödinger's software (version 2017-1) were used to prepare the ligands for docking. 40 All possible tautomers and combinations of stereoisomers were generated for pH 7.0 ± 0.4, using the Epik ionization method. 41 Energy minimization was subsequently performed using the integrated OPLS 2005 force field. 42

Protein Preparation
The crystal structure of SARS-CoV-2 M pro in complex with ligand 13b (PDB code 6y2f) 7 was downloaded from the Protein Data Bank. 43 The co-crystallized ligand, covalently bonded to Cys145, was treated by breaking the covalent bond and filling in the open valences. The Protein Preparation Wizard of the Schrödinger software was subsequently employed for further preparation of the protein structure using the default settings. 44 Bond orders were assigned, and hydrogen atoms were added together with the protonation states of the heteroatoms, using the Epik tool (with the pH set at biologically relevant values, i.e. 7.0 ± 0.4). The H-bond network was then optimized. The structure was subjected to a restrained energy minimization step (the RMSD of the atom displacement for terminating the minimization was 0.3 Å), using the Optimized Potentials for Liquid Simulations (OPLS) 2005 force field. 42

Docking Validation
Molecular docking was performed with the Glide program. 30,45,46 The receptor grid was prepared by assigning the original ligand (13b) as the centroid of the grid box. The generated 3D conformers were docked into the receptor model using the Extra Precision (XP) mode as the scoring function. A total of 5 poses per ligand conformer were included in the post-docking minimization step, and a maximum of 2 docking poses were generated for each ligand conformer. The proposed docking procedure was validated by re-docking the crystallized 13b within the receptor-binding pockets of 6y2f by Glide covalent docking. The results obtained were in good agreement with the experimental poses, showing an RMSD of 0.75 Å.
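The validation criterion above is the RMSD between the re-docked and the crystallographic pose of 13b. How this RMSD was computed is not detailed in the text; the Python sketch below assumes an in-place comparison (no superposition, since both poses share the receptor frame) over atoms listed in the same order. The function name and the coordinates are illustrative.

```python
import math


def pose_rmsd(coords_a, coords_b):
    """In-place RMSD (Å) between two poses with identical atom ordering."""
    if not coords_a or len(coords_a) != len(coords_b):
        raise ValueError("poses must be non-empty and have the same atom count")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


# Made-up coordinates: the docked pose is the crystal pose shifted along x.
crystal = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.2, 0.0)]
docked = [(x + 0.75, y, z) for x, y, z in crystal]
print(pose_rmsd(crystal, docked))
```

In practice the comparison would be limited to ligand heavy atoms, and symmetry-equivalent atoms may need to be matched before computing the deviation.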

Biotarget Finder module (DRUDIT)
The refined selection of suitable SARS-CoV-2 M pro inhibitors was performed through the Biotarget Finder module, as available on the www.drudit.com web server. 29 The tool predicts the binding affinity of candidate molecules for the selected biological target. The template of the biological target was built as previously reported. The in-house database was then submitted to the Biological Predictor module by setting the DRUDIT parameters N, Z and G, using the crystallized structure of 13b, as previously reported. 28

Induced Fit Docking
Induced fit docking simulation was performed using the IFD application as available 31,47 in the Schrödinger software suite, 48 which has been demonstrated to be an accurate and robust method to account for both ligand and receptor flexibility. 49 The IFD protocol was performed as follows. 50,51 The ligands were first docked into the rigid receptor models with scaled-down van der Waals (vdW) radii. The Glide Extra Precision (XP) mode was used for the docking, and 20 ligand poses were retained for protein structural refinements. The docking boxes were defined to include all amino acid residues within a 25 Å × 25 Å × 25 Å box centred on the original ligands. The induced-fit protein-ligand complexes were generated using the Prime software. 52,53 The 20 structures from the previous step were submitted to side-chain and backbone refinements. All residues with at least one atom located within 5.0 Å of each corresponding ligand pose were included in the refinement by Prime. All the poses generated were then hierarchically classified, refined and further minimized into the active site grid before being finally scored using the proprietary GlideScore function, defined as follows:

XPG_score = 0.065 vdW + 0.130 Coul + Lipo + Hbond + Metal + BuryP + RotB + Site,

where vdW is the van der Waals energy term, Coul is the Coulomb energy, Lipo is a lipophilic contact term that rewards favourable hydrophobic interactions, Hbond is an H-bonding term, Metal is a metal-binding term (where applicable), BuryP is a penalty term applied to buried polar groups, RotB is a penalty for freezing rotatable bonds and Site is a term used to describe favourable polar interactions in the active site. Finally, the IFD_score (IFD_score = XPG_score + 0.05 Prime_Energy), which accounts for both the protein-ligand interaction energy and the total energy of the system, was calculated and used to rank the IFD poses. More negative IFD_score values indicate more favourable binding.
Results are shown in Table 2.
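The two scoring expressions given above can be written out as a short Python sketch that also ranks poses by IFD_score. The weights are those stated in the text; the function names, the pose labels and all energy values are made up for illustration and do not correspond to Table 2.

```python
def xpg_score(vdw, coul, lipo, hbond, metal, bury_p, rot_b, site):
    """GlideScore as written in the text (all terms in kcal/mol)."""
    return (0.065 * vdw + 0.130 * coul + lipo + hbond
            + metal + bury_p + rot_b + site)


def ifd_score(xpg, prime_energy):
    """IFD_score = XPG_score + 0.05 * Prime_Energy."""
    return xpg + 0.05 * prime_energy


# Hypothetical poses: (label, XPG_score terms, Prime energy).
poses = [
    ("pose_A", dict(vdw=-40.0, coul=-10.0, lipo=-2.0, hbond=-1.0,
                    metal=0.0, bury_p=0.5, rot_b=0.3, site=-0.2), -10000.0),
    ("pose_B", dict(vdw=-35.0, coul=-12.0, lipo=-1.5, hbond=-1.2,
                    metal=0.0, bury_p=0.2, rot_b=0.4, site=-0.1), -10020.0),
]

scored = [(label, ifd_score(xpg_score(**terms), prime))
          for label, terms, prime in poses]
ranked = sorted(scored, key=lambda item: item[1])  # most negative first
for label, score in ranked:
    print(label, round(score, 3))
```

Because the Prime energy enters with a weight of 0.05, differences of a few tens of kcal/mol in the total energy of the complex can outweigh differences in the docking score itself, as in this made-up example.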
Chemical synthesis of inhibitors
Inhibitors 1, 54 3, 54 5 54 and 6 55 were prepared as previously reported. Inhibitor 2 is commercially available. Inhibitors 4 54 and 7 54 were synthesized as described in detail in the next paragraphs. All solvents and reagents were used as received, unless otherwise stated. Melting points were determined on a hot-stage apparatus. 1H-NMR and 13C-NMR spectra were recorded at the indicated frequencies, with the residual solvent peak used as reference. Chromatography was performed using silica gel (0.040-0.063 mm) and mixtures of ethyl acetate and petroleum ether (fraction boiling in the range 40-60 °C) in various ratios (v/v). Compounds 8 54 and 9, 56 used in the synthesis of inhibitors 4 and 7, were prepared as previously reported.