Protein-ligand binding-site similarity network
The networks constructed using the different similarity metrics are depicted in Fig. 2. Nodes represent ligand binding sites of proteins and edges connect similar binding sites. The binding sites of SARS-CoV-2 main protease are shown in red color, while other protein binding sites similar to the main protease are shown in blue. In the layout employed, hubs connecting various similar nodes and clusters are concentrated at the center, while disconnected communities of nodes migrate to the periphery. Properties of the three similarity networks were compared with the Erdӧs-Renyi (ER) random network (Table 1) at similar edge density [42, 43]. The protein binding-site similarity networks have high clustering coefficients and lower average path length than the random network, suggesting that these networks show small world behavior. This means all three networks have small diameters and two nodes are separated by ≤ 6 steps irrespective of their position in the graphs. The algebraic connectivity or Fiedler eigenvalue of a graph is the second-smallest eigenvalue of its Laplacian matrix (defined as L = D – A, where D is the diagonal matrix of vertex degrees, and A is the adjacency matrix); its magnitude is a measure of the overall connectivity of the graph [44, 45]. Low average path length as well as high value of the second-smallest eigenvalue of the Laplacian matrix in the Manhattan network suggest highly similar protein-ligand binding sites. The Euclidean network shows positive coefficient of degree assortativity, suggesting that high degree nodes preferentially connect to high degree nodes. Negative degree assortativity coefficient of the other networks suggest high degree nodes preferentially connect to low degree nodes in the Chi-Square and Manhattan networks. The group of similar protein-ligand binding sites is divided into different communities, as indicated by the modularity. The high modularity value in the Chi-Square and Manhattan networks suggest these networks cluster into different communities. A total of 153 unique protein/enzymes showing binding-site similarity to the SARS-CoV-2 main protease (Fig. 3) are listed in supplementary material (Supplementary file.xlsx). These proteins belong to different classes of proteins and found in different species. These group of proteins were then used to identify the ligands which bind to them.
Table 1
Different properties calculated from the three networks at edge density < 0.1%. ER = Erdős–Rényi model. Average clustering coefficient very much greater than that of the ER network and path length less than that of the ER network at similar edge density indicates the small world property.
Parameters
|
Euclidean Network
|
Chi-Square Network
|
Manhattan Network
|
Distance cutoff
|
2200
|
10200
|
64000
|
Number of nodes
|
7013
|
7013
|
7013
|
Number of edges
|
10562
|
10455
|
9203
|
Average degree
|
3.03
|
3.00
|
2.64
|
Average clustering coefficient
|
0.0096
|
0.022
|
0.015
|
Degree assortativity coefficient
|
0.015
|
-0.141
|
-0.30
|
Transitivity
|
0.042
|
0.013
|
0.0041
|
Edge density
|
0.00043
|
0.00043
|
0.00037
|
Average path length
|
5.23
|
4.01
|
3.75
|
Modularity
|
0.47
|
0.64
|
0.52
|
2nd smallest Eigenvalue of Laplacian
matrix
|
0.014
|
0.044
|
0.052
|
Average clustering coefficient (ER)
|
0.00029
|
0.00027
|
0.00034
|
Average path length (ER)
|
8.01
|
8.00
|
9.04
|
Transitivity (ER)
|
0.00047
|
0.00038
|
0.00025
|
Small world property
|
Yes
|
Yes
|
Yes
|
Molecular Docking
Based on interaction and binding affinity given as the Glide XP score, the top 9 molecules that are in clinical use and 10 molecules that are in investigational and experimental phase, were selected, and are shown in Fig. 4, and listed in Table 2. Hit consists of structurally and functionally diverse compounds such as Doxycycline and Minocycline tetracycline-based antibiotics, Neomycin and Tobramycin aminoglycoside-based antibiotics, and polypeptides such as PRD_000771, PRD_000772, Q27458218 and Miraziridine A. While Rutin, Isoquercitrin, Troxerutin and Hyperoside belong to the class of flavonoid-3-o-glycosides. Indinavir which is an HIV protease inhibitor also shows very good affinity towards SARS-CoV-2 main protease. SB-219383 which is a bicyclic compound inhibitor of Tyrosyl-TRNA Synthetase also shows good affinity. The drug molecules Neomycin, Tobramycin, Rutin, Indinavir, Doxycycline, Minocycline, Vorinostat, Alendronate, and Enalapril can be investigated for repurposing as SARS-CoV-2 main protease inhibitors. All the molecules show hydrogen bond interaction with residues such as PHE 140, ASN 142, GLU 166, CYS 145, GLN 189, etc. as depicted in Figure S1 and S2 of the supplementary material, while the residue HIE 41 shows π-π interaction with molecules having an aromatic ring. However, not all the sub pockets of the substrate binding site of SARS-CoV-2 main protease are occupied by compounds, so it is necessary to study their behavior using molecular dynamics simulation technique for their interaction stability and sub pocket occupancy.
Table 2
Molecules which show good interaction and binding affinity towards SARS-CoV-2 main protease with their Glide XP scores.
Sr. No
|
Compound
|
Glide XP score (kcal/mol)
|
Indication
|
Status
|
1
|
Rutin
|
-12.75
|
Nutraceutical [48]
|
Clinical Use
|
2
|
Indinavir
|
-10.51
|
HIV Protease Inhibitor [56]
|
Clinical Use
|
3
|
Neomycin
|
-9.95
|
Antibiotics [57]
|
Clinical Use
|
4
|
Tobramycin
|
-9.90
|
Antibiotics [57]
|
Clinical Use
|
5
|
Doxycycline
|
-9.45
|
Antibiotics [58]
|
Clinical Use
|
6
|
Minocycline
|
-9.21
|
Antibiotics [58]
|
Clinical Use
|
7
|
Vorinostat
|
-9.19
|
ADAM-10 Inhibitor [59]
|
Clinical Use
|
8
|
Alendronate
|
-8.88
|
Matrix Metalloproteinase Inhibitor [60]
|
Clinical Use
|
9
|
Enalapril
|
-8.80
|
Liver Carboxylesterase Inhibitor [46]
|
Clinical Use
|
10
|
Isoquercitrin
|
-12.76
|
3C-Like Proteinase Inhibitor [47]
|
Investigational
|
11
|
Troxerutin
|
-12.30
|
Nutraceutical [48]
|
Investigational
|
12
|
Hyperoside
|
-12.73
|
3C-Like Proteinase Inhibitor [49]
|
Experimental
|
13
|
PRD_000772
|
-12.33
|
3C-Like Proteinase Inhibitor [50]
|
Experimental
|
14
|
Monoxerutin
|
-12.13
|
Nutraceutical [51]
|
Experimental
|
15
|
Q27458218
|
-11.93
|
Prostasin Inhibitor [52]
|
Experimental
|
16
|
PRD_000771
|
-11.48
|
3C-Like Proteinase Inhibitor [50]
|
Experimental
|
17
|
ARC-1034
|
-11.10
|
Protein Kinase A Inhibitor [53]
|
Experimental
|
18
|
Miraziridine A
|
-11.06
|
Cathepsin Inhibitor [54]
|
Experimental
|
19
|
SB-219383
|
-11.01
|
Tyrosyl-TRNA Synthetase Inhibitor [55]
|
Experimental
|
15 ns Molecular Dynamics
MD simulation for a protein-ligand complex is a computationally expensive and time-consuming process. To eliminate compounds which show unstable binding or low binding affinity towards SARS-CoV-2 main protease, initially 15 ns MD simulation was performed for each ligand-protein complex. Snapshots of trajectories at various stages of MD simulation showing movement of the ligand in the SARS-CoV-2 main protease binding site are shown in Figures S3 and S4 of the supplementary material. The root mean square deviation (RMSD) of the C-alpha atom of the protein and the complexed ligand is depicted in Figure S5 of the supplementary material. During 15 ns of simulation it is observed that compounds Neomycin, Tobramycin, Vorinostat, Troxerutin and SB-219383 did not form stable complexes. These break away from the SARS-CoV-2 main protease after 5 to 10 ns of simulation time, except for SB-219383 which is displaced from the binding site after 10 ns. The RMSD of the C-alpha atom of the SARS-CoV-2 main protease shows stable trajectories with all complexed ligands except Indinavir and Isoquercitrin. Compounds Doxycycline, Minocycline, Alendronate, Isoquercitrin and Hyperoside show very low and stable RMSD compared to the other compounds. The RMSD trajectory of compound Q27458218 shows initial increase, then remains stable, while the RMSD trajectory of Monoxerutin after stabilization decreases at 9 ns, then again increases after 13 ns. Visual analysis of MD trajectories suggests that all compounds try to orient and adjust in the best possible pose into the binding site of the SARS-CoV-2 main protease. The root mean square fluctuation of the SARS-CoV-2 main protease is depicted in Figure S6 of the supplementary material. Atoms from loop residues regions SER 1 – PRO 9, LEU 50 – ASN 53, ASP 153 – ASP 155, PRO 168 – THR 199, GLY 215 – THR 225 and GLY 302 – VAL 3063 show high root mean square fluctuation (RMSF). The number of hydrogen bonds formed by the ligand with the binding site residue in each frame of the trajectory is shown in Figure S7 of supplementary material. Alendronate forms a large number of hydrogen bonds during the simulation. The number of hydrogen bonds formed by Indinavir and Enalapril with the SARS-CoV-2 main protease decrease as the simulation progresses; however, hydrogen bond formation by ARC-1034 and Miraziridine A are consistent throughout the simulations. The percentage hydrogen bond interaction by protein residue with ligand is listed in Table S1 of supplementary material. Amino acid residues THR 24, THR 25, THR 26, HIS 41, SER 46, ASN 142, GLY 143, SER 144, HIS 164, GLU 166, ASP 187 and GLN 189 are mainly involved in hydrogen bond formation with different ligands during MD simulation. The average binding free energies (ΔGBind) of Neomycin, Tobramycin, Vorinostat, Troxerutin and SB-219383 to the SARS-CoV-2 main protease during 15 ns MD simulation, along with their various components, such as van der Waals (vdW), Electrostatic, Polar solvation and solvent accessible surface area (SASA) energy, are listed in Table S2 of supplementary material. Compounds Q27458218 and Indinavir show high average binding affinity (ΔGBind) of -194.09 and − 177.13 KJ/mol, respectively, due to high contributions of van der Waals and electrostatic energies, followed by ARC-1034. Alendronate shows the lowest average binding free energy because of high contribution of polar solvation energy. The per residue energy contribution during 15 ns of MD simulation is shown in Figures S8 and S9 of supplementary material. The residues GLU 47, LEU 41, MET 49, CYC 145, MET 165, GLU 166, LEU 167, and PRO 168 have positive contributions, while residues ARG 45, PRO 39, SER 147, GLU 166, ASP 187, and HIS 164 negative contributions to the binding free energy of the protein-ligand complex.
100 ns Molecular Dynamics
MD simulations of the top 8 compounds Q27458218, Indinavir, ARC-1034, Monoxerutin, Minocycline, Miraziridine A, Doxycycline and Enalapril, which showed high average binding free energy (< -50 KJ/mol) during 15 ns MD runs, were extended to 100 ns. During 100 ns of MD simulations, it is observed that binding of compounds Q27458218, Doxycycline and Minocycline to the SARS-CoV-2 main protease is compact and stable throughout the simulation, as shown in Fig. 5. However, only Q27458218 occupies all the sub pockets of the substrate binding site throughout the simulation. The RMSD of the C-alpha atoms of the SARS-CoV-2 main protease remain stable during the simulation with all compounds except with Indinavir and Enalapril (Fig. 6A). Figure 6B shows the stable RMSD trajectories of these compounds. Monoxerutin is also able to bind in the binding site of the SARS-CoV-2 main protease but keeps readjusting. Similarly, Miraziridine A also tries to adjust in the binding site until 75 ns; thereafter it forms a stable complex with the protein. ARC-1034 gets displaced from the binding site after 20 ns to the edge of the site and is partially bound to it until 97 ns. Thereafter, ARC-1034 leaves the binding site completely and binds near it. Indinavir forms a stable interaction with the SARS-CoV-2 main protease until 50 ns of simulation; however, it leaves the binding site thereafter and again comes back to bind to the original site at 90 ns. Enalapril also leaves the binding site after 50 ns of simulation and binds to a different part of the protein as the simulation progresses.
The RMSF of protein backbone atoms during the 100 ns runs are depicted in Fig. 7. The loop region of the protein mentioned in the previous section shows high fluctuation. In the case of Indinavir, the fluctuation of the protein residues is highest, followed by Enalapril. The ligand binding site of the SARS-CoV-2 main protease is surrounded by loop regions which give some flexibility to it. Compounds which show stable binding cause less fluctuation in loop regions of the protein. Stability of compounds binding to the protein is also reflected in the number of hydrogen bonds formed by them during the simulation, as shown in Fig. 8. Compound Q27458218 consistently forms a large number of hydrogen bonds throughout the simulation. The number of hydrogen bonds formed by Miraziridine A increases after 75 ns of simulation time. For Indinavir, Enalapril, ARC-1034 and Monoxerutin, the number of hydrogen bonds formed decreases as the simulation progresses. Doxycycline and Minocycline, unlike other compounds, rarely form multiple hydrogen bonds during simulation. Residues which are involved in hydrogen bond formation with ligands during simulation are listed in Table 3. Hydrogen bonding between ligand and protein helps the ligand to bind in a specific orientation to initiate biochemical reaction. Amino acid residue GLU 166 seems to be important for ligand binding and forms a high percentage of the hydrogen bonds with every compound except Indinavir and Enalapril. Compound Monoxerutin forms hydrogen bonds with TYR 54, SER 144, ASP 187, ARG 188, THR 190 and GLN 192. Compound Q27458218 forms hydrogen bonds with multiple amino acid residues such as PHE 140, GLY 143 and HIS 164, signifying their important role in its stable binding to the SARS-CoV-2 main protease. Compounds ARC-1034 and Miraziridine A predominantly form hydrogen bonds with THR 26, HIS 41, SER 46 and GLN 189. Both Doxycycline and Minocycline predominantly form hydrogen bonds with HIS 41 and GLU 166, respectively.
The average binding free energies of all the ligands against the SARS-CoV-2 main protease are listed in Table 4. Compounds Q27458218, Doxycycline and Minocycline show increase in average ΔGBind to -202.61, -68.29 and − 70.21 KJ/mol, respectively as the simulation progresses after 15 ns of simulation time. Low standard deviation from average ΔGBind also signifies the strong and stable interaction between SARS-CoV-2 main protease and these three compounds. For the other compounds, not only is there a decrease in average ΔGBind but the standard deviation is also significant, suggesting that the interaction between these compounds and SARS main protease is not strong and stable. Both vdW and electrostatic interactions make important contributions to the free energy of binding of the ligand with the protein. The per residue energy contribution to the binding free energy is depicted in Figure S10. Residues contributing positively as well negatively to the binding free energy are as described in the previous section. However, the extent of contribution changes during 100 ns of simulation.
Replica Exchange Molecular Dynamics
To further ascertain the performance of MD simulations studies, REMD was performed on each system with four replicas. The overall replica exchange probability of each system during the simulation found to be 0.30. Q27458218 shows stable binding with SARS-CoV-2 main protease throughout the simulation in different replicas. The RMSD of C-alpha atoms of protein and ligand are depicted in Figure S11 and S12, respectively. The average RMSD of C-alpha protein in all replicas of the three systems was observed to be between 2 to 3 Angstrom.
Table 3
Percentage hydrogen bond interaction (> 1%) shown by ligands during MD simulation with amino acid residues of SARS COV-2 main protease.
Compound
|
Percentage H-bond Occupancy
|
THR 26
|
HIS 41
|
GLU 47
|
SER 46
|
TYR 54
|
PHE 140
|
ASN 142
|
GLY 143
|
SER 144
|
CYS 145
|
HIS 163
|
HIS 164
|
GLU 166
|
ASP 187
|
ARG 188
|
GLN 189
|
THR 190
|
GLN 192
|
Monoxerutin
|
-
|
-
|
-
|
-
|
5.36
|
-
|
1.34
|
-
|
8.80
|
-
|
4.48
|
-
|
25.88
|
4.53
|
15.52
|
2.26
|
8.74
|
5.42
|
Q27458218
|
-
|
-
|
2.86
|
-
|
-
|
25.57
|
-
|
10.90
|
-
|
1.56
|
-
|
55.25
|
58.76
|
-
|
-
|
-
|
-
|
-
|
ARC-1034
|
43.21
|
2.18
|
26.38
|
5.79
|
-
|
-
|
1.64
|
-
|
-
|
-
|
-
|
-
|
21.02
|
-
|
-
|
2.08
|
-
|
-
|
Miraziridine A
|
8.53
|
2.10
|
-
|
1.87
|
-
|
-
|
-
|
4.32
|
12.44
|
-
|
-
|
-
|
23.38
|
-
|
-
|
11.41
|
-
|
-
|
Indinavir
|
3.47
|
-
|
-
|
-
|
5.80
|
-
|
-
|
-
|
-
|
-
|
-
|
1.40
|
-
|
-
|
-
|
-
|
-
|
-
|
Doxycycline
|
|
16.66
|
|
1.61
|
-
|
-
|
-
|
-
|
-
|
-
|
-
|
-
|
1.76
|
-
|
-
|
-
|
4.48
|
-
|
Minocycline
|
-
|
-
|
-
|
1.41
|
-
|
-
|
-
|
-
|
-
|
-
|
-
|
-
|
24.99
|
-
|
-
|
-
|
-
|
-
|
Enalapril
|
-
|
-
|
-
|
-
|
-
|
-
|
-
|
-
|
-
|
-
|
-
|
-
|
-
|
-
|
-
|
1.58
|
-
|
-
|
Table 4
Average binding free energy of ligand (In KJ/mol) obtained from 100 ns MD simulation.
Compound
|
Avg. van der Waals Energy (ΔGvdW)
|
Avg. Electrostatic energy (ΔGElect)
|
Avg. Polar solvation energy (ΔGPolar)
|
Avg. SASA energy (ΔGSASA)
|
Avg. Binding free energy (ΔGBind)
|
Monoxerutin
|
-143.57 +/- 19.91
|
-90.62 +/- 30.82
|
206.75 +/- 44.81
|
-16.97 +/- 1.95
|
-44.42 +/- 20.37
|
Q27458218
|
-227.97 +/- 22.96
|
-389.02 +/- 34.99
|
442.21 +/- 40.20
|
-27.83 +/- 2.30
|
-202.61 +/- 23.84
|
ARC-1034
|
-98.47 +/- 24.93
|
-272.95 +/- 57.74
|
302.22 +/- 83.84
|
-15.55 +/- 2.96
|
-84.75 +/- 43.80
|
Miraziridine A
|
-165.62 +/- 19.59
|
-49.91 +/-21.30
|
179.51 +/- 44.49
|
-20.36 +/- 2.18
|
-56.38 +/- 35.96
|
Indinavir
|
-70.56 +/- 58.24
|
-129.07 +/- 61.36
|
77.79 +/- 82.36
|
-9.53 +/- 7.76
|
-131.36 +/- 69.70
|
Doxycycline
|
-131.40 +/- 16.64
|
-22.92 +/- 10.24
|
101.25 +/- 17.89
|
-15.22 +/- 1.51
|
-68.29 +/- 15.42
|
Minocycline
|
-136.25 +/-18.93
|
-36.20 +/- 24.71
|
117.86 +/- 32.87
|
-15.62 +/- 1.59
|
-70.21 +/- 14.78
|
Enalapril
|
-85.80 +/- 38.14
|
-11.08 +/- 11.25
|
69.56 +/- 40.82
|
-11.39 +/- 4.51
|
-38.72 +/- 36.87
|
Table 5
Average binding free energy of Q27458218 (In KJ/mol) obtained from REMD simulation.
Replica
|
Avg. van der Waals Energy (ΔGvdW)
|
Avg. Electrostatic energy (ΔGElect)
|
Avg. Polar solvation energy (ΔGPolar)
|
Avg. SASA energy (ΔGSASA)
|
Avg. Binding free energy (ΔGBind)
|
1
|
-159.61 +/- 21.72
|
-306.99 +/- 45.56
|
340.77 +/- 74.15
|
-19.71 +/- 2.15
|
-145.55 +/- 58.31
|
2
|
-145.42 +/- 35.62
|
-266.29 +/- 69.69
|
272.50 +/- 107.63
|
-19.73 +/- 4.44
|
-158.94 +/- 40.49
|
3
|
-179.15 +/- 24.12
|
-197.34 +/- 30.47
|
194.37 +/- 45.63
|
-20.89 +/- 2.23
|
-203.02 +/- 27.60
|
4
|
-171.66 +/- 19.13
|
-346.86 +/- 37.58
|
361.63 +/- 49.53
|
-21.98 +/- 2.10
|
-178.87 +/- 26.16
|
Being a peptide, the RMSD of Q27458218 in all replicas is higher during the simulation and ranges between 3 to 5 Angstrom. Throughout the simulation, ligands kept interacting with SARS-CoV-2 main protease through hydrogen bonding as depicted in Figure S13. Q27458218 shows hydrogen bonding with PHE 140, GLY 143, HIS 164 and GLU 166 in all replicas.
The average binding free energy of ligands for all replicas of each system calculated is listed in Table 5. Q27458218 showed the highest binding affinity towards SARS-CoV-2 in all replicas and is estimated to be about − 171 KJ/mol. In all simulations, van der Waals and electrostatic interaction energy contributed positively while polar solvation energy contributed negatively towards binding free energy of ligands. This REMD study suggests Q27458218 binds to SARS-CoV-2 main protease effectively in different conditions.
Out of the dataset of 1305 compound Q27458218 performed well in molecular simulation studies (Fig. 9) and based on this evidence, should be considered for chemoprophylaxis against SARS-CoV-2 by inhibiting its main protease enzyme subsequent to replication in the host. Compound Q27458218 is an experimental peptidomimetic designed as a human prostasin inhibitor (PDB ID: 3E16) [52]. It has shown the highest affinity towards the main protease of SARS-CoV-2 and can be considered for pre-clinical and clinical trials considering no established therapy or inhibitor is available for the main protease.