3.1 Virtual Screening
Structure-based drug design heavily relies on virtual screening techniques to identify potential lead compounds for target proteins from a vast array of natural compounds [58]. To narrow down the pool of potentially druggable molecules, we used the PyRx virtual screening platform. It enables the identification of possible drug candidates for specific target proteins linked to certain diseases. We used the AutoDock Vina interface with the default parameters. The IMPPAT 2.0 database was searched for a set of natural phytochemicals (n = 49) from Moringa oleifera, which were then screened for EGFR and VEGFR-2. By using the crystal structure of human EGFR (PDB entry code: 5ZWJ) and VEGFR-2 (PDB entry code: 4ASD), a virtual screening-based docking study was conducted. We have determined the likely binding affinities and interaction sites of the phytochemical compounds with the active site of two specific lung cancer receptors. All phytochemicals exhibited binding energy scores ranging from − 9.6 to -4.8 kcal/mol for the EGFR receptor and from − 9.5 to -4.3 kcal/mol for the VEGFR-2 receptor. To analyze drug likeness, ADMET, and toxicological properties, we selected the 21 compounds that bound to EGFR and VEGFR-2 proteins with an affinity greater than − 7 kcal/mol. Figure 4 depicted the binding affinity score of all the compounds that have a score higher than − 7 (kcal/mol) with both proteins.
3.2 Analysis of Drug-Like Properties
In the early phases of drug discovery, it is indispensable that researchers evaluate the drug-likeness properties through inquiry into the physicochemical features of compounds to accelerate drug development [59]. It is a prerequisite to comprehend the complex relationship between various molecular and structural characteristics like molecular weight, lipophilicity, rotatable bonds, surface area, molar refractivity, hydrogen bond acceptors and donors, and bioavailability to identify a compound suitable for potential drugs. Following a virtual screening process, the 21 compounds from M. oleifera were evaluated for their drug potential by inputting the Canonical SMILES into the SwissADME online server. In Table 1 the evaluation results of the drug-like properties obtained using the SwissADME are depicted. The compounds were thoroughly scrutinized to figure out their compliance with Lipinski's rule of thumb, commonly referred to as the "rule of five," for evaluating oral drug efficacy. When a medication fails to meet multiple guidelines, it suggests inadequate oral bioavailability [60]. Subsequently, seven out of 21 candidate drugs meet all Lipinski rules, which include criteria like MW ≤ 500, HBA ≤ 10, HBD ≤ 5, Log P (octanol/water) ≤ 5, and 40 ≤ MR ≤ 130. The remaining 14 compounds adhere to the Lipinski rule with only 1 violation, which is also acceptable. Furthermore, these compounds were reviewed based on the Veber and Egan rules. Out of all the compounds, only six had one acceptable violation of the Veber rule, while the rest had no violations. When it comes to the Egan rule, seven compounds satisfied all the criteria, while 14 compounds fell short of meeting one rule. All the compounds show excellent bioavailability scores, with each one reaching around 55% bioavailability.
Table 1
Assessment of the physiochemical properties, lipophilicity and drug-like characteristics of the selected compounds according to SwissADME.
Compound CID
|
Phytochemical Name
|
Properties
|
Physiochemical Properties
|
Lipophilicity
|
Drug-likeness
|
Rules
|
< 500
|
< 5
|
< 10
|
≥40 - ≤130
|
≤ 140
|
< 10
|
≤ 5
|
≤ 1
|
≤ 1
|
≤ 1
|
-
|
MW (Da)
|
HBD
|
HBA
|
MR
|
TPSA (Ų)
|
NRB
|
iLOGP
|
Lipinski’s violation
|
Veber Violations
|
Egan Violation
|
BS
|
72281
|
Hesperetin
|
302.28
|
3
|
6
|
78.06
|
96.22
|
2
|
2.24
|
0
|
0
|
0
|
0.55
|
5280647
|
Gossypetin
|
318.24
|
6
|
8
|
80.06
|
151.59
|
1
|
1.33
|
1
|
1
|
1
|
0.55
|
12795736
|
delta7-Avenasterol
|
412.69
|
1
|
1
|
132.75
|
20.23
|
5
|
5.07
|
1
|
0
|
1
|
0.55
|
5281680
|
Quercetagetin
|
318.24
|
6
|
8
|
80.06
|
151.59
|
1
|
1.73
|
1
|
1
|
1
|
0.55
|
92113
|
24-Methylenecholesterol
|
398.66
|
1
|
1
|
127.95
|
20.23
|
5
|
4.78
|
1
|
0
|
1
|
0.55
|
5283638
|
Clerosterol
|
412.69
|
1
|
1
|
132.75
|
20.23
|
6
|
5.10
|
1
|
0
|
1
|
0.55
|
145858
|
Flavylium
|
207.25
|
0
|
1
|
66.06
|
13.14
|
1
|
-0.76
|
0
|
0
|
0
|
0.55
|
72277
|
Epigallocatechin
|
306.27
|
6
|
7
|
76.36
|
130.61
|
1
|
0.98
|
1
|
0
|
0
|
0.55
|
111220
|
Bauerenol
|
426.72
|
1
|
1
|
135.14
|
20.23
|
0
|
4.71
|
1
|
0
|
1
|
0.55
|
222284
|
Beta-Sitosterol
|
414.71
|
1
|
1
|
133.23
|
20.23
|
6
|
4.79
|
1
|
0
|
1
|
0.55
|
72201063
|
Pterygospermin
|
406.52
|
0
|
2
|
123.32
|
89.12
|
4
|
3.45
|
0
|
0
|
0
|
0.55
|
10026486
|
Asperglaucide
|
444.52
|
2
|
4
|
126.35
|
84.50
|
13
|
3.57
|
0
|
1
|
0
|
0.55
|
60123241
|
beta-Sitostenone
|
412.69
|
0
|
1
|
132.27
|
17.07
|
6
|
4.69
|
1
|
0
|
1
|
0.55
|
586537
|
6-Chromanol, 2,8-dimethyl-2-(4,8,12-trimethyltridecyl)
|
402.65
|
1
|
2
|
129.34
|
29.46
|
12
|
5.33
|
1
|
1
|
1
|
0.55
|
65084
|
Gallocatechin
|
306.27
|
6
|
7
|
76.36
|
130.61
|
1
|
0.98
|
1
|
0
|
0
|
0.55
|
92729
|
gamma-Tocopherol
|
416.68
|
1
|
2
|
134.31
|
29.46
|
12
|
5.76
|
1
|
1
|
1
|
0.55
|
493570
|
Riboflavin
|
376.36
|
5
|
8
|
96.99
|
161.56
|
5
|
0.97
|
0
|
1
|
1
|
0.55
|
131751186
|
Moringyne
|
312.32
|
4
|
7
|
75.31
|
116.45
|
4
|
0.77
|
0
|
0
|
0
|
0.55
|
44559760
|
niaziminin A
|
411.47
|
2
|
8
|
105.92
|
138.90
|
9
|
2.88
|
0
|
0
|
1
|
0.55
|
91746804
|
28-Isoavenasterol acetate
|
468.75
|
0
|
2
|
147.04
|
26.30
|
7
|
5.73
|
1
|
0
|
1
|
0.55
|
53477901
|
(3S,9S,10R,13R,14S,17R)-10,13-dimethyl-17-[(2S)-6-methyl-5-methylideneheptan-2-yl]-2,3,4,7,8,9,11,12,14,15,16,17-dodecahydro-1H-cyclopenta[a]phenanthren-3-ol
|
398.66
|
1
|
1
|
127.95
|
20.23
|
5
|
4.78
|
1
|
0
|
1
|
0.55
|
3.3 ADME Properties Analysis
Drug research relies heavily on pharmacokinetic properties (PKs), which help understand and predict biological consequences, like the favorable or negative influence of a chemical on a particular system. The pharmacokinetic properties evaluate whether the drug compound has been efficiently absorbed from the gastrointestinal system, accurately transported to its target site, metabolized adequately, and removed from the body without causing any negative consequences. Properties related to absorption, distribution, metabolism, and excretion (ADME) play a crucial role in determining the pharmacokinetics of a molecule, which are pivotal steps for the success of a drug. Therefore, computational methods can be used to anticipate ADME characteristics to prevent drug failures in clinical trials, and compounds with unfavorable pk properties can be excluded from consideration. This area is currently a popular focus in pharmaceutical research and development (R&D). We evaluated potential hit compounds for their pharmacological applicability by examining a wide range of ADME properties using the pkCSM server. The results are shown in Table 2. Exploring the ADME properties of the mentioned drugs, including water solubility, intestinal absorption in humans (HIA), VDss in humans, blood-brain barrier permeability, CYP2D6/CYP3A4 substrates, and CYP1A2/CYP2C19/CYP2C9/CYP2D6/ CYP3A4 inhibitors, as well as drug total clearance. Inspection of these characteristics is essential to guaranteeing a drug's efficacy by assessing its biological activity and precise targeting of the intended organ at a specific concentration. Based on the research, each compound has a unique water solubility range. The identified compounds exhibit a solubility range of -1.962 to -7.812, suggesting that most of them are moderately to highly soluble in physiological environments. Substances with values from − 10 to -6 exhibit poor solubility; those greater than − 6 and less than − 4 indicate moderate solubility; compounds between − 4 and − 2 are considered soluble; and compounds with values between − 2 and 0 are highly soluble [61]. Gallocatechin (CID: 65084), Epigallocatechin (CID: 72277), Hesperetin (CID: 72281), Gossypetin (CID: 5280647), and Quercetagetin (CID: 5281680) have been identified as soluble compounds with a range of -2.9 to -3.047. Low absorption is indicated when the percentage of absorption by the human intestines is less than 30%. All the compounds show a range of human intestinal absorption (HIA) from 54.128–98.556%, indicating high absorption rates for all compounds except Riboflavin (CID:493570), which has a low absorption rate of 36.107%, deeming it undruggable. The compound’s distribution was evaluated by determining their volume of distribution in humans (VDss) and their ability to cross the blood-brain barrier (BBB). A log VDss value less than − 0.15 suggests a lower distribution, while a value greater than 0.45 indicates a higher distribution of a drug. With a low volume of dispersion, Asperglaucide, Beta-Sitostenone, Riboflavin, Moringyne, and Niaziminin A are classified as undruggable, accounting for 24% (5/21) of the compounds. The remaining substances showed a moderate to high volume of distribution. The blood-brain barrier protects the central nervous system (CNS) by preventing specific substances, mainly harmful ones, from accessing the brain tissues. The BBB regulates the entry of external compounds to maintain the stability of the CNS [62]. Based on the pkCSM guidelines, 38% (8/21) of the compounds are deemed potential drug candidates due to their low likelihood of penetrating the BBB, while the rest of the compounds are capable of crossing the BBB. Detoxifying foreign compounds and metabolizing drugs involves heme-containing enzymes called cytochrome P450s (CYPs). Drugs can either act as inhibitors or inducers of cytochrome P450 enzymes, causing potential drug interactions that could lead to negative effects or reduced effectiveness of treatments [63]. Evaluation of compounds for their effects on cytochrome P450 (CYP) is critical since CYP inhibitors may impact drug metabolism. The CYP types (1A2, 2C9, 2C19, 2D6, and 3A4) are essential in the biotransformation of more than 90% of medications in the initial metabolism phase. Two different forms, 2D6 and 3A4, have profound effects on drug metabolism [37]. In metabolism, only CID:72201063 was identified as a CYP2D6 substrate, while none of the compounds acted as inhibitors. Among 21 compounds tested, 62% were identified as CYP3A4 substrates, with compounds CID:72201063 and CID:10026486 exhibiting inhibitory properties. Moreover, CID:5280647, CID:5281680, and CID:145858 were found to be strong inhibitors of CYP1A2, while CID:72201063, CID:10026486, and CID:145858 were identified as inhibitors of CYP2C19. CID:10026486 is the sole compound that inhibits CYP2C9. The bioavailability and half-life of xenobiotic compounds are affected by total clearance (TC). The frequency and dosage of a particular drug are greatly affected by this factor. Hence, the prediction establishes a basis for the initial dose in in vivo studies and aids in evaluating the prospect of therapeutic dosing [64]. A lower clearance index value suggests a longer persistence of drugs in the body. In this study, we scrutinized the excretion properties to understand the stability of the compounds as drugs in the body prior to their elimination. The predictive values indicate that the total clearance index of certain compounds falls within a range from 0.044 to 0.361, suggesting very low excretion for these specific compounds, which make up 38% (8/21) of the total. In contrast, the remaining compounds are expected to have a moderate to high excretion rate.
We have identified five compounds with favorable ADMET characteristics from a pool of 21 compounds for additional toxicity evaluation. After evaluating the in silico ADMET properties of compounds, it was discovered that five particular compounds: Gallocatechin, Epigallocatechin, Hesperetin, Gossypetin and Quercetagetin met all the assessed pharmacokinetic criteria. Each of the five compounds examined showed a favorable absorption rate for intestinal absorption in humans and was found to be water-soluble. All the selected compounds met the BBB criteria. Compounds are likely distributed throughout the body based on the VDss parameter. Because none of the five compounds inhibit any of the several types of CYP isoenzymes, they are all readily excreted from the body after a therapeutic response. The low score in the total clearance analysis suggests that these lead compounds are expected to have minimal drug excretion, leading to a prolonged presence in the body. The comprehensive pharmacokinetic and ADME analyses indicated that all five selected compounds can be converted into drugs and are appropriate for toxicological evaluation and further testing.
Table 2
The results of in-silico ADME profiling of the potential hit molecules.
Compound CID
|
Phytochemical Name
|
Absorption
|
Distribution
|
Metabolism
|
Excretion
|
Water solubility
|
Intestinal absorption (human)
|
VDss (human)
|
BBB permeability
|
Substrate
|
Inhibitor
|
Total Clearance
|
CYP
|
2D6
|
3A4
|
1A2
|
2C19
|
2C9
|
2D6
|
3A4
|
|
Numeric (Log mol/L)
|
Numeric (% Absorbed)
|
(Log L/kg)
|
Numeric
(Log BB)
|
Categorical (Yes/No)
|
Numeric (Log ml/min/kg)
|
72281
|
Hesperetin
|
-3.047
|
70.277
|
0.746
|
-0.719
|
No
|
No
|
No
|
No
|
No
|
No
|
No
|
0.044
|
5280647
|
Gossypetin
|
-2.9
|
68.009
|
1.552
|
-1.472
|
No
|
No
|
Yes
|
No
|
No
|
No
|
No
|
0.304
|
12795736
|
delta7-Avenasterol
|
-6.715
|
94.642
|
0.179
|
0.764
|
No
|
Yes
|
No
|
No
|
No
|
No
|
No
|
0.613
|
5281680
|
Quercetagetin
|
-2.904
|
62.773
|
1.424
|
-1.664
|
No
|
No
|
Yes
|
No
|
No
|
No
|
No
|
0.307
|
92113
|
24-Methylenecholesterol
|
-7.041
|
94.691
|
0.426
|
0.777
|
No
|
Yes
|
No
|
No
|
No
|
No
|
No
|
0.604
|
5283638
|
Clerosterol
|
-7.021
|
94.171
|
0.385
|
0.757
|
No
|
Yes
|
No
|
No
|
No
|
No
|
No
|
0.677
|
145858
|
Flavylium
|
-4.852
|
96.182
|
0.24
|
0.454
|
No
|
Yes
|
Yes
|
Yes
|
No
|
No
|
No
|
0.716
|
72277
|
Epigallocatechin
|
-2.969
|
54.128
|
1.301
|
-1.377
|
No
|
No
|
No
|
No
|
No
|
No
|
No
|
0.328
|
111220
|
Bauerenol
|
-6.36
|
94.567
|
0.231
|
0.668
|
No
|
Yes
|
No
|
No
|
No
|
No
|
No
|
0.116
|
222284
|
Beta-Sitosterol
|
-6.773
|
94.464
|
0.193
|
0.781
|
No
|
Yes
|
No
|
No
|
No
|
No
|
No
|
0.628
|
72201063
|
Pterygospermin
|
-4.779
|
90.428
|
0.555
|
0.581
|
Yes
|
Yes
|
No
|
Yes
|
No
|
No
|
Yes
|
-0.196
|
10026486
|
Asperglaucide
|
-4.098
|
94.997
|
-0.016
|
-0.283
|
No
|
Yes
|
No
|
Yes
|
Yes
|
No
|
Yes
|
0.619
|
60123241
|
beta-Sitostenone
|
-6.212
|
98.556
|
-0.141
|
0.847
|
No
|
Yes
|
No
|
No
|
No
|
No
|
No
|
0.56
|
586537
|
6-Chromanol, 2,8-dimethyl-2-(4,8,12-trimethyltridecyl)
|
-7.812
|
89.6
|
0.902
|
0.698
|
No
|
Yes
|
No
|
No
|
No
|
No
|
No
|
0.824
|
65084
|
Gallocatechin
|
-2.969
|
54.128
|
1.301
|
-1.377
|
No
|
No
|
No
|
No
|
No
|
No
|
No
|
0.328
|
92729
|
gamma-Tocopherol
|
-7.602
|
90.043
|
0.732
|
0.739
|
No
|
Yes
|
No
|
No
|
No
|
No
|
No
|
0.821
|
493570
|
Riboflavin
|
-2.029
|
36.107
|
-0.196
|
-1.552
|
No
|
No
|
No
|
No
|
No
|
No
|
No
|
0.7
|
131751186
|
Moringyne
|
-1.962
|
60.996
|
-0.139
|
-0.829
|
No
|
No
|
No
|
No
|
No
|
No
|
No
|
1.118
|
44559760
|
niaziminin A
|
-3.394
|
64.472
|
-0.4
|
-1.037
|
No
|
No
|
No
|
No
|
No
|
No
|
No
|
0.361
|
91746804
|
28-Isoavenasterol acetate
|
-6.996
|
96.606
|
0.071
|
0.72
|
No
|
Yes
|
No
|
No
|
No
|
No
|
No
|
0.48
|
53477901
|
(3S,9S,10R,13R,14S,17R)-10,13-dimethyl-17-[(2S)-6-methyl-5-methylideneheptan-2-yl]-2,3,4,7,8,9,11,12,14,15,16,17-dodecahydro-1H-cyclopenta[a]phenanthren-3-ol
|
-7.041
|
94.691
|
0.426
|
0.777
|
No
|
Yes
|
No
|
No
|
No
|
No
|
No
|
0.604
|
3.4 Toxicity Analysis
Drug toxicity is a significant concern and poses a serious health issue in the medical field, as harmful side effects are often cited as the main reasons for failures in the development of new drugs in later stages [35]. Toxicity is the measure of the potential harm that a chemical substance might do to an organism or its various components, including cells and organs. Hence, the timely identification of toxicity would be immensely advantageous [65]. Computational toxicity prediction is faster, cheaper, and offers data more frequently, reducing the need for costly biological experiments. We predicted the toxicological properties of the ADME-selected five natural compounds using the online platforms ProTox-3.0 and pkCSM to support their potential as drug candidates. Table 3 highlights the outcome regarding the potential toxicity of all molecules being examined. Based on the outcomes from the pkCSM server, it is anticipated that none of the compounds will exhibit hepatotoxicity, hERGI and II (human Ether-a-go-go-Related gene) inhibition, skin sensitization, or AMES toxicity. The highest dose that most patients can tolerate is referred to as the maximally tolerated dose (MTD). The compounds with an MTD value exceeding 0.477 logs (mg/kg/day) may be suitable for higher dosage administration. In this inquiry, the highest dose that humans could tolerate was 0.506, 0.473, 0.486, 0.25, and 0.506 Log mg/kg/day for epigallocatechin, gossypetin, quercetagetin, hesperetin, and gallocatechin, pointing to a moderate level of dosage adhering to the established protocols. The Oral Rat Acute Toxicity (LD50) values for epigallocatechin, gossypetin, quercetagetin, hesperetin, and gallocatechin were 2.492, 2.527, 2.537, 2.042, and 2.492 mol/kg, respectively, revealing an adequate safety profile. Based on the ProTox-3.0 server, the compounds (epigallocatechin, gossypetin, quercetagetin, hesperetin, and gallocatechin) are reported to be non-hepatotoxic, non-neurotoxic, non-cytotoxic, and non-immunotoxic. All the toxicity parameters evaluated were equally reassuring, confirming a high level of safety and indicating that these compounds are strong contenders for further research.
3.5 Evaluation of the Bioactivity
Inspection of the potential biological activity of compounds is of the utmost importance and can be accomplished by evaluating their bioactivity score. To properly evaluate the bioactivity score of prospective drugs, it is vital to scrutinize the impact on four key factors: the control of ion channels, the interaction between ligands and G protein-coupled receptors, the inhibition of protease, kinase, and other enzymes, and the impact on nuclear receptor ligands. The molecular inspiration cheminformatics program was employed to compute the pharmacological activity of the five selected compounds, and the outcomes are listed in Table 4. In cases where the compound's score exceeds 0.00, it is more likely to show significantly higher physiological activity. Predictions illustrate a moderate level of activity when the score is within the range of -0.50 to 0. From a biological standpoint, molecules with scores below − 0.50 are deemed inactive [66]. Based on the observations, it appears that multiple pathways can influence the physiological effects of the compounds. This could also be a result of interactions involving GPCR ligands, nuclear receptor ligands, and the inhibition of protease and other enzymes. The bioactivity evaluations of the compound revealed a strong correlation with all the drug targets in the current study, with all selected compounds demonstrating highly promising bioactivity scores.
Table 3
Evaluation of the toxicity profile of the five potent compounds using pkCSM and Protox-3.0 Online server.
Compound CID
|
Phytochemical Name
|
pkCSM
|
ProTox
|
AT
|
MTD
|
hERG I inhibitor
|
hERG II inhibitor
|
ORAT
|
SS
|
Hepatotoxicity
|
Neurotoxicity
|
Immunotoxicity
|
Cytotoxicity
|
65084
|
Gallocatechin
|
No
|
No
|
No
|
No
|
2.492
|
No
|
Inactive
|
Inactive
|
Inactive
|
Inactive
|
72277
|
Epigallocatechin
|
No
|
0.506
|
No
|
No
|
2.492
|
No
|
Inactive
|
Inactive
|
Inactive
|
Inactive
|
72281
|
Hesperetin
|
No
|
0.25
|
No
|
No
|
2.042
|
No
|
Inactive
|
Inactive
|
Inactive
|
Inactive
|
5280647
|
Gossypetin
|
No
|
0.473
|
No
|
No
|
2.527
|
No
|
Inactive
|
Inactive
|
Inactive
|
Inactive
|
5281680
|
Quercetagetin
|
No
|
0.486
|
No
|
No
|
2.537
|
No
|
Inactive
|
Inactive
|
Inactive
|
Inactive
|
AT: AMES Toxicity; MTD: Maximum Tolerated Dose; hERG: human Ether-a-go-go-Related gene; ORAT: Oral Rat Acute Toxicity: HT: Hepatotoxicity; SS: Skin Sensitization.
Table 4
Evaluation of the potent compound’s bioactivity using Molinspiration Cheminformatics Software.
Compound CID
|
Phytochemical Name
|
GPCR ligand
|
Ion channel modulator
|
Kinase inhibitor
|
Nuclear receptor ligand
|
Protease inhibitor
|
Enzyme inhibitor
|
65084
|
Gallocatechin
|
0.40
|
0.14
|
0.14
|
0.57
|
0.29
|
0.49
|
72277
|
Epigallocatechin
|
0.40
|
0.14
|
0.14
|
0.57
|
0.29
|
0.49
|
72281
|
Hesperetin
|
0.04
|
-0.26
|
-0.20
|
0.38
|
-0.13
|
0.16
|
5280647
|
Gossypetin
|
-0.09
|
-0.18
|
0.30
|
0.30
|
-0.23
|
0.30
|
5281680
|
Quercetagetin
|
-0.11
|
-0.28
|
0.26
|
0.22
|
-0.30
|
0.26
|
3.6 Molecular Docking Analysis
Molecular docking analysis facilitates the comprehension of interactions between ligands and proteins, enabling the prediction of a most likely binding pose for each compound in the protein's active region. Additionally, the outcomes of docking offers an effective means to create new potential inhibitors. In our study, we performed molecular docking for each of the compounds with both EGFR and VEGFR-2 receptors using AutoDock Vina and BINDSURF. Table 5 displays the 2D structures of the ligands with binding affinity scores from both of these tools. These tools give us the same results in terms of binding affinity and interaction residues. All of our selected compounds have a docking score − 7.6 to -8.6 with EGFR and VEGFR-2 receptors. Figure 5-Figure 9 displays the interactive residues following docking in AutoDock Vina, and the supplementary file (Figure S2 and S3) contains the interactive residues following BINDSURF docking. All the binding residues and their specific interaction types, along with the cross-checked results of CASTp-indicated active residues, are presented in Table 6 and Table 7. The outcomes of CASTp analysis are presented in supplementary file (Table S1 and S2).
The human EGFR protein contains a sequence of 1210 amino acids (AA) and has a molecular weight of around 134 kDa. The tyrosine kinase domain (TKD) of this protein are divides into two lobes, encompassing residues 669 AA – 979 AA. The N-lobe consists of five β-sheet strands (β1–5) and a αC-helix that stretches from position 729 AA to 744 AA. The C-lobe is larger than the N-lobe and is made up of five α helices (αI, αH, αG, αF, and αE) [67, 68]. The C-lobe plays an important role in EGFR activation through residues from Asp831-Val852, which has a unique Asp-Phe-Gly (DFG) motif at its core. In the active state, the DFG motif assumes a distinct shape that facilitates the binding of ATP and the transfer of a phosphate group to tyrosine residues [69]. Phosphorylation of the TKD is crucial for the signaling pathway of EGFR. This domain is responsible for the catalytic function of EGFR. The kinase domain also has a highly conserved ATP-binding region [70]. According to our structural analysis study of the EGFR tyrosine kinase domain, certain residues in the TK domain's α-helix, β-sheets, and hinge regions are located in the ATP binding site, specifically residues Leu_844, Met_793, Ala_743, Val_726, and Leu_718. These residues are located at β1–3, β-6, and the hinge region and have the most frequent contact with the ligands in all crystal structures. This region is the most important part for the design of inhibitors due to its direct role in the aberrant function of the protein associated with cancer [71].
Our molecular docking analysis showed, the compound gallocatechin (CID: 65084) forms conven- tional hydrogen bonds with EGFR residues LEU_788, MET_793, and THR_854, and the compound epigallocatechin (CID: 72277) forms conventional hydrogen bonds with residues LEU_788 and MET_790. Both compounds interact with residues LEU_718, VAL_726, ALA_743, and LYS_745 by the pi-alkyl/alkyl bond. The EGFR residue MET_790 makes a pi-alkyl and pi-sulfur bond, and residue LEU_844 makes a pi-sigma bond with gallocatechin. In contrast, epigallocatechin interacts with residue MET_790 by also forming a pi-sulfur bond, and residue LEU_844 by forming both pi-sigma and pi-alkyl bonds.
The compound hesperetin (CID: 72281) has the most advantageous interaction with EGFR as an inhibitor because it binds to both the mutant residues MET_790 and SER_797 inside the protein. This compound interacts with residues ALA_743, LYS_745, and SER_797 with conventional hydrogen bonds and forms pi-alkyl/alkyl bonds with residues LEU_718, VAL_726, ALA_743, and LYS_745. In addition, residue MET_790 forms a pi-sulfur and residue LEU_844 forms a pi-sigma bond with hesperetin. The compound gossypetin (CID: 5280647) interacts with residues LEU_788 and ASP_855 by forming conventional hydrogen bonds and with residue GLY_796 by a carbon hydrogen bond. The residues VAL_726, ALA_743, LYS_745, and LEU_844 interact by pi-alkyl/alkyl bonds with this compound. This compound also forms a pi-sulfur bond with residue MET_790 and a pi-sigma bond with residue LEU_844 of the receptor. Quercetagetin (CID: 5281680) interacts with LYS_745, THR_854, and PHE_856 by conventional hydrogen bonding and with LEU_747,MET_766, LEU_777, LEU_788, and LEU_858 by several pi-alkyl/alkyl bonds. In addition, this compound interacts with residue LEU_788 by Pi-Sigma and residue ASP_855 by an Anion bond.
Conversely, the whole VEGFR-2 consists of 1356 amino acids, which include a mature protein (20∼1356 AA) and a signal peptide (1∼19 AA) [72]. The catalytic tyrosine kinase domain (TKD) is the most conserved region within the VEGFR-2 protein. This domain includes an ATP binding domain (TKD1), a phosphotransferase domain (TKD2), a flexible C-terminal domain (CTD) and a kinase insert domain (KID). The N-terminus of the cytoplasmic tyrosine kinase domain contains a hydrophobic pocket that includes a glycine-rich ATP phosphate receptor loop (GXGXXG, 841AA–846 AA) within the β-sheet structures [73]. There are several α-helical structures in TKD's C-terminal. These include an activation loop (A-loop, 1045 AA–1075 AA) and a catalytic loop (HRDLAARN, 1026 AA–1033 AA), which are very important for VEGFR-2's catalytic properties [74]. In our study, the compound gallocatechin interacts with the VEGFR-2 protein through six conventional hydrogen bonds. These bonds include residues ASN_923, SER_925, THR_926, ARG_929, ALA_1050, and ASP_1056. It also forms three pi-alkyl bonds with ARG residues 1032 and 1051. The receptor maintains a strong interaction with epigallocatechin via eight conventional hydrogen bonds with residues ASP_923, SER_925, THR_926, ARG_929, ARG_1051, LYS_1055, and ASP_1056, as well as two pi-alkyl bonds with ARG_1051.
Hesperetin shows contiguity with residues ASN_923, SER_925, THR_926, and ARG_929 by a conventional hydrogen bond and with ASP_1058 by a carbon hydrogen bond. The residues ARG_842, ASN_923, and ARG_926 form four conventional hydrogen bonds with the compound gossypetin. The residue ARG_1051 forms a carbon hydrogen and a pi-alkyl bond, while ARG_1032 forms two pi-alkyl/alkyl bonds with this compound. There are four conventional hydrogen bonds and a carbon hydrogen bond between quercetagetin and residues ARG_842, THR_926, ARG_1032, ALA_1050, and PHE_1047 of the receptor. In addition, residue ARG_1032 forms two pi-alkyl bonds with the compound. In order to effectively inhibit EGFR T790M/C797S mutations, next-generation NSCLC drugs must compete with ATP while also being able to target the mutant EGFR. This is a challenging task due to the strong binding of ATP to EGFR T790M. All of our compounds, except quercetagetin, interact strongly with MET_790 through multiple hydrogen and hydrophobic bonds, indicating that they can effectively compete with ATP for this specific mutant residue. The compounds also interact with the DFG motif's residues. Specific inhibitors that bind to the residues of this motif can efficiently block the attachment of phosphate groups to tyrosine residues, thus inhibiting the uncontrollable cell proliferation caused by EGFR mutations.
On the other hand, except for epigallocatechin, all the selected compounds have a strong interaction with the ARG_1032 residue of VEGFR-2. It is crucial because the VEGFR-2 residue ARG_1032 is now a potential target for NSCLC cancer therapy. Numerous studies have shown that alteration in this specific residue of VEGFR-2 are associated with resistance against several cancer treatments and enhanced tumor malignancy. All of our compounds also bind to the active site residues of the catalytic kinase domain, specifically the ATP binding residues of both receptors. This makes them more effective as both EGFR and VEGFR-2 inhibitors.
Table 6
Residues involved in the binding of selected compounds and EGFR receptors with comparison of the active site residues found in the CASTp 3.0.
Protein
|
Ligand
|
H-bond interaction
|
Hydrophobic interaction
|
CASTp active residues
|
EGFR (5ZWJ)
|
Gallocatechin
|
LEU788, MET793, THR854
|
Pi-Sigma: LEU844
Pi-Alkyl: LEU718, VAL726, ALA743, LYS745
Pi-Sulfur: MET790
|
All found
|
Epigallocatechin
|
LEU788, MET790
|
Pi-Sigma: LEU844
Pi-Alkyl: LEU718, VAL726, LYS745, ALA743
|
All found
|
Hesperetin
|
ALA743, LYS745, SER797
|
Pi-Sigma: LEU844
Pi-Alkyl: LEU718, VAL726, LEU777, LEU778
|
All found
|
Gossypetin
|
LEU788, ASP855, GLY796
|
Pi-Sigma: LEU844
Pi-Alkyl: VAL726, ALA743, LYS745
Pi-Sulfur: MET790
|
All found
|
Quercetagetin
|
LYS745, THR854, PHE856
|
Pi-Sigma: LEU788
Pi-Alkyl: LEU747, MET766, LEU777, LEU858
Pi-Anion: ASP855
|
All found
|
Table 7
Residues involved in the binding of selected compounds and VEGFR-2 receptors with comparison of the active site residues found in the CASTp 3.0.
Protein
|
Ligand
|
H-bond interaction
|
Hydrophobic interaction
|
CASTp Active Residues
|
VEGFR-2 (4ASD)
|
Gallocatechin
|
ASN923, SER925, THR926, ARG929, ALA1050, ASP1056
|
Pi-Alkyl: ARG1032, ARG1051
|
All found
|
Epigallocatechin
|
ASN923, SER925, THR926, ARG929, ARG1051, LYS1055, ASP1056
|
N/A
|
All found
|
Hesperetin
|
ASN923, SER925, THR926, ARG929, ASP1058
|
Pi-Sigma: ARG1051
Pi-Alkyl: ARG1032
|
All found
|
Gossypetin
|
ASN923, ARG929, ARG842, ARG1051
|
Pi-Alkyl: ARG1032
|
All found
|
Quercetagetin
|
ARG842, THR926, ARG1032, ALA1050, PHE1047
|
N/A
|
All found
|
3.7 Molecular Dynamics Simulation
Molecular dynamics (MD) simulation is utilized to discern the dynamics of a protein-ligand interaction within a defined time frame. MD simulation provides an accurate depiction of the dynamic behavior of protein-ligand interactions and enables a thorough comprehension of their demeanor in a biologically significant environment. This is an important and well-established structure-based approach to understanding the atomic-level properties of protein-ligand complexes through deviation, fluctuation, and intermolecular interaction analysis [75, 76]. Furthermore, by confirming that changes in temperature or pressure have no effect on the structure's conformation, MD simulation will validate the docking complex's stability [77]. In order to verify the binding modalities of ligands and assess the stability of the complexes, we ran MD simulations using the Desmond software package of Schrödinger. Based on the binding affinity score and ADMET analysis, compounds Gallocatechin (CID: 65084), Epigallocatechin (CID: 72277), Hesperetin (CID: 72281), Gossypetin (CID: 5280647), and Quercetagetin (CID: 5281680) satisfied the criteria to be potential drugs for both EGFR and VEGFR-2, and therefore, we selected these compounds for MD simulation. We performed the simulation for a duration of 200 nanoseconds and included both apo-proteins (4ASD and 5ZWJ without ligands) and holo-proteins (4ASD and 5ZWJ with selected ligands). During the whole simulation period, it was apparent that the ligands remained attached to the protein within the binding pocket. Which implies that even in a physiological environment, the ligands will establish persistent interactions with the protein. In addition, some other important analyses, such as the root mean square deviation (RMSD) of atomic distances, root mean square fluctuation (RMSF), molecular surface area (MolSA), radius of gyration (Rg), protein-ligand contact map, and polar surface area (PSA), were analyzed following the MD simulation to enhance understanding of the resilience and interactions of the protein-ligand complex. All the results of these analyses are presented from Fig. 10-Figure 14.
3.7.1 RMSD Analysis
The stability and alterations in the structural features of a docked complex can be assessed by calculating the root mean square deviation (RMSD). The intended behavior of the protein-ligand complex is reliant upon the folding and shifting of the atoms within the overall protein structure. The RMSD analysis enabled us to understand the structural alterations that transpired in the side-chain, Cα, and backbone atoms of the proteins throughout the simulation period. Minimal fluctuations in the RMSD value are considered favorable indications of the docked complex's stability. If the protein-ligand complex shows consistent changes in the range of 1–3Å than apo-protein, it implies the system has slowly reached equilibrium, and fluctuations in this range are entirely acceptable [78]. But significant shifts above the aforementioned magnitude suggest that the protein undergoes a conformational change throughout the simulation. In our study, the RMSD of carbon alpha atoms in apo-proteins was used as a reference frame for the RMSD calculations. Figure 10 (A and B) show how the RMSD values changed over time for the C-alpha atoms of the ligand-bound proteins and reference proteins. The RMSD value of EGFR apo-protein (5ZWJ) fluctuates between 1.77Å and 3.52Å until 55 ns, after which, with some minor fluctuations, it stabilizes with a mean RMSD of 3.18Å until 175 ns. Then the value shows a small downward peak until the end of the simulation. The RMSD value of the EGFR protein with compound gallocatechin is lower than that of the reference apo-protein, which implies the ligand stabilizes protein structure by interacting with favorable residues. The highest and lowest RMSD values for the protein-ligand complex are 3.58Å and 1.41Å, with an average RMSD of 2.55Å, while the reference protein shows values ranging between 4.30Å and 1.67Å with a mean value of 3.01Å. The EGFR-epigallocatechin complex shows almost the same RMSD values as the reference protein until 25 ns, after which the RMSD of the complex gradually increases up to the end of the simulation. Following the peaks of 5.45Å, 5.56Å, and 5.59Å at 73 ns, 79 ns, and 82 ns, the complex reached its maximum RMSD of 5.69Å at 104 ns. The complex exhibits mild variations from 40 ns to 140 ns but then stabilizes until 200 ns with a small fluctuation at 183 ns. The apo-protein shows RMSD values of 2.94Å, 2.84Å, 2.46Å, and 3.19Å at time points 73 ns, 79 ns, 82 ns, and 104 ns, indicating that the RMSD change of the complex falls within the permissible range of 1–3Å. The EGFR protein with compound hesperetin shows some fluctuations from the starting point of the simulation until 25 ns. The complex exhibits a slight rise in RMSD value from 26 ns to 55 ns, and then the value decreased compared to the reference protein until the end of the simulation, with a modest increase observed at points 98 ns to 100 ns. The mean RMSD for this complex is 2.92Å, and the maximum and minimum values are 3.65Å and 2.08Å at 30 ns and 81 ns, respectively. The complex EGFR-gossypetin has a lower RMSD than apo-protein until 25 ns. Subsequently, the complex exhibits a consistent and enduring RMSD peak from 26 ns to 135 ns. The complex had some oscillations in the value from 136 ns to the end of the simulation. The average RMSD of this complex is 3.38Å. At time point 1.38 ns, the complex has a higher value of 4.41Å, while at time point 1.2 ns, it has a lower value of 1.67Å. The RMSD values of the EGFR-quercetagetin complex show a progressive increase during the simulation, though they remain within the allowed range at all times. The maximum value of RMSD for the complex is 5.92Å at 167 ns, while the reference protein's RMSD at this point is 3.54Å. The lowest RMSD count for the complex at time point 1 ns is 1.55Å, while the reference protein's RMSD is 1.67Å. The complex has a mean RMSD of 4.27Å.
The RMSD value of this apo-protein is less than 3Å with all the compounds except compound quercetagetin, which means the protein remains structurally stable with the compounds. There are small fluctuations in the VEGFR-2 apo-protein (4ASD) up to 20 ns, after which it reaches a stable state. The protein has a bit higher RMSD value than the adjacent time frame between 110 and 175 ns, and it also reaches its maximum RMSD value at 2.76Å during this time frame. The lowest value of the VEGFR-2 and gallocatechin complex is 1.15Å, while its highest value is 3.31Å, observed at 4 ns and 115 ns, respectively. The mean RMSD of the complex is 2.55Å. The RMSD of the VEGFR-2 and compound epigallocatechin complex slowly increases over the course of the simulation compared to the reference protein. The mean RMSD of the complex is 2.67Å, while the mean RMSD of the VEGFR-2 apo protein is 2.07Å, but the RMSD value of the complex never exceeds the acceptable range of the RMSD difference. During 93 ns, the highest difference was observed between the complex and apo-protein, while the value was 2.01Å for apo and 3.41Å for complex. The VEGFR-2 and hesperetin complex remain stable with slight variations until 110 ns, then show some upper peaks between 120 ns and 150 ns. The RMSD value of the complex is approximately equal to that of the apo-protein for the remaining period of the simulation. The average RMSD for the complex is 2.45Å, while the maximum value is 3.15Å at 132 ns and the minimum value is 1.3Å at 1 ns. There is approximately an equal RMSD until 40 ns compared to the apo-protein for the complex VEGFR-2-gossypetin. Subsequently, the complex attains stability and exhibits a slightly higher RMSD value compared to the reference protein. The complex has the highest value of 3.00Å at 110 ns and the lowest value of 1.23Å at 1 ns, with a mean RMSD of 2.46Å. The protein with compound quercetagetin exhibits instability until 40 ns, as some noticeable peaks were observed in this time frame. Then the complex's structure remained stable until 80 ns, after which the value slightly decreased from the previous time frame until 160 ns. The graph showed a small increase again after 160 ns. The total RMSD of the complex is marginally greater than that of the reference protein. A peak of 3.83Å is noticed for the complex at 183 ns, but the reference protein's RMSD was 2.19Å at the same time. A second peak of 3.81Å is observed at 14 ns, while the reference protein's value is 1.59Å. Accordingly, in the complex, the next higher peaks are 3.79Å, 3.75Å, and 3.72Å at 36 ns, 198 ns, and 21 ns, respectively. The RMSD of the apo-protein at 36 ns, 198 ns, and 21 ns are 1.82Å, 2.41Å, and 1.81Å, which means the RMSD changes never exceed the acceptable range. In our study, RMSD analysis showed that none of the complexes has a RMSD value that is significantly larger than the reference proteins and higher than the acceptance range. This indicates that the docked structures remain stable and the ligands are effectively interacting with the protein molecules. In addition, from the RMSD values of the protein-ligand complexes, it is apparent that the ligand's binding to the proteins does not significantly change the conformational structure of the proteins.
3.7.2 RMSF Analysis
The root mean square fluctuation (RMSF) analysis is an effective approach for identifying fluctuations of the specific amino acid residues in the protein chain, notably those in the active region. The RMSF represents the mean changes observed in every single amino acid residue of the protein throughout the simulation period. The analysis of RMSF also aids in comprehending the impact of amino acid fluctuations on the stability of protein-ligand complexes and identifying the malleable residues in selected proteins. Therefore, in our study, we calculated the RMSF statistics to evaluate how the ligand attachment affects the internal kinetics of the selected protein's amino acid residues throughout the 200 ns time frame. Figure 10 (C and D) displays the residue-wise RMSF values of EGFR and VEGFR-2, both proteins, in their unbound and bound states with the selected compounds. Typically, the N and C-terminal regions of the protein exhibit higher fluctuations compared to other regions of the protein, and the secondary structure constituents, such as beta strands and alpha helices, are normally more firm than the loosely organized portion (loop region) of the protein. Consequently, they show fewer fluctuations compared to loop regions. Typically, large peaks in an RMSF plot represent targeted protein residues with the highest level of variation. The RMSF value of the N- and C-terminal regions of VEGFR-2 is slightly high with all the compounds, though EGFR does not show any fluctuations in these regions with any compound. The EGFR protein, with few exceptions, shows fewer fluctuations compared to the reference protein during the whole simulation period, which indicates the ligand binding efficiently stabilizes the protein. The protein EGFR does not show any noticeable fluctuation peaks with the compounds gallocatechin, hesperetin, and gossypetin compared to the apo-protein 5ZWJ. With compound epigallocatechin, the EGFR protein shows some higher peaks from residue LYS_913 to GLU_931, and with compound quercetagetin, it shows some moderate fluctuations at residues VAL_876, PRO_877, and GLY_911 to ILE_926. However, these protein residues have not been involved in ligand interactions. More importantly, the RMSF value of the EGFR protein's mutant residues is lower than that of the apo-protein. This suggests that these compounds may be able to stop the abnormal activity and bring the protein back to its normal function, or they may make the protein more receptive to drug bindings, which could help overcome drug resistance by stabilizing the mutant residues. The VEGFR-2 protein with the compounds gallocatechin and epigallocate- chin shows some mild fluctuations at some points, including the N and C terminal regions, but the fluctuated residues do not interact with the ligands. The compounds hesperetin and gossypetin with the protein VEGFR-2 do not show any noticeable peak in the graph except an N and C terminal peak. The RMSF value of the VEGFR-2 protein with the compound quercetagetin has higher peaks in all of its residues than the reference protein. But the RMSF value of most of the residues never crosses the acceptable range. Apart from the N and C terminal's elevated peaks, this complex shows slightly higher peaks above the allowed range of RMSF at residues ALA_844, LYS_868, GLU_885, LEU_901, ILE_915, SER_925, and ILE_1034. However, most of these residues are from the loop region of the protein. The other protein residues maintain a constant acceptable range of RMSF values during the whole simulation period. Our primary structure analysis of VEGFR-2 protein reveals that the residues where VEGFR-2 protein shows fluctuations in complex with the compounds, none of these fluctuated residues are involved in the active site of the protein. If there is no fluctuation observed in the active site or the fluctuations in the active region are less that indicates the protein is structurally rigid with the compounds [79].
The RMSF analysis of the EGFR and VEGFR-2 proteins, along with the compounds gallocatechin, epigallocatechin, hesperetin, gossypetin, and quercetagetin revealed that the ligands mostly remained confined within the binding pockets and interacted with the amino acid residues of the active sites. Except at some points of the VEGFR-2-quercetagetin complex, both the proteins, along with all the compounds, never fluctuate above the acceptable range of 3Å at any point. Additionally, as a result of the minimal movement of protein residues with all the compounds over the simulation timeframe, the entire structure of proteins maintained a compact form in these complexes.
3.7.3 Protein–Ligand Interaction Analysis
The amino acid and atomic-level interactions analysis between protein and ligand provides valuable insights on the conformational robustness and correlation of the protein, hence enhancing the comprehension of simulations. The amino acid interactions analysis reveals which residues are most important for the protein-ligand interaction, while the atomic-level interaction analysis is crucial for accurately predicting the mode of attachment of the protein and ligands throughout the whole simulation period [80]. Therefore, intermolecular interactions between the selected compounds with the proteins were examined using the Simulation Interactions Diagram (SID) after the 200 nanoseconds simulation run, and outcomes are deciphered in Fig. 11 and Fig. 12. The interacted amino acid residues of the protein with their Interactive Fraction Value (IFV) are presented as a stacked bar chart in Fig. 11 (left side) for EGFR and Fig. 12 (left side) for VEGFR-2. The Y-axis of the stacked bar chart revealed interactive fraction values, while the X-axis pointed out the amino acid residues. In the stacked bar chart, IFVs 1.0 indicate the corresponding residue attached to the ligand for 100% of the simulation time, and values over 1.0 in the chart suggest that the residue interacts multiple times (over 100%) with the identical ligand subtype [30]. In addition, Fig. 11 (right side) and Fig. 12 (right side) disclose the atomic interactions of the ligands with EGFR and VEGFR-2 in 2D format that were retrieved from the frames generated during the 200 ns MD simulation of the protein-ligand complex. Interaction events that take place for over 30% of the simulation duration are displayed solely within these 2D illustrations. The amino acid residues MET_793 and ASP_855 of the EGFR protein in complex with the compounds gallocatechin and epigallocatechin maintain a stable interaction over 100% of simulation time through hydrogen bond and water bridge interactions. The IFVs for these residues are 1.75 and 1.28 with gallocatechin and 1.30 and 1.25 with epigallocatechin. The residues MET_790, ASP_800, and LEU_844 have an IFV of 0.55, 0.61, and 0.57 with gallocatechin, which indicates these residues maintain interaction over 50% of the simulation time. In the complex, residues MET_790 and LEU_844 are attached by a hydrophobic bond, and residue ASP_800 by a water bridge. The residues ASP_800 and LEU_844 in the EGFR-epigallocatechin complex maintain interaction over 80% and 50% of the simulation time, respectively, with an interactive fraction value of 0.80 and 0.50 by water bridge and hydrophobic bond. The residue MET_793 of the protein has an interactive fraction value of 1.17, and residues ASP_800 and ARG_841 have a value of 0.65 and 0.51 with the compound hesperetin. It means residue MET_793 maintains interaction over 100% and residues ASP_800 and ARG_841 maintain over 50% of the simulation time by hydrogen bond and water-bridge. The EGFR residues 743_ALA, 791_GLN, 793_MET, 841_ARG, and 855_ASP have an interaction value of 0.73, 0.86, 0.84, 0.62, and 1.19, respectively, with the compound gossypetin. Which reveals residue 855_ASP prolonged interaction over 100% and the rest maintain interaction over 50% of the simulation time. The EGFR residue 762_GLU maintains interaction with an interaction value of 1.16, and the 723_PHE and 750_ALA residues have IFVs of 0.51 and 0.54 in the complex with quercetagetin. The VEGFR-2 residues GLU_934 and TYP_996 maintained the interaction over 100% of the simulation time with the compound gallocatechin with IFVs of 1.98 and 1.01 by hydrogen, hydrophobic, and water bridge interactions. The residues ASP_1058 and GLU_1121 of the VEGFR-2 protein interact with epigallocatechin over 50% of the simulation time by hydrogen bond and water bridge interaction with IFVs of 0.56 and 0.66. In the complex VEGFR-2-hesperitin, residues GLU_917 and CYS_919 interact 60% and 90% of the simulation time by hydrogen bond and water bridge, respectively, with an IFV of 0.62 and 0.90. The VEGFR-2 protein's residues GLU_828, GLU_1114, ARG_1118, and GLU_1121 with the compound gossypetin maintain interaction for 50% of the simulation time with IFVs of 0.52, 0.58, 0.52, and 0.52, respectively. The residues GLU_1003 and ASP_1129 have an IFV of 1.56 and 1.27, which implies these residues maintain interaction over 100% of the simulation time with the compound quercetagetin by hydrogen, ionic, and water bridges. In addition, the complex residues ASP_832, TYR_938, LEU_1002, and TYR_1130 are attached by hydrogen, hydrophobic, and water bridges with IFVs of 0.77, 0.70, 0.65, and 0.60. According to the interaction analysis, it is apparent that hydrophobic, hydrogen, ionic, and water molecules participating water bridge interactions are the four main interactions that occur between our selected ligands and proteins. Most of the interactions involve active site residues of both proteins, and these residues maintain a constant interaction with the ligands to prolong a stable attachment with them.
3.7.4 Ligand Properties
We analyzed various properties of the ligands after the simulation to observe whether they are stably bound to the protein or had undergone any structural changes in their binding state. A number of criteria were analyzed to examine the structural details of the ligands, such as RMSD, molecular surface area (MolSA), radius of gyration (Rg), and polar surface area (PSA). Figure 13 and Fig. 14 illustrate the outcomes of the ligand properties analysis. All the ligands except quercetagetin have an RMSD value lower than 1.0Å with the EGFR protein, and none of them show any significant fluctuations during the whole simulation period, which is highly satisfactory. The RMSD values of the ligands are presented in Fig. 13 (A and B). From the figure, we can observe that the ligand quercetagetin has a maximum RMSD value of 1.47Å, and it exhibits some fluctuations until 120 ns, and then it gradually stabilizes for the remainder of the simulation. On the other hand, all of these compounds, when bound to the protein VEGFR-2, have an RMSD of less than 1.5Å, which is likewise satisfactory. The compound gallocatechin with VEGFR-2 protein has shown some fluctuations until 25 ns, and then it stabilized with a mean RMSD of 1.10Å until 135 ns. Then the RMSD shows a reduced peak with a mean value of 0.36Å. The maximum RMSD value for this compound is 1.8Å. The compound epigallocatechin has an average RMSD of 0.46Å and a maximum RMSD of 1.19Å when bound with the VEGFR-2 protein. Compound hesperetin initially exhibits a few oscillations before completely stabilizing with an average RMSD of 0.40Å throughout the simulation period. Both the compounds gossypetin and quercetagetin show mild fluctuations throughout the simulation period, with an average RMSD value of 1.10Å and 0.85Å. The highest RMSD values for these compounds are 1.44Å and 1.46Å. We also examined the stiffness of the compounds with the target proteins during the whole simulation period by analyzing their radius of gyration, as shown in Fig. 13 (C and D). The structural activity of proteins can be reassessed by analyzing the radius of gyration (Rg). It quantifies the distance that a molecule expands from its center of mass during the simulation. It reflects the root-mean-square distance of the atoms from the center of mass of a molecule [81]. A large difference between the lowest and highest points of Rg values suggests compounds are dissociating from the protein, while smaller differences indicate a stronger interaction between the protein and ligands [82].
The lowest and highest Rg values gallocatechin, epigallocatechin, hesperetin, gossypetin, and quercetagetin in complex with EGFR are 3.96Å and 4.14Å, 3.95Å and 4.10Å, 4.26Å and 4.44Å, 3.92Å and 4.08 Å, and 4.04Å and 4.24Å, respectively. However, the average Rg values for each of these complexes are 4.02Å, 4.02Å, 4.36Å, 4.00Å, and 4.17Å. In contrast, all the compounds have a mean Rg value of 3.85Å, 4.01Å, 4.34Å, 4.05Å, and 4.20Å with the VEGFR-2 protein. The lowest and highest Rg values with this protein are 3.6Å and 4.0Å for gallocatechin, 3.92Å and 4.08Å for epigallocatechin, 4.24Å and 4.40Å for hesperetin, 3.9Å and 4.1Å for gossypetin, and 4.08Å and 4.24Å for the compound quercetagetin. The Rg value differences for all the compounds with the EGFR protein are lower than 0.20Å. On the other hand, except for gallocatechin and gossypetin, the rest of the compounds have Rg value differences below 0.20Å. The Rg value difference for gallocatechin is 0.40Å and 0.30Å for gossypetin. The MolSA is equal to the Van der Waals surface area, calculated with a 1.4 Å probe radius. The integrity of the protein-ligand interaction is additionally related to MolSA. If the MolSA value is high, it suggests an unstable protein-ligand complex, while a low MolSA value indicates a more stable complex [83, 84]. The findings of the MolSA analysis in our study are depicted in Fig. 14 (A and B). All the compounds except hesperitin have MolSA values ranging from 255Ų to 265Ų with both the proteins EGFR and VEGFR-2. The hesperitin shows a MolSA value ranging from 270Ų to 277 Ų with both proteins. These ranges of MolSA values are completely acceptable. In addition, the MolSA in this study remained constant without significant variations throughout the whole simulation period. We also measured the polar surface area (PSA) to understand whether our compounds cross the blood-brain barrier (BBB). The solvent-accessible surface area of a molecule composed entirely of oxygen and nitrogen atoms is known as PSA [85]. A compound with a PSA value greater than 40 Ų or less than 90 Ų has the potential to penetrate the blood-brain barrier [86]. PSA is also a descriptor that provides details on the potential polar interactions, permeability, and solubility of a molecule's. Furthermore, the significance of PSA lies in the fact that highly polar molecules have trouble passing across biological membranes. The selected compounds displayed satisfactory PSA averages, as illustrated in Fig. 14 (C and D). The average PSA of gallocatechin, epigallocatechin, hesperetin, gossypetin, and quercetagetin is 293.77Ų, 288.56Ų, 201.25Ų, 311.38Ų, and 309.65Ų, respectively. There are no major fluctuations observed in the PSA graph throughout the simulation period.
Our ligand properties analysis indicates the ligands are stably attached to the protein, as there are no large fluctuations observed in the ligand RMSD graph. The significant variation in the RMSD graph implies the ligand has shifted away from the protein during simulation. However, in our study, the ligands exhibit uniform behavior throughout the simulation, also indicating the system's complicated stability. In addition, Rg values suggest that the binding of ligand molecules does not significantly alter the structural state of the protein's binding region. The MolSA values further supported the stability of our complexes. All the compounds also exhibited a significant PSA value in the presence of both proteins. Based on our ligand properties analysis, all the selected compounds exhibited ideal RMSD, Rg, MolSA, and PSA values, making them strong prospects for therapeutic development.
3.8 Binding Free Energy Calculation (MM-GBSA)
MM-GBSA is broadly used and reliable method of predicting the binding free energy immediately following a simulation. It takes into account protein flexibility, polarizability, entropy, and solvation, which are sometimes not considered in docking methods [87]. Hence, the MM-GBSA calculations exhibit more precision compared to the majority of scoring systems employed in molecular docking and are reliable enough for ranking drugs based on their binding affinities and inhibitory activity. Precisely calculating the free energy of binding is imperative in biomolecular research since it controls various molecular events, including chemical reactions, protein folding, molecular recognition, and organization [88]. Recent research demonstrated that incorporating physiologically relevant energy components, such as solvation energy and surface accessibility area, into a molecular mechanical force field can enhance the accuracy of predicting the binding energy of ligands [89]. As a result, the use of MM-GBSA as an "end-point" technique in drug discovery protocols has garnered significant attention [54]. Hence, in the final phase of our computational workflow, the implementation of the MM-GBSA calculation is essential for precisely assessing the affinity of ligand binding and reproducing the results obtained from the molecular docking and dynamics simulation. The MM-GBSA calculations establish a correlation between the stability of the ligand within the binding pocket and a more negative binding free energy (ΔGbind), which signifies stronger binding and a more pronounced correlation with the inhibitory ability of drugs [87]. The post-simulation MM-GBSA analysis was performed on every 100th frame (ranging from frame 0 to 1001). This resulted in a total of 10 conformations for each simulated complex, each conformation representing every 20 ns of the whole simulation timeframe. The binding energy statistics of MM-GBSA, as illustrated in Fig. 15, are represented in the form of a bar chart diagram. The binding free energy value is presented on the left Y-axis of the graph. On the X-axis, the graph displays the individual energy components. This diagram illustrates the mean cumulative contributions of different types of interactions, namely van der Waals, covalent, Generalized Born electrostatic solvation, columbic, lipophilic, and hydrogen bonding. The cited interactions have been directly observed to exert a significant impact on the binding free energy, denoted as ΔGbind.
We observed that the compounds epigallocatechin and gallocatechin have the highest average binding energy (ΔGbind) of -54.002 and − 52.28 kcal/mol, respectively, in the case of EGFR. The findings indicated that the key energy components contributing to the stability of the simulated epigallocatechin-EGFR and gallocatechin-EGFR complexes were ΔGbindvdW, ΔGbindCoulomb, and ΔGbindLipo, which had the greatest impact on ΔGbind. However, in contrast, these compounds exhibit the lowest average binding energy (ΔGbind) of -25.41 and − 25.546 kcal/mol, respectively, for VEGFR-2. The energy components GbindSolvGb, GbindCovalent, and GbindHbond were observed to demonstrate unfavorable energy contributions for the epigallocatechin and gallocatechin in both scenarios of forming complexes with EGFR and VEGFR-2. Hesperetin's binding-free energy demonstrated strong consistency with both EGFR and VEGFR-2, as in the hesperetin-EGFR complex, the ΔGbind value was − 52 kcal/mol, while in the hesperetin-VEGFR-2 complex, the ΔGbind score was − 41.017 kcal/mol. The compound hesperetin is perhaps the most ideal among the five candidates since it has an optimum binding-free energy consistency with both EGFR and VEGFR-2. The ΔGbind values of hesperetin in the hesperetin-EGFR complex and hesperetin-VEGFR-2 complex indicate that hesperetin exhibits strong stability in the binding sites of both proteins and can inhibit both proteins simultaneously. The hesperetin-EGFR complex's optimum binding energy correlated with the higher energy values reported for GbindvdW (-48.17 kcal/mol), GbindCoulomb (-12.8 kcal/mol), and GbindLipo (-11.58 kcal/mol). Similarly, these energy component’s relative contributions to the binding energies in the hesperetin-VEGFR-2 complex remained fairly consistent. In this instance, the computed values of ΔGbindvdW, ΔGbindCoulomb, and ΔGbindLipo are − 31.42 kcal/mol, -12.04 kcal/mol, and − 10.2 kcal/mol, respectively.
The calculated overall binding free energy of the gossypetin-EGFR complex is -48.36 kcal/mol, whereas the binding free energy of the gossypetin-VEGFR-2 complex is -32.36 kcal/mol, positioning it as the second most promising candidate. Quercetagetin demonstrates a binding energy of -43.31 kcal/mol, which is slightly lower but comparable to gossypetin when it interacts with EGFR to form a complex. However, it demonstrates a marginally stronger attraction to VEGFR-2 (35.01 kcal/mol) in comparison to gossypetin. Furthermore, in every instance, the van der Waals (VdW) energy seems to make the most significant contribution to the binding energy, followed by coulomb and lipophilic interactions. As per the binding free energy values, the compounds can be ranked in the following order:
Hesperetin > Gossypetin > Quercetagetin > Epigallocatechin > Gallocatechin.
The findings appear to indicate that these drugs have the potential to target the EGFR and VEGFR-2 pathways simultaneously rather than individually, indicating the possibility of a synergistic effect.