Structure-based Assessment of Homologous Analogues of Natural products: A computational approach to predict the therapeutic effects of natural products

The present study describes a novel strategy to screen natural products (NPs) for their therapeutic effects together with a predicted mechanism of action. The method, entitled 'Structure-based Assessment of Homologous Analogues of Natural products-SAHANA', compares NPs against prescribed synthetic chemical drugs to deduce activity cliffs and core fragments, based on molecular properties and 2-dimensional structural similarities. The method was applied to predict the biological effect of the identified NPs as antidiabetic molecules. Selected NPs were assessed for their pharmacokinetic and pharmacodynamic properties. The biological interactions and structural stability of the bound structures were evaluated using molecular docking and molecular dynamics simulations. The study yielded NPs with significant structural similarities to prescribed drugs. Further, their binding interactions stabilized the macromolecular structure. The results strongly indicate that natural products can produce therapeutic effects efficiently if administered individually. The results also encourage using the current screening strategy to identify competent natural product drugs against any disease condition ad libitum.


Introduction
Searching for and identifying chemical compounds as therapeutic molecules is considered an important aspect of modern drug design. It is based on the assumption that structurally similar molecules frequently exhibit similar functions. This assumption has fuelled many drug-design programs to develop two-dimensional and three-dimensional similarity-based approaches, which are highly efficient [1]. However, an effective strategy should involve a close coupling of search algorithms and database implementation [2]. Several strategies have been proposed in this avenue, emphasizing core-structure and substructure searches across databases. These include converting structures to unique string sequences such as WLN (Wiswesser Line Notation), ROSDAL (Representation of Organic Structures Description Arranged Linearly), or SMILES (Simplified Molecular Input Line Entry System) and performing a simple string search [3]. Modern search approaches employ advanced computational methods and data structure modules to develop new search algorithms [4][5][6]. However, as these search algorithms give preference to structures and not functions, it becomes necessary to filter molecules with predictable biological effects.
Molecular similarities are generally considered when optimizing the potency and druglikeness of lead compounds to derive structure-activity relationships [1]. The mathematical models used to predict pharmacokinetic (PK) and pharmacodynamic (PD) properties can describe the drug-target association and the possible mechanisms that evoke a physiological response. Based on these characteristics, a drug's affinity and efficacy are predicted. The therapeutic effect is generally associated with a specific target playing a critical role in the disease mechanism [7]. Validating such links and targeting them to interfere with biological processes makes it possible to control or manage several diseases. Depending on the target, the drug molecule may act as an inhibitor (reversible or irreversible), as an agonist or antagonist of a receptor molecule, as an intercalator (binder), modifier (alkylating agent), or substrate mimic for a DNA molecule, as a blocker or opener of cell-membrane ion channels, or as an uptake inhibitor of a transporter molecule.
Structure resemblance as a basis to ascertain function: The structural arrangement of a molecule contributes to its biological activity. If two molecules have a similar structure, they will most probably have a similar biological effect [1,8,9]. Although some converse effects have been reported [10,11], computational chemists have successfully exploited this principle to construct diverse compound libraries and to select compounds for high-throughput screening experiments [8]. Such similarities have driven the synthesis of many small-molecule drugs that share maximum identity with natural products [12].
Unlike synthetic compounds, which usually cover a small range of chemical diversity, natural compounds exhibit more versatile functionality. Because they are synthesized through enzyme-catalyzed reactions, asymmetric factors play a critical role in imparting their pharmacological effect. Hence, it is possible to predict their biological behavior by comparing their molecular fingerprints with those of structurally similar synthetic drugs [9,13]. As the action mechanism of chemically synthesized drugs is well defined, structurally similar natural products are also expected to exert similar effects.
Overview of the methodology: 'Structure-based Assessment of Homologous Analogues of Natural products-SAHANA' describes a novel approach to screen natural products by considering their structural similarities to synthetic chemical drugs, with an emphasis on their therapeutic potential. The comparison was based on the molecular composition and two-dimensional structural similarities between natural products and synthetic drugs. The resulting molecules were assessed for a possible therapeutic effect and a probable mechanism of action using different in-silico analyses. An overview of the methodology followed in SAHANA is presented in Figure 1.

Data collection and library construction:
The Natural Product Activity and Species Source database (NPASS) offers an extensive library of natural products with experimentally derived quantitative activity data (http://bidd.group/NPASS/index.php) [14]. For the comparison, a SMILES (Simplified Molecular Input Line Entry System) data library was constructed from the NPASS database, consisting of 26,609 natural product structures, alongside a synthetic drug library composed of 85 prescribed chemical drugs selected using the Drugs.com server (https://www.drugs.com/condition/diabetes-mellitus-type-ii.html). The SMILES were converted to structure data using the Osiris DataWarrior software [15] and used for structure comparison.

Implementation of the method:
a. Pharmacophore-based comparison of natural products with their synthetic drug counterparts: The non-redundant natural product library was compared against the synthetic chemical drug library, which consisted of prescribed chemical drugs for type-2 diabetes (T2D). Molecular properties were used to assign the druglikeness of each molecule, considering molecular weight, cLogP, hydrogen bond donors, hydrogen bond acceptors, and rotatable bonds. To find structurally similar compounds rather than compounds sharing a common substructure, a core fragment-based SAR analysis was performed by considering the most central ring structure. The similarity between two molecules was assessed as the number of fragments that both molecules have in common, divided by the number of fragments found in either of the two structures [15]. The structures were further analyzed for structural scaffolds based on the plane ring system to determine the substructures and to define the similarity cut-off for the structure comparison. The molecular properties, activity cliffs, core fragments, and structural scaffolds were predicted using the Osiris DataWarrior V.4.4.3 software [15].
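The fragment-based similarity described above can be written compactly. The notation is an assumption introduced here for illustration: F_A and F_B denote the sets of fragments found in molecules A and B. Read this way, the score has the form of a Tanimoto (Jaccard) coefficient over fragment sets; the exact fragment dictionary used by DataWarrior is not specified in this text.

```latex
S(A, B) = \frac{\lvert F_A \cap F_B \rvert}{\lvert F_A \cup F_B \rvert},
\qquad 0 \le S(A, B) \le 1
```

As a hypothetical example, if A contains 5 fragments, B contains 4, and they share 3, then the union holds 5 + 4 - 3 = 6 fragments and S(A, B) = 3/6 = 0.5.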
b. Similarity score cut-off limit: Natural products exist in nature in many stable analogous forms. Even with minor structural variations, the analogous forms of a natural product can exert unique biological effects [16]. Therefore, an appropriate limit was needed to filter natural products and their analogous structures. Because of their complex structures, such derivatives are expected to be excluded at higher cut-off limits, while at lower cut-off limits the analogous structures are included (Table S1). Considering these variations, a similarity cut-off limit of 60% was fixed for the comparison.
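The effect of the cut-off can be illustrated with a minimal sketch. It assumes that each molecule is represented as a set of fragment labels and that similarity is the shared-over-total fragment ratio described earlier; the function names, fragment labels, and candidate names are hypothetical and are not taken from DataWarrior or from the study data.

```python
def fragment_similarity(frags_a, frags_b):
    """Shared fragments divided by fragments found in either molecule."""
    a, b = set(frags_a), set(frags_b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def screen(candidates, reference, cutoff=0.60):
    """Return (name, score) pairs whose similarity to the reference meets the cutoff."""
    hits = []
    for name, frags in candidates.items():
        score = fragment_similarity(frags, reference)
        if score >= cutoff:
            hits.append((name, score))
    # Most similar candidates first.
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Hypothetical fragment labels, for illustration only.
reference_drug = {"f1", "f2", "f3", "f4", "f5"}
natural_products = {
    "analogue_A": {"f1", "f2", "f3", "f4", "f5", "f6"},  # 5 shared / 6 total
    "analogue_B": {"f1", "f2", "f3", "f7"},              # 3 shared / 6 total
    "unrelated_C": {"f8", "f9"},                         # 0 shared / 7 total
}

hits_90 = screen(natural_products, reference_drug, cutoff=0.90)
hits_60 = screen(natural_products, reference_drug, cutoff=0.60)
hits_50 = screen(natural_products, reference_drug, cutoff=0.50)
```

In this toy set, the strictest cut-off returns nothing, the 60% cut-off keeps only the closest analogue, and lowering the limit to 50% admits the second analogue as well, which mirrors the trade-off described above.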

Method validation:
a. Stage 1: In-silico pharmacokinetic, pharmacodynamic, and toxicology profiling: Molecular complexity is one of the major characteristics of natural products. The stereo-specificity, chirality, and cis/trans configuration that contribute to the structural properties of a natural product are very specific. The presence of aromatic rings, larger macrocyclic aliphatic rings, lower nitrogen content, and increased oxygen content contributes substantially to their structural complexity [17]. The oral bioavailability of natural products can be predicted by considering such structural properties. In pharmaceutical research, Lipinski's rule of five (RO5) has become one of the most widely used computational approaches to estimate the solubility and permeability of new drug candidates [18]. Although these parameters cannot be strictly applied to filter natural products, they can be used as a 'rule of thumb' to describe the molecular properties needed when screening a candidate drug's pharmacokinetics (PK) and pharmacodynamics (PD) [19,20].
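A rule-of-five check is simple enough to sketch directly. The thresholds below are the standard Lipinski limits (molecular weight at most 500 Da, logP at most 5, at most 5 hydrogen bond donors, at most 10 hydrogen bond acceptors); tolerating one violation is a common convention and is an assumption here, not a setting reported by the study. The descriptor values are hypothetical and would in practice come from a tool such as DataWarrior.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    mol_weight: float   # Da
    clogp: float        # calculated octanol-water partition coefficient
    h_donors: int       # hydrogen bond donors
    h_acceptors: int    # hydrogen bond acceptors


def ro5_violations(d):
    """Count how many of the four Lipinski limits are exceeded."""
    checks = [
        d.mol_weight > 500,
        d.clogp > 5,
        d.h_donors > 5,
        d.h_acceptors > 10,
    ]
    return sum(checks)


def passes_ro5(d, max_violations=1):
    """Treat a molecule as RO5-compliant if it has at most max_violations."""
    return ro5_violations(d) <= max_violations


# Hypothetical descriptor values, for illustration only.
small_molecule = Descriptors(mol_weight=350.0, clogp=2.1, h_donors=2, h_acceptors=5)
glycoside_like = Descriptors(mol_weight=610.0, clogp=-0.5, h_donors=8, h_acceptors=15)

small_ok = passes_ro5(small_molecule)
glycoside_ok = passes_ro5(glycoside_like)
```

The second example shows why the rule is only a rough guide for natural products: a large, highly oxygenated molecule exceeds three limits at once even though such compounds can still be of therapeutic interest.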
b. Stage 2: High throughput virtual screening to predict molecular interactions: The selected molecules were subjected to high throughput virtual screening (HTVS) using automated docking to identify their binding affinity with the selected targets [23,24]. Information on the action mechanism of the synthetic drugs was collected from the Inxight: Drugs database of the NIH National Center for Advancing Translational Sciences (NCATS) server (https://drugs.ncats.io/). Automated docking was used as a screening method to predict the probable interactions of the selected natural products with appropriate target proteins. The targets were selected based on the action mechanism of the most structurally similar synthetic drug (Table S2). The protein structure files were obtained from the Protein Data Bank and prepared by removing heteroatoms and adding Kollman charges. For Empagliflozin and Luseogliflozin and their structurally similar natural products, the sodium-glucose cotransporter 2 (SGLT2) protein with PDB ID 2XQ2 was used for interaction studies. Similarly, for Prednisolone and its structurally similar natural products, the glucocorticoid receptor (PDB ID 1M2Z), on which Prednisolone acts as an agonist, was used. For each target, the residues forming the binding site were identified using the PDBsum server. The grid box was set around the residues forming the binding pocket, and the Broyden-Fletcher-Goldfarb-Shanno algorithm implemented in AutoDock Vina was employed to study appropriate binding modes of the ligand in different conformations [25]. For the ligand molecules, all torsions were allowed to rotate during docking [26]. The in-silico studies were performed on a local machine equipped with an AMD Ryzen 5 six-core 3.4 GHz processor, 8 GB graphics, and 16 GB RAM, with Microsoft Windows 10 and Ubuntu 16.04 LTS as dual-boot operating systems.
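Setting a grid box around binding-site residues can be sketched as follows. This is an illustration under stated assumptions, not the procedure used in the study: the coordinates are hypothetical, the 4 Å padding is an arbitrary choice, and the helper names are invented here. The output keys (`center_x`, `size_x`, and so on) are the search-space options that AutoDock Vina reads from its configuration file.

```python
def grid_box(coords, padding=4.0):
    """Axis-aligned box enclosing the (x, y, z) points, padded on every side."""
    if not coords:
        raise ValueError("at least one coordinate is required")
    xs, ys, zs = zip(*coords)
    lows = (min(xs), min(ys), min(zs))
    highs = (max(xs), max(ys), max(zs))
    center = tuple((lo + hi) / 2 for lo, hi in zip(lows, highs))
    size = tuple((hi - lo) + 2 * padding for lo, hi in zip(lows, highs))
    return center, size


def vina_box_lines(center, size):
    """Format the box as lines for an AutoDock Vina configuration file."""
    keys = ("center_x", "center_y", "center_z", "size_x", "size_y", "size_z")
    return [f"{key} = {value:.3f}" for key, value in zip(keys, center + size)]


# Hypothetical atom coordinates (in angstroms) of binding-site residues.
site_atoms = [(10.0, 0.0, -2.0), (14.0, 6.0, 2.0), (12.0, 3.0, 0.0)]

center, size = grid_box(site_atoms)
config_lines = vina_box_lines(center, size)
```

In practice the coordinates would be read from the prepared receptor file for the residues reported by PDBsum, and the padding would be tuned so that the largest ligand fits inside the box.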
c. Stage 3: United-atom molecular dynamic simulation studies to predict the protein stability: United-atom molecular dynamics (MD) simulations analyze the physical movement of atoms and molecules. To assess the structural stability upon ligand binding [27,28], MD simulations were run for a time scale of 20 ns. The natural product-bound target with the lowest binding energy was selected from each docked group and subjected to MD simulation. The GROMOS96 54a7 force field [29], implemented in the GROMACS package, version 2018 (Abraham et al., 2015), was employed. Briefly, the system was solvated using the simple point charge (SPC) model [30] and neutralized using sodium and chloride ions. Energy minimization was done using the steepest descent method, followed by temperature coupling (300 K) using the V-rescale thermostat. A time step of 2 fs was used, and trajectory coordinates were saved at intervals. MD trajectories were analyzed using the gmx energy, gmx rms, gmx rmsf, gmx gyrate, gmx hbond, gmx do_dssp, gmx covar, and gmx sasa modules of GROMACS, along with electrostatic interaction energies. The 'G' term can be further decomposed into the following components-

The comparison within the T2D synthetic chemical drug library, performed to test whether structurally similar drugs act similarly, supported their similarities in mechanisms of action (Table S2). The comparison of the natural product library with the chemical drug library yielded 13,609 structures. These structures were further filtered based on their availability in edible sources, which yielded 166 structures [38,39]. The most similar structures from each chemical drug group were selected for further studies (Table 1). The druglikeness estimation indicated that, of the 15 molecules identified, 9 had positive scores and thus demonstrated druglikeness, 14 were predicted to penetrate the human intestine, 9 were predicted to penetrate the blood-brain barrier, and none was predicted to be a cytochrome P450 substrate, indicating a high possibility of bioavailability (Table 2A). The in-silico PK/PD studies showed that, of the 15 molecules, 2 were predicted to be mutagenic, while none was predicted to be tumorigenic, to have reproductive effects, or to be an irritant (Table 2B).

High throughput virtual screening to predict Molecular interactions:
The in-silico molecular interaction studies predicted the most effective natural product to bind to the appropriate target [23,24,40]. The binding was compared with that of the structurally similar synthetic drug counterparts whose targets were used for the interaction studies. The study yielded natural products that bind effectively to their respective targets (Table 3). Naringin, Kaempferol-3-neohesperidoside, Ellagitannin, and Gallotannin showed interaction energies ranging from -11.9 to -11.0 kcal/mol. Isoquercitrin, Rutin, Hesperidin, Procyanidin, and Phlorizin showed greater interaction energies with SGLT2, ranging from -12.4 to -1.0 kcal/mol. Although Ursolic acid, Oleanolic acid, Gymnemic acid, Beta-sitosterol, and Stigmasterol showed prominent interactions with the glucocorticoid receptor, their chemical drug counterpart, Prednisolone, showed a higher interaction energy of -1.0 kcal/mol. The effectiveness of binding was then studied in terms of protein stability using molecular dynamics simulations.

United-atom molecular dynamic simulation studies to predict protein stability:
In the present study, MD simulations were performed to confirm the accuracy of the binding resulting from the docking studies. The MD simulation results displayed stable conformational changes acquired by the target proteins upon ligand binding (Fig. 4). The MD trajectory analysis revealed that Naringin and Isoquercitrin imposed fewer RMS deviations and RMS fluctuations than their structurally similar SGLT2 inhibitors, Empagliflozin and Luseogliflozin. Similarly, Ursolic acid displayed fewer RMS deviations and RMS fluctuations than the glucocorticoid receptor inhibitor Prednisolone. The binding free energy calculated using the g_mmpbsa module revealed better interactions between the selected natural products and their respective targets than for their synthetic drug counterparts (Table 4).
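For readers unfamiliar with the binding free energy reported by g_mmpbsa, the standard MM-PBSA formulation is sketched below. This is the general textbook decomposition and not necessarily the exact expression used by the authors, whose own breakdown of the 'G' term is not reproduced in this text. The subscript x stands for the complex, the protein, or the ligand, and all symbol names are assumptions introduced here.

```latex
\begin{align}
\Delta G_{\mathrm{binding}} &= G_{\mathrm{complex}} - \left( G_{\mathrm{protein}} + G_{\mathrm{ligand}} \right) \\
G_{x} &= \langle E_{\mathrm{MM}} \rangle - TS + \langle G_{\mathrm{solvation}} \rangle \\
E_{\mathrm{MM}} &= E_{\mathrm{bonded}} + E_{\mathrm{vdW}} + E_{\mathrm{elec}} \\
G_{\mathrm{solvation}} &= G_{\mathrm{polar}} + G_{\mathrm{nonpolar}}
\end{align}
```

Here E_MM is the molecular mechanics energy in vacuum, split into bonded, van der Waals, and electrostatic terms, and the solvation free energy is split into polar and non-polar parts. The entropic term TS is often omitted in practice when comparing ligands against the same target; whether it was included here is not stated.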

Discussion And Conclusion
The present research describes a novel method to screen natural products by considering their molecular composition and 2-dimensional structural arrangements, followed by their effectiveness against a specific disease condition. The method relies on the central foundation of medicinal chemistry that structurally similar molecules will have similar biological effects [1,8]. As an initial part of the study, the natural products were compared against an antidiabetic synthetic chemical drug library based on their molecular and 2-dimensional structural similarities. The identified natural products were checked for druglikeness and for pharmacokinetic and pharmacodynamic properties by comparing them against a large group of already reported datasets [19,21]. The biological interactions involved in the action mechanism of the synthetic drugs were extended to their structurally similar natural products. The interaction studies described the binding interactions of the selected molecules against suitable drug targets. The effectiveness of binding was studied using MD simulation, reporting the most effective drug-like natural products to treat T2D. Based on the molecular interactions and the structural stability of the targets induced by the natural products, it is reasonable to state that the selected natural products will exert effects similar to those of their synthetic drug counterparts.
Several approaches to searching for and identifying chemical compounds have been proposed, ranging from conventional string sequence searches to modern search algorithms based on data structure modules [3][4][5][6]. Similarity measurements generally consider either structural representations involving physicochemical properties, topological indices, molecular graphs, pharmacophore features, molecular shapes, and molecular fields, or quantitative measures such as the Tanimoto coefficient, Dice index, cosine coefficient, Euclidean distance, and Tversky index [1]. The 3-dimensional conformation-based search methods for comparing chemical compounds include pharmacophore modeling, shape similarity, molecular field-based methods, and 3D fingerprints. Popular chemical databases like PubChem use percent similarity measures employing the Tanimoto equation [41] and a dictionary-based fingerprint analogous to the Molecular ACCess System (MACCS) structure-based keys [42]. Tools like SwissTargetPrediction predict the targets of bioactive molecules based on a combination of 2D and 3D similarity measures, but their prediction accuracy is significantly lower for molecules with unknown bioactivity, for which they may fail to predict the biological effects [9].
The OSIRIS DataWarrior program, employed in the current study to assess molecular similarity, uses similarity methods that range from chemical similarity based on substructure fragments to biological similarity that considers 3D geometry and binding [15]. The current method uses these calculations in a unique way to screen natural products against specific disease conditions by focusing on a drug target. Unlike SwissTargetPrediction, the present method can be employed even for molecules whose bioactivity is unknown. Further, it proceeds to bioactivity assessment based on molecular interactions and structural stability using molecular docking and MD simulations.
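Several of the quantitative measures listed above are closely related, which a short sketch can make concrete. It assumes that fingerprints are represented as sets of "on" bit positions; the function names and example fingerprints are hypothetical. The Tversky index is used as the general form, with the Tanimoto coefficient and the Dice index recovered as special cases of its two weights.

```python
def tversky(fp_a, fp_b, alpha, beta):
    """Tversky index: common / (common + alpha * only_in_a + beta * only_in_b)."""
    a, b = set(fp_a), set(fp_b)
    common = len(a & b)
    denominator = common + alpha * len(a - b) + beta * len(b - a)
    return common / denominator if denominator else 0.0


def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient: the Tversky index with alpha = beta = 1."""
    return tversky(fp_a, fp_b, 1.0, 1.0)


def dice(fp_a, fp_b):
    """Dice index: the Tversky index with alpha = beta = 0.5."""
    return tversky(fp_a, fp_b, 0.5, 0.5)


# Hypothetical fingerprints given as sets of "on" bit positions.
query = {1, 2, 3, 4}
candidate = {3, 4, 5}

t_score = tanimoto(query, candidate)                     # 2 / (2 + 2 + 1)
d_score = dice(query, candidate)                         # 2 / (2 + 1 + 0.5)
substructure_like = tversky(query, candidate, 1.0, 0.0)  # ignores bits only in candidate
```

With unequal weights the Tversky index becomes asymmetric, which is why it is sometimes preferred when asking whether one molecule is contained in another rather than whether two molecules are alike overall.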

Limitations And Other Considerations
Our findings encourage using the current strategy to screen natural products for any disease condition ad libitum. The method can also define new roles for natural products with therapeutic benefits, reducing the time and cost of lead discovery and validation. The methodology described in SAHANA relies entirely on computational calculations. Though the results are supported by statistical accuracy, emphasis must be given to experimental validation to establish precise action mechanisms. Another limitation of the current method is that it cannot be applied to conditions whose prescribed drugs lack validated action mechanisms. As the prediction relies on the structural similarities between natural products and the available drugs, information on well-defined action mechanisms is a prerequisite for SAHANA. Nevertheless, the predictions made using SAHANA can act as a 'rule of thumb' in screening a large dataset of natural products against any disease condition.
Availability of data and materials: All the data used during the current study are available from the corresponding author on reasonable request.
Competing interests: The authors declare that they have no conflicts of interest.
Author contributions: ARSJ: designed and conceived the study, performed the research, and wrote the manuscript. NPS: participated in the discussion of the results and provided technical support. Both authors read and approved the final manuscript.