Structure-based Assessment of Homologous Analogues of NAtural products (SAHANA): A novel approach to screen bioactive compounds in comparison with their synthetic drug counterparts

Background: The present study describes a novel strategy to screen natural products (NPs) for their therapeutic effects together with their predicted most probable mode of action. The method, entitled 'Structure-based Assessment of Homologous Analogues of NAtural products' (SAHANA), compares natural products against synthetic drugs based on their molecular properties and 2-dimensional structural similarities. The method is based on the well-established hypothesis that molecules with similar structural properties will have similar functions.
Methods: The method was validated by implementing it for the screening of NPs against different disease conditions. The biological effects of the identified NPs were evaluated based on their binding affinity with the target of their synthetic drug counterpart, after their in-silico pharmacokinetic and pharmacodynamic properties had been assessed. The stability of binding was validated using molecular dynamics simulation studies.
Results: The study yielded NPs with significant structural similarities to synthetic drugs and with binding interactions that stabilize the macromolecular structures.
Conclusions: The results strongly indicate that natural product drugs can work in a manner similar to that of synthetic drugs if administered individually. Further, the results encourage the application of the current strategy to screen competent drugs against any disease condition ad libitum.

candidate can be predicted using mathematical models. These models describe the association of a drug with the target and the possible mechanism by which the drug evokes a physiological response. Based on these characteristics, the drug's affinity and efficacy are predicted.
The therapeutic effect of a drug is generally associated with a target molecule that plays a critical role in the disease mechanism [7]. Any malfunction in these links results in disease. Validating such links and targeting them to interfere with biological processes has resulted in controlling or managing several disease conditions. The underlying mechanism of action on such target molecules may be inhibition by a small molecule (reversible or irreversible); agonism or antagonism of a receptor molecule; intercalation (binding), modification (alkylation) or substrate mimicry for a DNA molecule; blocking or opening of cell membrane ion channels; or uptake inhibition for a transporter molecule.
Structure resemblance as a basis to ascertain function: The biological activity of any molecule is determined by its structural arrangement. If two molecules have similar structures, they will most probably have similar biological effects [1,8,9]. Although some converse effects have been reported [10,11], computational chemists have successfully exploited this principle for the construction of diverse compound libraries and the selection of compounds for high-throughput screening experiments [8]. A comparison describing the structural similarities between thymidine, a naturally occurring nucleoside, and zidovudine, a synthetic drug used to treat HIV patients, is given in Figure 1. Structurally, the two molecules share >93% identity. Such similarities have driven the synthesis of many anti-HIV drugs that share maximum identity with the natural nucleosides [12].
Unlike synthetic compounds, which usually cover a small range of chemical diversity, natural compounds exhibit more versatile functionality, even with their structural complexity. As they are synthesized by specific enzyme-catalysed reactions, asymmetric factors play a critical role in imparting their pharmacological effect. Hence, by considering the molecular fingerprint of their pharmacophore, it is possible to predict their biological behaviour by comparing them with structurally similar synthetic drugs [9,13]. In this avenue, the present study proceeds to identify the therapeutic effects of natural products against specific disease conditions by comparing them against synthetic small molecule drugs based on molecular properties and 2-dimensional structural similarities. As chemically synthesised drugs exert their biological effects precisely and their action mechanism is well defined, their structurally similar natural products are also expected to exert similar effects. The selected molecules were subjected to different in-silico studies to validate the proposed methodology.
Overview of SAHANA: The present protocol details an approach to screen natural products by comparing their structural similarities with synthetic drug counterparts, giving emphasis to their therapeutic potential. The method compares NPs with synthetic drugs of known mechanism of action and proposes an action mechanism for the natural products, even if their bioactivity is not known. The comparison was made based on the molecular composition and two-dimensional structural similarities between the natural products and the synthetic drugs. As the results demonstrated a significant level of structural similarity, the study was continued to explore the possibility that the identified NPs demonstrate a similar therapeutic effect. The selected molecules showed remarkable binding interactions with the targets of their synthetic drug counterparts. The method was applied to screen natural products against drugs currently prescribed or under investigation for the treatment of Type 2 diabetes (T2D) and COVID-19 patients. Our findings encourage the application of this strategy to any other disease condition ad libitum. The method can also be used to define new roles for natural products with therapeutic benefits, reducing the time and cost of lead discovery and validation. However, emphasis has to be given to experimental validation to ascertain the action mechanisms proposed on the basis of this strategy. An overview of the methodology followed in SAHANA is presented in Figure 2.
Natural products and synthetic drug library construction: The Natural Product Activity and Species Source database (NPASS) offers a large library of natural products with experimentally derived quantitative activity data [14]. For the T2D drug comparison, separate libraries consisting of 166 natural product structures and 84 synthetic drugs prescribed for T2D were constructed based on a literature survey. The study was extended to a larger data set by constructing a library of 26,311 natural product structures derived from their SMILES (Simplified Molecular-Input Line-Entry System) data collected from the NPASS database. The natural product library was further categorized into flavans (339), flavones (193) and isoflavonoids (457), with the rest of the molecules forming a general group. For the comparison against COVID-19, the small molecule synthetic drugs were categorised into molecules present in the PubChem COVID-19 portal (306) and molecules at different stages of clinical trials (138) (as on 31st August 2020). Further, the study was extended to a comparison against the most promising investigational drugs, namely Remdesivir, Arbidol, Lopinavir and Ritonavir, from which the top 10 results were selected for in-silico PK/PD analysis and HTVS studies. To support our study, both the T2D prescribed drug library and the COVID-19 drug libraries were compared internally to justify the similar biological activity among structurally similar molecules. All the libraries were constructed using Osiris DataWarrior software on a local machine equipped with an AMD Ryzen 5 six-core 3.4 GHz processor, 8 GB graphics and 16 GB RAM, with Microsoft Windows 10 and Ubuntu 16.04 LTS dual-boot operating systems.

Implementation of the method:
Pharmacophore based comparison of natural products with their synthetic drug counterparts: The non-redundant natural product libraries were compared against structurally similar small molecule synthetic drugs currently being prescribed for T2D and being prescribed or studied to treat COVID-19 infection. The molecular properties were used to assign the druglikeness of each molecule by considering its molecular weight, cLogP, hydrogen-bond donors, hydrogen-bond acceptors and rotatable bonds. Based on these molecular properties, the activity cliff of each structure was derived. To find structurally similar compounds rather than compounds sharing a specific sub-structure, core fragment based SAR analysis was performed by considering the most central ring structure. The similarity between two structures was assessed as the number of fragments that both molecules have in common divided by the number of fragments found in either of the two structures [17]. The structures were further analysed for structural scaffolds based on plane ring system to determine the substructures and to define the similarity cut-off during the structure comparison. To derive molecular properties, activity cliffs, core fragments and structural scaffolds, Osiris DataWarrior V.4.4.3 software [17] was used. For the comparison, a cut-off of more than 60% structural similarity was set to screen the natural product library against the synthetic drug library based on 2D-pharmacophore core fragment analysis. The graphical representation of the comparison result is given in Figure 3.
Similarity score cut-off limit: Natural products exist in nature in many stable analogous forms. These analogous forms, even with minor structural changes, are capable of exerting unique biological effects. Because of their complex structures, such variants are excluded at higher cut-off limits. The following table details the selection criteria for the inclusion of analogous structures of a parent molecule, Procyanidin. As the cut-off limit was decreased, further structures that are analogous forms of the native structure were included. Based on this, a similarity cut-off of 60% was fixed for the comparison.
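The fragment-based similarity and the 60% cut-off described above can be illustrated with a minimal Python sketch. It assumes each structure has already been reduced to a set of fragment labels; the labels, the molecule names and the helper names `fragment_similarity` and `screen` are hypothetical, and the sketch does not reproduce the descriptors that DataWarrior computes internally.

```python
from typing import Dict, FrozenSet, List, Tuple


def fragment_similarity(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Fragments shared by both structures divided by fragments found in either."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def screen(
    natural_products: Dict[str, FrozenSet[str]],
    drugs: Dict[str, FrozenSet[str]],
    cutoff: float = 0.60,
) -> List[Tuple[str, str, float]]:
    """Return (natural product, drug, score) for pairs scoring above the cut-off."""
    hits = []
    for np_name, np_frags in natural_products.items():
        for drug_name, drug_frags in drugs.items():
            score = fragment_similarity(np_frags, drug_frags)
            if score > cutoff:  # "more than 60%" as stated in the text
                hits.append((np_name, drug_name, score))
    # Most similar pairs first
    return sorted(hits, key=lambda hit: hit[2], reverse=True)


# Hypothetical fragment sets, for illustration only
natural_products = {
    "np_A": frozenset({"f1", "f2", "f3", "f4"}),
    "np_B": frozenset({"f1", "f9"}),
}
drugs = {"drug_X": frozenset({"f1", "f2", "f3", "f4", "f5"})}

hits = screen(natural_products, drugs)
for np_name, drug_name, score in hits:
    print(f"{np_name} vs {drug_name}: {score:.2f}")
```

Lowering `cutoff` in this sketch admits more distant analogues, which mirrors the reasoning given for fixing the limit at 60%.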

Method validation:
Stage 1: In-silico pharmacokinetics, pharmacodynamic and toxicology profiling: Molecular complexity is one of the major characteristics of natural products. The stereo-specificity, chirality, and cis/trans-configuration contributing to the structural properties of a natural product are very specific. The presence of fewer aromatic rings, larger macrocyclic aliphatic rings, lower nitrogen content, and increased oxygen content contributes majorly to this structural complexity [18]. The oral bioavailability of natural products can be predicted by considering such structural properties. In pharmaceutical research, Lipinski's rule of five (RO5) has become one of the most widely used computational approaches for estimating the solubility and permeability of new drug candidates [19]. Although these parameters cannot be strictly applied to filter natural products, they can be used as a rule of thumb to describe the molecular properties necessary to filter a candidate drug's pharmacokinetics (PK) and pharmacodynamics (PD) [20,21]. Hence, the natural products obtained from the structural comparison can be considered as hits for in-silico PK/PD assessment.
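As a rough illustration of how the rule of five can act as a soft filter rather than a strict one, the following Python sketch counts RO5 violations from precomputed molecular properties. The thresholds are the standard Lipinski limits; the `Properties` container, the tolerance of one violation and the example values are assumptions made for illustration and are not taken from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Properties:
    """Precomputed molecular properties (hypothetical container)."""
    mol_weight: float   # g/mol
    clogp: float        # calculated logP
    h_donors: int       # hydrogen-bond donors
    h_acceptors: int    # hydrogen-bond acceptors


def ro5_violations(p: Properties) -> int:
    """Count how many of the four Lipinski limits are exceeded."""
    checks = [
        p.mol_weight > 500,
        p.clogp > 5,
        p.h_donors > 5,
        p.h_acceptors > 10,
    ]
    return sum(checks)


def passes_ro5(p: Properties, max_violations: int = 1) -> bool:
    """Soft filter: tolerate up to max_violations (assumed convention)."""
    return ro5_violations(p) <= max_violations


# Hypothetical property values, for illustration only
small = Properties(mol_weight=300.0, clogp=2.1, h_donors=2, h_acceptors=5)
large = Properties(mol_weight=750.0, clogp=6.2, h_donors=8, h_acceptors=14)

print(ro5_violations(small), passes_ro5(small))
print(ro5_violations(large), passes_ro5(large))
```

Reporting the violation count alongside the pass/fail flag keeps complex natural products visible for later inspection instead of silently discarding them.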
Stage 2: High throughput virtual screening to predict molecular interactions: The selected molecules were subjected to high throughput virtual screening (HTVS) using automated docking to identify their binding affinity with the selected targets [24,25]. Information on the action mechanism of the synthetic drugs was collected from the Inxight: Drugs database of the NIH National Center for Advancing Translational Sciences (NCATS) server (https://drugs.ncats.io/). We used automated docking as a preliminary screening method to predict the probable interactions of the selected natural products with their corresponding target proteins. The targets were selected by considering the action mechanism of the most structurally similar synthetic drug (Table 2A and 3A). The protein structure files were obtained from the Protein Data Bank and, as part of target preparation, were edited by removing hetero atoms and adding Kollman charges. For the SGLT2 inhibitors Empagliflozin and Luseogliflozin and their structurally similar natural products, the structure with PDB ID 2XQ2 was used for the interaction studies. Similarly, for the glucocorticoid receptor agonist Prednisolone, the structure with PDB ID 1M2Z was used. To study the binding effectiveness of natural products against COVID-19 infection, Lopinavir and its related natural products were docked against HIV-1 protease (I50V isolate; PDB ID 3OXV), Ritonavir and its related natural products against HIV-1 protease (A02 isolate; PDB ID 4NJV), Remdesivir and its related natural products against SARS-CoV NSP 12 polymerase (PDB ID 7BV2) and Arbidol and its related natural products against Influenza virus haemagglutinin (PDB ID 5T6S). For each target, the residues forming the binding site were identified using the PDBsum server.
A grid box was set around the residues forming the binding pocket, and the Broyden-Fletcher-Goldfarb-Shanno algorithm implemented in AutoDock Vina was employed to study appropriate binding modes of the ligand in different conformations [26]. For the ligand molecules, all torsions were allowed to rotate during docking. The in-silico studies were performed on a local machine equipped with an AMD Ryzen 5 six-core 3.4 GHz processor, 8 GB graphics and 16 GB RAM, with Microsoft Windows 10 and Ubuntu 16.04 LTS dual-boot operating systems.
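One way to place a grid box around the binding-site residues is to take the bounding box of their atomic coordinates and enlarge it by a margin. The Python sketch below does this and formats the result with the `center_*` and `size_*` keys used in AutoDock Vina configuration files. The 4 Å padding, the helper names and the coordinates are assumptions for illustration; the box dimensions actually used in the study are not given in the text.

```python
from typing import Iterable, Tuple

Point = Tuple[float, float, float]


def grid_box(coords: Iterable[Point], padding: float = 4.0) -> Tuple[Point, Point]:
    """Return (center, size) of a box enclosing coords, padded on every side."""
    pts = list(coords)
    if not pts:
        raise ValueError("at least one coordinate is required")
    mins = [min(p[i] for p in pts) for i in range(3)]
    maxs = [max(p[i] for p in pts) for i in range(3)]
    center = tuple((lo + hi) / 2 for lo, hi in zip(mins, maxs))
    size = tuple((hi - lo) + 2 * padding for lo, hi in zip(mins, maxs))
    return center, size


def vina_config(center: Point, size: Point) -> str:
    """Format the box as AutoDock Vina configuration lines (values in angstrom)."""
    axes = ("x", "y", "z")
    lines = [f"center_{ax} = {c:.3f}" for ax, c in zip(axes, center)]
    lines += [f"size_{ax} = {s:.3f}" for ax, s in zip(axes, size)]
    return "\n".join(lines)


# Hypothetical binding-site atom coordinates, for illustration only
site_atoms = [(0.0, 0.0, 0.0), (10.0, 4.0, 2.0)]
center, size = grid_box(site_atoms)
config_text = vina_config(center, size)
print(config_text)
```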
Stage 3: United-atom molecular dynamics simulation studies to predict the protein stability: United-atom molecular dynamics (MD) simulation studies are used to analyse the physical movement of atoms and molecules. To assess the structural stability upon ligand binding [27,28], MD simulations were run for a time scale of 20 ns. From each docked group, the natural product bound target with the lowest binding energy was selected and subjected to MD simulation to study the small-molecule interaction and protein stability. For the simulation studies, the GROMOS96 54a7 [29] force field implemented in the GROMACS package (Abraham et al., 2015), version 2018, was employed. Briefly, the system was solvated using the simple point charge (SPC) model [30] and neutralized using sodium and chloride ions. Energy minimization was done using the steepest descent method, followed by temperature coupling (300 K) using the V-rescale thermostat [31] and pressure coupling (10^5 Pa) using the Parrinello-Rahman barostat [32]. The LINCS algorithm [33] was employed to constrain bond lengths and the particle mesh Ewald (PME) method [34] was used to evaluate electrostatic interactions. Equilibration trajectories were prepared for a time scale of 1 ns under periodic boundary conditions and the final MD trajectories were prepared for a time scale of 20 ns at a time step of 2 fs, with trajectory coordinates written at 10 ps intervals. The MD trajectories thus produced were analysed using the gmx energy, gmx rms, gmx rmsf, gmx gyrate, gmx hbond, gmx do_dssp, gmx covar, gmx anaeig and gmx sasa modules of GROMACS, along with the interaction energies in terms of electrostatic and van der Waals energy between the ligand and the macromolecule.
In equation (II), the $E_{MM}$ term consists of the $E_{bonded}$ and $E_{non-bonded}$ energies. The $E_{non-bonded}$ energy is the sum of the van der Waals (Lennard-Jones potential function) and electrostatic (Coulomb potential function) energies. $G_{solvation}$ is the sum of the $G_{polar}$ and $G_{non-polar}$ terms, which represent the electrostatic and the non-electrostatic energies. $G_{polar}$ is calculated with a continuum implicit solvent model using the Poisson-Boltzmann equation [36]. The $G_{non-polar}$ term is the sum of $G_{cavity}$ and $G_{VdW}$. $G_{cavity}$ is the work done by the solute to create a cavity in the solvent and $G_{VdW}$ is the attractive van der Waals energy between solvent and solute. $G_{non-polar}$ accounts for the hydrophobic effect [37]. The entropic term ($T\Delta S$) denotes the product of the temperature and the entropic contribution.
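Equations (I) and (II) themselves are not reproduced in this text. The block below is a sketch of the standard MM-PBSA decomposition that matches the term definitions given above; the numbering (I) and (II) and the subscript $x$ (standing for the complex, the protein or the ligand) are assumptions.

```latex
\begin{align}
\Delta G_{\mathrm{binding}} &= G_{\mathrm{complex}} - \left( G_{\mathrm{protein}} + G_{\mathrm{ligand}} \right) \tag{I} \\
G_{x} &= \langle E_{\mathrm{MM}} \rangle + \langle G_{\mathrm{solvation}} \rangle - TS \tag{II} \\
E_{\mathrm{MM}} &= E_{\mathrm{bonded}} + E_{\mathrm{non\text{-}bonded}}
              = E_{\mathrm{bonded}} + \left( E_{\mathrm{vdW}} + E_{\mathrm{elec}} \right) \\
G_{\mathrm{solvation}} &= G_{\mathrm{polar}} + G_{\mathrm{non\text{-}polar}} \\
G_{\mathrm{non\text{-}polar}} &= G_{\mathrm{cavity}} + G_{\mathrm{VdW}}
\end{align}
```

Substituting (II) into (I) gives $\Delta G_{\mathrm{binding}} = \Delta E_{\mathrm{MM}} + \Delta G_{\mathrm{solvation}} - T\Delta S$, which is where the entropic term $T\Delta S$ enters.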

Results
In-silico pharmacokinetics, pharmacodynamic and toxicology profiling: The comparisons within the T2D prescribed drug library and within the COVID-19 drug library supported the association between similar structures and similar mechanisms of action (Table S1 & S2). The comparison of 166 selected natural products against 84 T2D prescribed drugs yielded 15 natural products with more than 60% structural similarity (Table 2C). The druglikeness estimated from the molecular properties of the selected structures indicated that, out of 15 molecules, 9 molecules with positive scores demonstrated druglikeness, while 14 molecules were found to penetrate the human intestine, 9 molecules to penetrate the blood-brain barrier and none to be a cytochrome P450 substrate, indicating a high possibility of their bioavailability (Table 2A). The results of the in-silico PK/PD studies indicated that, out of 15 molecules, 2 were predicted to be mutagenic, while none was predicted to be tumorigenic, irritant or to have reproductive effects (Table 2B).
The method was extended to a larger dataset to validate the selection of natural products expected to be effective against COVID-19 infection. The complete natural product library consisting of 26,311 structures was screened against the local PubChem COVID-19 library and the clinical trials drug library. Among the molecules screened, 17,798 natural product structures were found to have more than 60% structural similarity against the PubChem COVID-19 library, of which 41 were flavans, 41 were flavones and 272 were isoflavonoids. The comparison against the clinical trials drug library yielded 14,689 natural products with more than 60% structural similarity, consisting of 30 flavans, 18 flavones and 78 isoflavonoids. The study was extended to compare the complete natural product library against the most promising investigational drugs, viz. Remdesivir, Arbidol, Lopinavir and Ritonavir, which yielded 35 natural product structures with more than 60% structural similarity (Table 3C).
The druglikeness estimated from the molecular properties of the selected structures indicated that, out of 35 molecules, 23 molecules with positive scores showed potential drug-like effects, while 33 molecules were found to penetrate the human intestine, 17 molecules to penetrate the blood-brain barrier and none to be a cytochrome P450 substrate, indicating a high possibility of their bioavailability (Table 3A). The results of the in-silico PK/PD studies indicated that, out of 35 molecules, 34 were predicted to be non-mutagenic, non-tumorigenic and non-irritant, with 10 molecules predicted to have reproductive effects (Table 3B).
High throughput virtual screening to predict molecular interactions: The in-silico molecular interaction studies predicted the most effective natural product that could bind to the appropriate target [24,25]. The binding was compared with that of the structurally similar synthetic drug counterpart whose target was used for the interaction studies. The study yielded natural products that bound effectively to their respective targets (Table 2C and 3C). The results were expressed in terms of docking energy (kcal/mol). The effectiveness of binding was validated by considering the protein stability using molecular dynamics simulation studies.
United-atom molecular dynamics simulation studies to predict the protein stability: In the present study, MD simulations were performed to confirm the accuracy of the binding obtained from the docking studies. The MD simulations displayed the conformational changes acquired by the target proteins of T2D and COVID-19 conditions after binding of the respective ligands (Figures 4 & 5). The calculated MD parameters for the unbound and ligand-bound targets, detailing RMSD, RMSF, Rg, SASA, secondary structure, Coul-SR energy and LJ-SR energy of the T2D and COVID-19 target proteins, along with the binding free energies and the contributing energy terms, are tabulated in Tables 2D & 3D respectively. The MD trajectory analysis revealed that the natural products Naringin and Isoquercitrin displayed lower RMS deviations and RMS fluctuations compared to their structurally similar SGLT2 inhibitors, Empagliflozin and Luseogliflozin.
Similarly, Ursolic acid displayed lower RMS deviation and RMS fluctuation compared to the glucocorticoid receptor agonist Prednisolone. The binding free energy calculated using the g_mmpbsa module displayed better interactions of the selected natural products with their respective target proteins when compared to their synthetic drug counterparts.
The average RMS deviations and RMS fluctuations were calculated from the MD trajectories of natural product and synthetic drug bound HIV-1 protease (I50V isolate), Influenza virus haemagglutinin, SARS-CoV NSP 12 polymerase and HIV-1 protease (A02 isolate) and were compared with their respective unbound structures. Lower RMS deviations were observed in HIV-1 protease (I50V isolate) after the binding of Hexahydropyrrolo Derivative, in SARS-CoV NSP 12 polymerase after the binding of HydroxyManzamin_A and in HIV-1 protease (A02 isolate) after the binding of Bionectin_B compared to their respective chemical drug counterparts. The RMS deviation was found to be lower in Arbidol bound Influenza virus haemagglutinin compared to the natural product Phellibaurin_A bound structure. Lower RMS fluctuations were observed in the natural product bound structures of HIV-1 protease, Influenza virus haemagglutinin and HIV-1 protease compared to their respective chemical drug bound structures. The binding free energy calculations performed using the g_mmpbsa module displayed better binding of the natural products with their respective target proteins compared to the respective chemical drug counterparts.
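For readers less familiar with the two stability measures compared above, the Python sketch below shows what RMSD and RMSF quantify. It is a simplified illustration rather than a replacement for `gmx rms` and `gmx rmsf`: it assumes the frames have already been superimposed on the reference, it weights all atoms equally, and the coordinates are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]
Frame = Sequence[Coord]


def rmsd(frame: Frame, reference: Frame) -> float:
    """Root-mean-square deviation of one frame from a reference structure."""
    if not frame or len(frame) != len(reference):
        raise ValueError("frames must be non-empty and of equal length")
    squared = sum(
        (a - b) ** 2
        for atom, ref_atom in zip(frame, reference)
        for a, b in zip(atom, ref_atom)
    )
    return math.sqrt(squared / len(frame))


def rmsf(frames: Sequence[Frame]) -> List[float]:
    """Per-atom root-mean-square fluctuation about the time-averaged position."""
    n_frames = len(frames)
    n_atoms = len(frames[0])
    result = []
    for i in range(n_atoms):
        mean = [sum(f[i][k] for f in frames) / n_frames for k in range(3)]
        msd = sum(
            sum((f[i][k] - mean[k]) ** 2 for k in range(3)) for f in frames
        ) / n_frames
        result.append(math.sqrt(msd))
    return result


# Hypothetical two-atom, two-frame trajectory, for illustration only
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
moved = [(0.0, 0.0, 0.0), (1.0, 0.0, 2.0)]
trajectory = [reference, moved]

print(rmsd(moved, reference))
print(rmsf(trajectory))
```

A lower RMSD over the trajectory indicates that the bound structure stays closer to its starting conformation, while lower per-residue RMSF values indicate reduced local flexibility.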

Discussion And Conclusion
The present research describes a novel method to screen natural products by primarily considering their molecular composition and 2-dimensional structural arrangements, followed by an assessment of their effectiveness against specific disease conditions. The method relies on the central foundation of medicinal chemistry that structurally similar molecules will have similar biological effects [1,8]. As the initial part of the study, the natural products were screened against synthetic and/or prescribed drugs for a specific disease condition based on their molecular and 2-dimensional structural similarities, which resulted in the selection of effective bioactive molecules. In the next part of the study, the identified natural products were checked for druglikeness, pharmacokinetic and pharmacodynamic properties by comparison against a large group of already reported datasets supporting the screening strategy. The biological interactions involved in the action mechanism of the synthetic drugs were extended to their structurally similar natural products. The interaction studies, forming the third part of the study, described the binding effectiveness of the selected molecules against suitable drug targets. The effectiveness of binding was studied using MD simulation, which formed the final part of the methodology and reported the most effective drug-like natural product for the disease conditions under study. Based on the molecular interactions and on the structural stability of the targets induced by the natural products, in a way similar to that of their synthetic drug counterparts, it is convincing to state that the selected natural products can also exert similar effects if administered individually.
Several approaches have been proposed for searching and identifying chemical compounds in a dataset, ranging from conventional string sequence search to modern data-structure-based search algorithms [3-6]. Similarity measurements generally consider either structural representations involving physicochemical properties, topological indices, molecular graphs, pharmacophore features, molecular shapes and molecular fields, or quantitative measures such as the Tanimoto coefficient, Dice index, cosine coefficient, Euclidean distance and Tversky index for similarity assessment between two structures [1]. The 3-dimensional conformation-based searching methods include pharmacophore modeling, shape similarity, molecular field-based methods and 3D fingerprints to compare chemical compounds. Popular chemical databases like PubChem use percent similarity measures employing the Tanimoto equation [38] and a dictionary-based fingerprint analogous to the MACCS (Molecular ACCess System) structure-based keys [39], while tools like SwissTargetPrediction predict the target of bioactive molecules based on a combination of 2D and 3D similarity measures. However, the prediction accuracy of such tools is significantly lower for molecules with unknown bioactivity [9]. The OSIRIS DataWarrior program, employed in the current study for the assessment of molecular similarity, uses chemical similarity methods that range from substructure fragments to biological similarity considering 3D geometry and binding [17]. By utilizing these calculations, SAHANA uniquely screens natural products against specific disease conditions by targeting a specific drug target, even for molecules whose bioactivity is not known, and proceeds to bioactivity assessment based on molecular interactions and structural stability using molecular docking and MD simulation studies.
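To make the relationship between some of the quantitative measures named above concrete, the following Python sketch computes the Dice, cosine and Tversky coefficients for two binary fingerprints represented as sets of on-bit indices. The fingerprints are hypothetical and the function names are illustrative; the Tversky index reduces to the Tanimoto coefficient when both weights are 1 and to the Dice index when both are 0.5.

```python
import math
from typing import FrozenSet


def dice(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Dice index: twice the shared bits over the total bits set in both."""
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 0.0


def cosine(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Cosine coefficient for binary fingerprints."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


def tversky(a: FrozenSet[int], b: FrozenSet[int],
            alpha: float = 1.0, beta: float = 1.0) -> float:
    """Tversky index; alpha and beta weight the bits unique to a and to b."""
    common = len(a & b)
    denom = common + alpha * len(a - b) + beta * len(b - a)
    return common / denom if denom else 0.0


# Hypothetical on-bit indices of two fingerprints, for illustration only
fp_a = frozenset({1, 2, 3, 4})
fp_b = frozenset({3, 4, 5, 6})

print(dice(fp_a, fp_b))
print(cosine(fp_a, fp_b))
print(tversky(fp_a, fp_b))            # equals the Tanimoto coefficient
print(tversky(fp_a, fp_b, 0.5, 0.5))  # equals the Dice index
```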
Limitations and other considerations: The methodology described in SAHANA relies completely on computational calculations. Although the results have statistical accuracy, experimental validation is required to confirm the reliability of the predictions. In this avenue, the predictions made using SAHANA can act as a 'rule of thumb' in screening a large dataset of natural products against any specific disease condition. The other limitation is that SAHANA cannot be applied to conditions that do not have any prescribed drugs with a predefined mechanism of action. As the prediction relies on the structural similarities between natural products and the available drugs, the presence of a prescribed drug with a well-defined action mechanism is a prerequisite for SAHANA.

Declarations
Ethics approval and consent to participate: Not applicable.
Consent for publication: Not applicable.
Availability of data and materials: All the data used during the current study are available from the corresponding author on reasonable request.

Figure 1
Structural similarity between thymidine and zidovudine with variations highlighted inside the ellipse.  Output of structurally similar molecule comparison results from DataWarrior software.