Monte Carlo optimization based QSAR modeling, molecular docking studies, and ADMET predictions of compounds with antiMES activity

The paper deals with quantitative structure–activity relationship (QSAR) modeling-based Monte Carlo optimization. The molecular descriptors involve the local molecular graph invariants and the SMILES notation for the molecules whose antiMES activity is active against maximal electroshock seizure (MES). The developed QSAR model was validated with the use of various statistical parameters, such as the correlation coefficient, cross-validated correlation coefficient, standard error of estimation, mean absolute error, root-mean-square error Rm2, MAE-based metrics, the Fischer ratio, as well as the correlation ideality index. Along with the robustness of the developed QSAR model, the used statistical methods yielded an excellent predictability potential. The discovered molecular fragments utilized for the preparation of the computer-aided design of the new compounds were thought to have led to the increase and decrease of the examined activity. Molecular docking studies were referred to when making the final assessment of the designed inhibitors. This emphasized excellent correlation with QSAR modeling results. The computation of physicochemical descriptors was conducted in order to predict ADME parameters, pharmacokinetic properties, the drug-like nature and medicinal chemistry friendliness, with the aim of supporting drug discovery. Based on the results, all the designed molecules indicate the presence of high drug-likeness.


Introduction
Epilepsy is not considered to be a disease in strict terms, but a syndrome accompanied by considerable discharges of a lot of neurons, resulting in alterations to normal electrochemical balance of the brain [1][2][3][4].Seizures frequently occur as a result of epileptic episodes owing to the fact that a single neuron or neuron group in the brain develop irritability or hyper-excitability.There are various reasons why the mentioned seizures may occur, such as hypoxia, hypoglycemia, ischemia, or electrolyte abnormalities.In such instances, during the discharge of action potentials, nerve cells disregard proper attenuation and suppression, leading to irregular discharges.The mentioned action potential concerns the electro-physiologic voltage change, which appears in the axon of neurons because of the transient variation in the permeability of sodium and potassium of the axon.The location (or focus) of the aberrant discharges in the brain serves to determine their resulting effect's manifestation, which could prove to be a motor symptom (such as tonic-clonic contractions) or might lead to sensory manifestations (such as paresthesia and hallucinations).In the event that the various regions of the brain are reached by the mentioned foci, an uninhibited and chaotic electrical activity discharge would occur, leading to the manifestation of sensory and/ or motor activity in the patient, which isclinically designated as a seizure [4][5][6][7].There is a possibility to control the seizure by suppressing the action potential through the manipulation of potassium and sodium's ion permeability, leading to the axon becoming refractory to the action potential.Contrary to this, this could lead to the blockage of the synapse impulse transmission through the prevention of the neurotransmitter's binding to its receptor site as a result of blockage, or through the prevention of synthesis and/or its release [8,9].A lot of research has been aimed at developing therapies in order to curb and prevent the occurrence of the seizures, attempting to alleviate them with medicine, leading to the development of various antiepileptic drugs (AEDs) [10][11][12][13].Nonetheless, epileptic episodes are still experienced by about 25% of the patients, despite the optimal AED usage on the market, and it usually takes a couple of years for administering AED treatment.During this time, there may be occurrences of side-effects [14,15].This is why more effective and safer new antiepileptic drugs need to be developed.
There are two major strategies for designing and developing new AED.They are (a) search for new compounds to enable the modification of certain stages of the cellular mechanism which cause epilepsy (mechanism-based design) and (b) modifications of preexisting compound structure(structure-based design) [16].In the development and design of AEDs, the quota of in silico studies has contributed to the implementation of the two approaches [7,8].The mentioned studies are alternatives to compound synthesis and laboratory screening in the real world.These studies are dependent on the virtual world of data analyses, designs, and hypotheses stored in a computer.Owing to their application, computational models and screens are able to explore the initial concepts for the purpose of justifying the necessary commitment to actual costly bioassays and synthesis [17].During this phase, the exact classification of AED based on their action mechanisms is impossible because some of them do not act on specific binding sites and most interact with various receptors [18,19].In spite of this, the cellular mechanisms that might occur at the time of a drug's action in an epileptic patient, inter alia, involve the voltage-dependent blockade of Na + channels, cellular GABA uptake inhibition, γ-aminobutyric acid (GABA) synthesis modulation, or degradation, GABA (A) receptor modulation, adenosine metabolism modulation, as well as the modulation of a range of excitatory amino acid receptors [20][21][22].Furthermore, the drug molecules' anticonvulsant activity has been studied and evaluated in various experimental animal models.The most popular subcutaneous pentylenetetrazole (PTZ) test and maximal electroshock seizure (MES) test were also covered.The MES test included seizures which were electrically induced in animals, leading to the hind limb tonic extensor, which was followed by the administration of drug molecules in order to eliminate the impact of the mentioned.Therefore, all drug types that have been efficacious in suppressing MES have been reported to serve the purpose of inhibitors to the neuronal voltage-gated sodium channel (VGSC).What is more, the MES has been reported to represent an adequate grand mal seizure model.Compounds for seizure spread prevention have been determined as well [22].
The main aim of this paper is to reveal what the rationality is in when designing new 1H-pyrazole-5-carboxylic acid derivatives, which will demonstrate activity against MES-induced seizures, by relying on the utilization of molecular docking strategies and the quantitative structure activity relationship study (QSAR).QSAR is a computational approach which links the quantitative measure of a compound's chemical structure with its activities through the use of a numerous computer-based processes in order to reach a prediction concerning the model, equation or relationship that would make it easier to suggest the activity of known compounds with unknown activities [23,24].The development process of a lot of anticonvulsant molecules has been based on the mentioned approach [25][26][27][28][29]. Nevertheless, there are not many reports which cover the rational design of new 1H-pyrazole-5-carboxylic acid derivatives using the QSAR strategy.Conversely, the molecular docking technique is employed for the purpose of exploring the two interacting molecules' binding mode, depending on their topographic features or energy consideration, for the purpose of fitting the mentioned into the conformation, thus leading to favorable interactions.Docking often represents the basis for discovering candidates for new drugs or lead compounds for revealing the mechanisms and most significant elements in protein-ligand interaction and, as a consequence, rational drug design [30].For this reason, the mentioned strategies permit the rational design of new 1H-pyrazole-5-carboxylic acid derivatives, with quantitative estimates that concern their potencies determined from the data regarding the molecular descriptor's contributions to antiMES activity.The interaction between the designed molecules and the anticonvulsant molecular target is also elucidated by these strategies.

Molecule data set
A 62-molecule data set with an established antiMES effect was discovered in literature [31].Its anticonvulsant activity against MES-induced seizures is expressed in the form of ED50 (mg/kg) (the dosage determined to be effective in 50% of the tested animals).The reported antiMES activities of the selected compounds were recalculated to the molar unit with a view to making an easy comparison between the molecules.Afterwards, they were converted to the logarithmic unit (i.e., -log ED 50 designated pED 50 ) with the aim of achieving an increase in the linearity of the activity values cited in Table S1 (Supplementary Material), where the SMILES notation represents the corresponding molecular structures.Three random splits were utilized for the purpose of generating original data in the training set (47 compounds, 75%) and test set (15 compounds, 25%).The normality of the activity distribution was verified with the use of the published method for all the splits [32].

The Monte Carlo optimization method
The Monte Carlo optimization method was the chief algorithm possessing a hybrid molecular descriptor approach.It was used for the purpose of developing the conformationindependent QSAR model.The mentioned method includes the utilization of the molecular graph, as well as the SMILES notation-based descriptors.The local graph invariants were the molecular graph-based descriptors.They included basic theoretical graph concepts, such as walks and paths.The detailed mathematical definitions of these can be found in literature [33].The local graph invariants which proved to be the optimal topological descriptors employed in this study were Morgan extended connectivity indices of increasing orders (EC0); the number of carbon atom neighbors (number of carbon); and the number of non-carbon atom neighbors (number of non-carbon) with path numbers in the length of 2 and 3 (p2, p3), as well as valence shells with the value of 2 and 3 (s2, s3).In comparison with molecular-graphbased descriptors, the SMILES notation-based molecular descriptors exhibit a mechanistic interpretation which could prove to be in correlation with molecular fragments.Each numerical value of the SMILES notation descriptor for a molecule contributes to the molecule's correlation weight (DCW).It is mathematically defined as the sum of the correlation weights (CW) of all the defined SMILES descriptors, calculated using Eq. 1.
In Eq. 1, z, x, y, t, α, β, and γ are numbers 1 (yes) or 0 (no), and on the basis of their values, it can be established if the SMILES descriptor is used in the development of the model or not.Symbol S k represents one SMILES notation symbol of the SMILES atom (or two inseparable ones) and relates to local descriptors.The linear combinations of two and three SMILES atoms are represented by these descriptors.The atoms are here represented with symbols SS k and SSS k , respectively.The second SMILES notation optimal descriptor type is represented by the global descriptor and concerns the global features of the studied molecule.This study involved the use of these global SMILES notationbased descriptors: ATOMPAIR, BOND, HALO, NOSP, and HARD, all defined on the basis of the published methodology [34,35].The development of the QSAR model here described is based on the combination of the SMILES notation (local and global) and the local graph invariant descriptors.The approach required calculating the DCW for the molecules in accordance with Eq. 2.
In addition to the symbols S k , SS k , and SSS k written above, these symbols are also used in Eq. 2: paths in the length of 2 and 3-PT2 k and PT3 k , the Morgan connectivity index of the zero order (a hydrogen suppressed graph was used here)-EC0 k , valence shell 2 and 3-VS2 k and VS3 k , and Nearest Neighbors-NNC k [33].All the above molecular descriptors were calculated with the CORAL software (CORrelation and Logic) (http:// www.insil ico.eu/ coral).The correlation weight (CW), i.e., a numerical value, was assigned to each optimal descriptor with the Monte Carlo method.This process involved creating relevant random numbers with the use of the Monte Carlo method, which is followed by the manner in which these number fractions relate to a single property or certain properties.Descriptors are assigned the CW value randomly in accordance with the SMILES notation for a specific endpoint in every separate Monte Carlo run.The Monte Carlo method was optimized for the purpose of conducting numerical data calculations of the correlation weights that realize the maximum correlation coefficient value between the endpoint and the optimal descriptor.The following two parameters are to be taken into consideration when this method is used for the purpose of generating a QSAR model-the number of epochs (N epoch ) and threshold (T).The coefficient for classifying different molecular features is the threshold.These features involve the SMILES based molecular fragments and the SMILES based indices.The features were calculated using the SMILES notation, after which they were classified into two categories: (a) active features (in this case, the modeling process included the correlation weight); and (b) rare features (the modeling process excluded the correlation weight in this particular instance).The process was the following: in the event that one molecular feature (X), which is defined by the SMILES notation for the molecules belonging to the training set, appears less than T times, the model building is to exclude molecule descriptor X.In this way, the numerical (2) value representing this feature (the correlation weight of the X, CW(X)) is set to zero and thus defined as rare.All the other molecular features may be used for model building since they are active.N epoch provides the best statistical quality of the training set.It designates Monte Carlo optimization's epoch number.Once an unlimited epoch number is applied, the Monte Carlo optimization provides the training set's maximum correlation coefficient.Nevertheless, the epoch number for realizing the maximum correlation coefficient between the optimal descriptor and the endpoint for the external test set is definite.It is favored by calculations, since the model obtained exhibits excellent predictive potential, provided that the said value is attained by the epoch number.Conversely, the T increase is followed by the training set's correlation coefficient decrease.Still, a threshold exists for the maximum correlation coefficient of the test set.When viewed from the practical perspective, the threshold is beneficial.Determining the threshold's preferable values and the epoch number of the Monte Carlo optimization (T and N epoch ) is crucial for generating a proper QSAR model with the use of the optimal SMILES-based descriptor [34,35].The purpose of Monte Carlo method simulations conducted on the basis of iterative algorithms is revealing the unknown probabilistic entity's distribution.The Monte Carlo optimization process encompasses the epoch number for the training set and a specific target function.The first step is the termination of CW (SA) for each SA SMILES attribute, where the starting values of all the CWs are set to 1 ± 0.01 × Rnd (Rnd stands for random value generator, and it is in the range of 0 to 1).The attribute number's regular order is replaced by a random sequence.The following phase entails assessing the target function's starting value and the subsequent modification of the correlation weight.When the process reaches completion, the relevant steps of the Monte Carlo optimization are redone for non-rare attributes [34,35].The QSAR model is calculated by using the linear regression approach (with the training set) (Eq.3), at which time the model provides the numerical data concerning the correlation weights, with preferable statistics for the test set.What is more, the research involves the search for the best combination of T and N epoch within values 1-5 for T and 0-50 for N epoch .

Validation of the developed QSAR models
The verifications of the adequacy of the developed QSAR model according to Monte Carlo optimization and based on 2D were conducted using a range of validation metrics, namely: the squared correlation coefficient (r 2 ), the leaveone-out and leave-many-out cross-validation coefficients loo , q 2 ), the root-mean-squared-error (RMSE), y-scrambling, the F-value and the mean absolute error (MAE) [34][35][36][37][38][39].The developed QSAR models were further validated with the following statistical metrics: the correlation coefficient (CCC), the ideality of correlation index (IIC), and the R m 2 and MAE-based metrics [40].The applicability domain (AD) is one of the crucial features of any developed QSAR model, which needs to be defined before the QSAR model is employed [41,42].The AD method found in literature was used for defining the conformational-independent QSAR models [43].The applicability domain (AD) must be defined for the purpose of prediction before any QSAR model is used.Furthermore, it represents a crucial addition to a relevant, reliable, robust, and valid QSAR model.The AD of the developed QSAR model presented in this paper was established with the "statistical defects" of conformationindependent molecular descriptors-d(A), previously used in the development of the QSAR model [43].The computations were done with the CORAL software and Eq. 4.
In the equation, P(A train ) and P(A calib ) are the probabilities of a descriptor or conformation-independent attribute (A) in the training sets and test sets, respectively.Correspondingly, N(A train ) and N(A calib ) represent the frequency of a descriptor or conformation-independent attribute (A) appearing in the training sets and test sets, respectively.The statistical SMILES defect (D) defines the sum of defects, d(A), for all the attributes available in the molecules' SMILES notation, mathematically defined as in Eq. 5 In the cited AD, the molecule is unclassified and is a categorized outlier if its D > 2 × D av ; where D av stands for the average of the D, calculated for the relevant set (training set or test set), where the molecule is placed.

Molecular docking
Molegro Virtual Docker (MVD) software was used for docking studies involving geometrically optimized ligands employed for the development of the 3D QSAR model.The human voltage-gated sodium channel, brain isoform, was the target of the docking studies (Nav1.2) (PDB: 2KAV).MVD utilizes a rigid receptor structure and flexible structure for the ligands for docking studies.It provides both hydrophilic interactions and hydrophobic interactions (mostly regarding the Van der Waals and steric interactions).These include the identification of hydrogen bonds between the amino acids from the studied ligands and the active site."Scoring" (4) functions can be used to quantify such interactions.They are the calculated numerical values that are related to relevant binding energies [44].The rule of thumb applies in the case of most enzymes-the higher the interaction between the receptor and the ligand, the higher the inhibition.Thus, the numerical values obtained for "scoring" functions could be used to evaluate the prospective inhibition effect of the ligands that are studied 34 .In order to perform an inhibitory potential estimation, the following "scoring" functions were calculated and used: Hbond, VdW, Steric, Pose energy, NoHbond, MolDock, and Rerank score.The published methodology was used to validate a full molecular docking protocol [45,46].Discovery Studio Client v20.1.0.19 was utilized for displaying two-dimensional representations of the interactions between the studied molecules and the dopamine transporter active site of the amino acids.

Results and discussion
The development of the QSAR model's prediction capability was evaluated using these statistical parameters: r 2 , correlation coefficient; q 2 , cross-validated correlation coefficient; CCC, concordance correlation coefficient; IIC, index of ideality of correlation; s, standard error of estimation; MAE, mean absolute error; and F, Fischer ratio.The numerical values of these, presented in Table 1, were utilized for determining how good the developed conformation-independent QSAR models which were obtained through the Monte Carlo optimization method actually were.The Av abbreviation represents the average value for the statistical parameters secured from three independent Monte Carlo optimization runs.When it comes to the calculated numerical values, the Monte Carlo optimization method yielded QSAR models that displayed good reproducibility and a high predictability potential.Also, the used metrics indicate that the best QSAR model for antiMES activity was attained for the third split, where the T value was 4 and the N epoch was 11.According to the applied methodology for AD, all the molecules were within the calculated AD, and no outliers were detected.Figure 1 presents the graphical representation of the best developed QSAR model (the highest obtained r 2 value) regarding the best Monte Carlo optimization run for all three splits.Figure 1 also presents the difference between the experimental and the calculated pED 50 values regarding the best Monte Carlo optimization run, both for the molecules in the test set and for those in the training set.
All the realized QSAR models exhibited high reproducibility in accordance with the obtained concordance correlation coefficient (CCC).The application of the MAEbased metric provided further validation, indicating that all the QSAR models were GOOD.The sturdiness of the developed QSAR models was determined through Y-randomization (with the Y values scrambled in 1000 trials over ten separate runs), and the developed QSAR models do not have accidental correlations, as per the results presented in Table 2.The final estimation whether QSAR is adequate or not was performed with the index of ideality of correlation (IIC).The obtained numerical values show that all the developed QSAR models possess a high predictive potential.
Table 1 The statistical quality of the developed QSAR models for antiMES activity r 2 correlation coefficient, q 2 cross-validated correlation coefficient, CCC concordance correlation coefficient, IIC index of ideality of correlation, s standard error of estimation, MAE mean absolute error, F Fischer ratio, Av average value for statistical parameters obtained from three independent Monte Carlo optimization runs

Training set
Test set A mathematical representation of the best QSAR models based on the obtained test set r 2 for all the splits and for the antiMES activity is cited in Eqs.6-8.One of the chief goals of this research was to determine molecular fragments, defined as the SMILES notation optimal descriptors with a positive and negative influence on the examined activity [34,35,[47][48][49][50].The full list of molecular descriptors can be found in Table S2 (Supplementary Material).These are based on the molecular graph and the SMILES notation.Table 3 displays an example of the calculation of a molecule's summarized correlation weight (DCW) and the studied activity (pED 50 ).Here, the molecular graphbased descriptors were excluded for the purpose of getting an easier interpretation.The summarized process in Fig. 2 represents the computer-aided design (CAD) of higher/lower activity compounds, which represents one of the key goals of this research.The conformational-independent results in the CAD process generated the design of nine novel potential inhibitors (structures presented in Fig. 2).
Molecule A represented the template molecule, because it is one of the least chemically exploited molecules.Figure 2 shows the highlighted part of molecule A, which is favorable for chemical modification with the use of the SMILES notation fragments that have a positive effect on the studied activity and those that were obtained from the conformational-independent QSAR studies.Table 4 cites the list of all the designed molecules, in addition to their calculated pED 50 values.
On the basis of the results that the QSAR modeling yielded, the SMILES notation descriptors associated with molecular fragments that have a positive impact on the pED 50 for both the studied activities and the yield increase in such activities are the following: "C…………"-carbon atom or a methyl group, fragment that has a positive effect, whose addition led to the increase in the calculated pED 50 values for molecule A1, when compared to the calculated pEC 50 values for template molecule A; "C…C……."-twoconnected carbon atoms or ethyl group, fragment possessing a positive effect, whose addition led to the increase in the calculated pIC 50 values for molecule A2, when compared to the calculated pED 50 values for the template molecule; "C…(…c…", "C…(…", "c…1…(…", " c…C… …."-associated with the addition of at least one methyl group to benzene, leading to the molecule's branching;   "N………..," nitrogen atom; "N…C…….,"amino group; "N…c…," nitrogen atom bonded to benzene; "C…N… (…," branching on the amino group, are fragments found in molecules A3, A5, and A6, molecules that have a primary, secondary, and tertiary amino group, respectively, all with higher calculated values for pEC 50 when compared to template molecule A; "O………..," oxygen atom; "O…c…1…", "c…O…….,"and "c…O…C…," fragments indicating that the hydroxyl/methoxy group is bounded to benzene; molecules A7 and A8 have an added hydroxyl/ methoxy group in the para position, and the calculated values for pED 50 for the studied activity are higher when compared to molecule A; "F……….,""HALO10000000," " + + + + F-N = = = ," SMILES descriptors indicating that the molecule has a fluorine atom, as well as a combination of a fluorine and nitrogen atom, "c…F……," SMILES descriptors indicating that the fluorine atom is bound to the benzene ring; molecule A9 has an added fluorine atom in the para position and the calculated values for pED 50 are higher when compared to molecule A. Molecular docking studies were conducted on all the designed molecules and template molecule A, with both studied enzymes in order to evaluate the predictability of the developed QSAR models and further validate the said models.Table 5 cites all the numerical values for the calculated "scoring" functions.Various scoring functions could be associated with various ligand-amino acids interactions.For this reason, when the inhibitory potency assessment is made, all needs to be taken into consideration.The MolDock and ReRank "score" function results showed that the molecules that potentially possess the highest inhibitory activity are molecules A6 and A7, in the case of the studied enzyme, and the QSAR modeling results support this.It has been proven that template molecule A is the molecule with the lowest MolDock and ReRank "score" function values, which has a good correlation with the QSAR modeling results.Literature [34,51] outlines detailed definitions of other "scoring" functions, in addition to the type of the potential impact these may have on inhibitory activity.Moreover, the aforementioned "scoring" function values could be used to perform similar analyses.The Supplementary Information section Figures contain all the interactions between the amino acids originating from the dopamine transporter active site and the selected molecules.They also exhibit a 2D representation of hydrogen bonds and the hydrophilic and hydrophobic interactions within the binding pocket.Figure 3 contains the best calculated poses of all the designed molecules within the human voltage-gated sodium channel, the brain isoform active site. Ensuring that new compounds possess physical and chemical features to classify them as potential therapeutics represents one of the first steps in the early development stages of drugs.A molecule needs to exhibit the following key features in order to be classified as drug-like: good absorption/permeation, gastrointestinal absorption and brain penetration, oral bioavailability, optimal bioavailability and the efficacy of binding to receptors/channels.These are the features which the molecule structure can predict.The Swis-sADME web service was used during the research within the aim of calculating the physicochemical descriptors and predicting pharmacokinetic properties, medicinal chemistry friendliness, ADME parameters, as well as the drug-like nature of the designed molecules in order to support drug discovery [51].Table S3 (Supplementary Material) presents the results obtained, and these results indicate that all the designed molecules have high drug-likeness.

Conclusion
The chief aim of this research is developing robust QSAR models that exhibit good predictability, determined by employing various statistical parameters, for the purpose of dopamine transport inhibition.The Monte Carlo optimization method was used to calculate the conformation-independent models, developed according to the optimal descriptors and derived from the SMILES notation invariants and a local graph.A QSAR model was generated from the pool of vast 0D, 1D, and 2D molecule descriptors with multiple linear regressions and a genetic algorithm.The assessment of the developed QSAR models' predictive potential and robustness was realized through the application of a range of statistical techniques.The high applicability of the developed QSAR models can be seen in the realized numerical values utilized to validate them.The Monte Carlo optimization method was successful in determining the molecular fragments used in QSAR modeling as the SMILES notation fragments possessing a positive and negative effect on dopamine transport inhibition.Afterward, these were used to perform the computer-aided design of new compounds with higher pIC 50 values.The final validators of the developed QSAR model and the potential inhibitory effect of the designed molecules were the molecular docking studies.The obtained results indicate good inter-correlation.The conclusion was made based on the comparison between the calculated pIC 50 from the best developed QSAR model and the one from the calculated energies originating from the interactions between the selected molecules and the amino acids in the dopamine transporter's active site.The purpose of calculating the designed molecules' physicochemical descriptors was to predict ADME parameters, pharmacokinetic properties, medicinal chemistry friendliness, and drug-like nature with a view to supporting drug discovery.The results have shown that all the designed molecules exhibit a high drug-likeness and a high bioavailability.In brief, the methodology presented in this research can be utilized for the discovery of new therapeutics to treat epilepsy.

Fig. 2
Fig. 2 Chemical structures of designed molecules

Fig. 3 A
Fig. 3 A The best calculated poses for all the designed molecules within the active site of the human voltage-gated sodium channel, the brain isoform (Nav1.2).B Hydrophobic interactions inside the active site.C Ionizability interactions inside the active site.D Aromatic

Table 3
Example of DCW calculation

Table 4
The list of all the designed molecules with their SMILES notation and calculated activities