Monte Carlo optimization based QSAR modeling, molecular docking studies and ADMET predictions of compounds with antiMES activity

doi:10.21203/rs.3.rs-3132730/v1

This paper presents QSAR modeling based on Monte Carlo optimization. The molecular descriptors combine local molecular graph invariants with the SMILES notation of molecules that are active against maximal electroshock seizure (MES). The developed QSAR model was validated with various statistical parameters, such as the correlation coefficient, cross-validated correlation coefficient, standard error of estimation, mean absolute error, root-mean-square error, R_m² and MAE-based metrics, the Fisher ratio and the index of ideality of correlation. These statistical methods indicated that the developed QSAR model is robust and has excellent predictive potential. Molecular fragments that lead to an increase or a decrease of the examined activity were identified and used for the computer-aided design of new compounds. Molecular docking studies were used for the final assessment of the designed inhibitors, and their results correlated very well with the QSAR modeling results. Physicochemical descriptors were computed in order to predict ADME parameters, pharmacokinetic properties, drug-like nature and medicinal chemistry friendliness, with the aim of supporting drug discovery. Based on the results, all the designed molecules show high drug-likeness.

QSAR

anti-MES activity

Drug design

Molecular modeling

Epilepsy Therapeutics

Epilepsy is not a disease in the strict sense, but a syndrome characterized by excessive discharges of large numbers of neurons, which alter the normal electrochemical balance of the brain [1–4]. Seizures frequently occur during epileptic episodes because a single neuron or a group of neurons in the brain develops irritability or hyper-excitability. Such seizures may have various causes, such as hypoxia, hypoglycemia, ischemia, or electrolyte abnormalities. In such instances, nerve cells escape proper attenuation and suppression while discharging action potentials, which leads to irregular discharges. The action potential is the electrophysiological voltage change that appears in the axon of a neuron because of a transient variation in the axon's permeability to sodium and potassium. The location (or focus) of the aberrant discharges in the brain determines how their effect is manifested, either as motor symptoms (such as tonic-clonic contractions) or as sensory manifestations (such as paresthesia and hallucinations). If the discharges spread from these foci to various regions of the brain, uninhibited and chaotic electrical activity occurs, leading to sensory and/or motor manifestations in the patient, which are clinically designated as a seizure [4–7]. A seizure can be controlled by suppressing the action potential through manipulation of the ion permeability to potassium and sodium, which makes the axon refractory to the action potential. Alternatively, impulse transmission at the synapse can be blocked by preventing the neurotransmitter from binding to its receptor site, or by preventing its synthesis and/or release [8, 9].
Much research has been aimed at developing therapies that curb and prevent seizures, which has led to the development of various antiepileptic drugs (AEDs) [10–13]. Nonetheless, about 25% of patients still experience epileptic episodes despite optimal use of the AEDs on the market, and AED treatment usually has to be administered for a couple of years. During this time, side effects may occur [14–15]. This is why more effective and safer new antiepileptic drugs need to be developed.

There are two major strategies for designing and developing new AEDs: (a) the search for new compounds that modify certain stages of the cellular mechanisms which cause epilepsy (mechanism-based design) and (b) the modification of the structure of preexisting compounds (structure-based design) [16]. In silico studies have contributed to the implementation of both approaches in the design and development of AEDs [7, 8]. Such studies are alternatives to the synthesis of compounds and laboratory screening in the real world, and they rely on data analyses, designs and hypotheses stored in a computer. Computational models and screens make it possible to explore initial concepts in order to justify the commitment to costly synthesis and bioassays [17]. At this stage, an exact classification of AEDs by their mechanism of action is impossible, because some of them do not act on specific binding sites and most interact with various receptors [18, 19]. In spite of this, the cellular mechanisms that may be involved in a drug's action in an epileptic patient include, inter alia, the voltage-dependent blockade of Na⁺ channels, inhibition of cellular γ-aminobutyric acid (GABA) uptake, modulation of GABA synthesis or degradation, GABA(A) receptor modulation, adenosine metabolism modulation, as well as the modulation of a range of excitatory amino acid receptors [20–22]. Furthermore, the anticonvulsant activity of drug molecules has been studied and evaluated in various experimental animal models, the most popular of which are the subcutaneous pentylenetetrazole (PTZ) test and the maximal electroshock seizure (MES) test. In the MES test, seizures are electrically induced in animals, leading to hind limb tonic extension, and drug molecules are administered in order to eliminate this effect.
All drug types that have been efficacious in suppressing MES have been reported to act as inhibitors of the neuronal voltage-gated sodium channel (VGSC). What is more, the MES test has been reported to be an adequate model of grand mal seizures, and it has also been used to identify compounds that prevent seizure spread [22].

The main aim of this paper is to present the rational design of new 1H-pyrazole-5-carboxylic acid derivatives with activity against MES-induced seizures, using molecular docking strategies and the quantitative structure-activity relationship (QSAR) approach. QSAR is a computational approach which links quantitative measures of a compound's chemical structure with its activity through a number of computer-based processes, in order to obtain a model, equation or relationship that can predict the activity of known compounds with unknown activities [23, 24]. The development of many anticonvulsant molecules has been based on this approach [25–29]. Nevertheless, there are not many reports which cover the rational design of new 1H-pyrazole-5-carboxylic acid derivatives using the QSAR strategy. The molecular docking technique, in turn, is employed to explore the binding mode of two interacting molecules on the basis of their topographic features or energy considerations, by fitting them into a conformation that leads to favorable interactions. Docking often serves as the basis for discovering candidates for new drugs or lead compounds, for revealing the mechanisms and most significant elements of protein-ligand interaction and, as a consequence, for rational drug design [30]. For this reason, these strategies permit the rational design of new 1H-pyrazole-5-carboxylic acid derivatives, with quantitative estimates of their potencies determined from the contributions of the molecular descriptors to anti-MES activity. They also elucidate the interaction between the designed molecules and the anticonvulsant molecular target.

2.1. Molecule Data Set

A data set of 62 molecules with an established anti-MES effect was taken from the literature [31]. Their anticonvulsant activity against MES-induced seizures is expressed as ED₅₀ (mg/kg), the dose determined to be effective in 50% of the tested animals. The reported anti-MES activities of the selected compounds were recalculated to molar units in order to make the molecules easy to compare. Afterwards, they were converted to a logarithmic scale (i.e. -log ED₅₀, designated pED₅₀) in order to increase the linearity of the activity values. These values are listed in Table S1 (Supplementary Material), where the SMILES notation represents the corresponding molecular structures. Three random splits were used to divide the original data into a training set (47 compounds, 75%) and a test set (15 compounds, 25%). The normality of the activity distribution was verified for all the splits with the published method [32].
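The unit conversion and the random splitting described above can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the molar unit is taken to be mol/kg, the function names `ped50` and `random_split`, the seeds and the example compound are hypothetical, and the normality check of the splits is not reproduced.

```python
import math
import random


def ped50(ed50_mg_per_kg, mol_weight):
    """Return pED50 = -log10(ED50 in mol/kg); mol_weight is in g/mol.

    Assumption: the molar unit used for the conversion is mol/kg.
    """
    ed50_mol_per_kg = ed50_mg_per_kg / 1000.0 / mol_weight
    return -math.log10(ed50_mol_per_kg)


def random_split(ids, n_test, seed):
    """Shuffle compound ids reproducibly and cut off n_test ids as the test set."""
    rng = random.Random(seed)
    shuffled = list(ids)
    rng.shuffle(shuffled)
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical compound: ED50 = 25 mg/kg and molecular weight 250 g/mol,
# which corresponds to 1e-4 mol/kg.
example_ped50 = ped50(25.0, 250.0)

# Three random splits of 62 compounds into 47 training and 15 test compounds
# (the seeds 1, 2, 3 are arbitrary).
splits = [random_split(range(1, 63), n_test=15, seed=s) for s in (1, 2, 3)]
```

Fixing the seed makes each split reproducible, which is convenient when the same split has to be reused across several optimization runs.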

2.2. The Monte Carlo Optimization Method

The Monte Carlo optimization method with a hybrid molecular descriptor approach was the main algorithm, and it was used to develop the conformation-independent QSAR model. The method uses descriptors based on both the molecular graph and the SMILES notation. The molecular graph-based descriptors were local graph invariants, which rely on basic graph-theoretical concepts such as walks and paths; their detailed mathematical definitions can be found in the literature [33]. The local graph invariants that proved to be the optimal topological descriptors in this study were: the Morgan extended connectivity index of zero order (EC0), the number of carbon atom neighbors (Number Of Carbon) and the number of non-carbon atom neighbors (Number of Non Carbon), path numbers of length 2 and 3 (p2, p3), as well as valence shells of 2 and 3 (s2, s3). In contrast to the molecular graph-based descriptors, the SMILES notation-based molecular descriptors have a mechanistic interpretation, since they can be correlated with molecular fragments. The correlation weights of a molecule's SMILES descriptors add up to the molecule's summarized correlation weight (DCW), which is mathematically defined as the sum of the correlation weights (CW) of all the defined SMILES descriptors and is calculated using Eq. 1.

DCW(T,N_epoch) = zCW(ATOMPAIR) + xCW(NOSP) + yCW(BOND) + tCW(HALO) + rCW(HARD) + αΣCW(S_k) + βΣCW(SS_k) + γΣCW(SSS_k) (1)

In Eq. 1, z, x, y, t, r, α, β and γ are equal to 1 (yes) or 0 (no), and their values establish whether the corresponding SMILES descriptor is used in the development of the model. The local descriptors are built from SMILES atoms: S_k represents one SMILES atom, i.e. one symbol of the SMILES notation (or two symbols that cannot be separated), while SS_k and SSS_k represent linear combinations of two and three SMILES atoms, respectively. The second type of optimal SMILES notation descriptor is the global descriptor, which concerns the global features of the studied molecule. This study used the following global SMILES notation-based descriptors: ATOMPAIR, BOND, HALO, NOSP and HARD, all defined according to the published methodology [34, 35]. The QSAR model described here is based on the combination of the SMILES notation descriptors (local and global) and the local graph invariant descriptors. In this approach, the DCW of each molecule is calculated in accordance with Eq. 2.

DCW(T,N_epoch) = ΣCW(S_k) + ΣCW(SS_k) + ΣCW(SSS_k) + ΣCW(EC0_k) + ΣCW(PT2_k) + ΣCW(PT3_k) + ΣCW(VS2_k) + ΣCW(VS3_k) + ΣCW(NNC_k) (2)

In addition to the symbols S_k, SS_k and SSS_k defined above, Eq. 2 uses the following symbols: PT2_k and PT3_k for paths of length 2 and 3, EC0_k for the Morgan connectivity index of zero order (computed on the hydrogen-suppressed graph), VS2_k and VS3_k for valence shells 2 and 3, and NNC_k for nearest neighbors [33]. All the above molecular descriptors were calculated with the CORAL software (CORrelation and Logic) (http://www.insilico.eu/coral). The Monte Carlo method assigns a numerical value, the correlation weight (CW), to each optimal descriptor. In every separate Monte Carlo run, random numbers are generated and assigned as CW values to the descriptors derived from the SMILES notation, for a specific endpoint. The Monte Carlo optimization then adjusts the correlation weights numerically so as to maximize the correlation coefficient between the endpoint and the optimal descriptor. Two parameters have to be considered when this method is used to generate a QSAR model: the threshold (T) and the number of epochs (N_epoch). The threshold is the coefficient used to classify molecular features, that is, the SMILES-based molecular fragments and the SMILES-based indices. The features are calculated from the SMILES notation and then classified into two categories: a) active features, whose correlation weights are included in the modeling process; and b) rare features, whose correlation weights are excluded from the modeling process.
The classification works as follows: if a molecular feature X, defined by the SMILES notation of the molecules in the training set, appears fewer than T times, X is excluded from model building. Its correlation weight, CW(X), is set to zero and the feature is defined as rare. All other molecular features are active and may be used for model building. N_epoch is the number of epochs of the Monte Carlo optimization. With an unlimited number of epochs, the Monte Carlo optimization yields the maximum correlation coefficient for the training set. However, the correlation coefficient between the optimal descriptor and the endpoint for the external test set reaches its maximum at a definite number of epochs. This number of epochs is preferred, since the model obtained with it exhibits the best predictive potential. Similarly, an increase in T is followed by a decrease in the training set's correlation coefficient, but there is a threshold value at which the test set's correlation coefficient is at its maximum, and this is the value that is useful in practice. Determining the preferable values of the threshold and the number of epochs (T and N_epoch) is therefore crucial for generating a proper QSAR model with the optimal SMILES-based descriptor [34, 35]. Monte Carlo simulations are iterative algorithms whose purpose is to reveal the distribution of an unknown probabilistic entity. The Monte Carlo optimization process is defined by the number of epochs for the training set and a specific target function.
The first step is the initialization of CW(SA) for each SMILES attribute SA: the starting values of all the CWs are set to 1 ± 0.01×Rnd (Rnd is a random number in the range of 0 to 1). The regular order of the attributes is then replaced by a random sequence. The next phase entails evaluating the starting value of the target function and subsequently modifying the correlation weights. These steps of the Monte Carlo optimization are repeated for all non-rare attributes in every epoch [34, 35]. The QSAR model is then calculated from the training set by linear regression (Eq. 3), using the correlation weights that give the preferable statistics for the test set. The best combination of T and N_epoch was searched for within the ranges 1–5 for T and 0–50 for N_epoch.

Ac = C₀ + C₁×DCW(T,N_epoch) (3)
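The optimization procedure of this section (threshold-based removal of rare features, random initialization of the correlation weights, epoch-wise modification of the weights and the final linear regression of Eq. 3) can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the CORAL implementation: the function names, the fixed step size, the rule of counting a feature once per molecule, the use of the training-set correlation coefficient as the target function and the toy training set are all hypothetical.

```python
import random
from collections import Counter
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient (0.0 if either variance is zero)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy) if sxx > 0 and syy > 0 else 0.0


def dcw(features, cw):
    """Sum of correlation weights; rare features are absent from cw and count as 0."""
    return sum(cw.get(f, 0.0) for f in features)


def optimize_cw(train, threshold, n_epoch, step=0.1, seed=0):
    """Coordinate-wise random search for weights that maximize r on the training set."""
    rng = random.Random(seed)
    # Assumption: a feature is counted once per molecule that contains it.
    counts = Counter(f for feats, _ in train for f in set(feats))
    active = sorted(f for f, n in counts.items() if n >= threshold)
    cw = {f: 1.0 + 0.01 * rng.uniform(-1.0, 1.0) for f in active}
    activities = [a for _, a in train]

    def target():
        return pearson_r([dcw(feats, cw) for feats, _ in train], activities)

    best = target()
    for _ in range(n_epoch):
        rng.shuffle(active)              # random sequence of attributes
        for f in active:
            old = cw[f]
            for delta in (step, -step):  # try a step up, then a step down
                cw[f] = old + delta
                trial = target()
                if trial > best:
                    best = trial
                    break
                cw[f] = old              # no improvement: restore the weight
    return cw, best


def fit_line(xs, ys):
    """Least-squares C0 and C1 of Eq. 3: Ac = C0 + C1 * DCW."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    c1 = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - c1 * mx, c1


# Hypothetical toy training set: (list of SMILES-like features, activity).
toy_train = [
    (["C", "N", "c"], 4.40), (["C", "c", "c"], 4.20), (["N", "N", "c"], 4.60),
    (["O", "c", "c"], 4.10), (["C", "O", "c"], 4.15), (["F", "c", "c"], 4.20),
]
weights, r_train = optimize_cw(toy_train, threshold=2, n_epoch=11)
c0, c1 = fit_line([dcw(f, weights) for f, _ in toy_train], [a for _, a in toy_train])
```

In this sketch the feature "F" occurs in only one molecule, so with T = 2 it is treated as rare and receives no weight. In practice, T and N_epoch would be chosen by monitoring the test-set statistics, as described above; the sketch only tracks the training set.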

2.3. Validation of the Developed QSAR Models

The adequacy of the QSAR models developed by Monte Carlo optimization on the basis of 2D descriptors was verified with a range of validation metrics, namely: the squared correlation coefficient (r²), the leave-one-out and leave-many-out cross-validated correlation coefficients (q²_loo, q²), the root-mean-square error (RMSE), y-scrambling, the F-value and the mean absolute error (MAE) [34–39]. The developed QSAR models were further validated with the following statistical metrics: the concordance correlation coefficient (CCC), the index of ideality of correlation (IIC) and the R_m² and MAE-based metrics [40]. The applicability domain (AD) is one of the crucial features of any QSAR model: it must be defined before the model is used for prediction, and it is a necessary part of a relevant, reliable, robust and valid QSAR model [41, 42]. The AD of the conformation-independent QSAR models presented in this paper was established with a method from the literature, based on the “statistical defects”, d(A), of the conformation-independent molecular descriptors used in the development of the models [43]. The computations were done with the CORAL software and Eq. 4.

$$d\left(A\right)=\frac{\left|P\left({A}_{train}\right)-P\left({A}_{test}\right)\right|}{N\left({A}_{train}\right)+N\left({A}_{test}\right)} \qquad (4)$$

In Eq. 4, P(A_train) and P(A_test) are the probabilities of a descriptor or conformation-independent attribute (A) in the training set and the test set, respectively. Correspondingly, N(A_train) and N(A_test) are the frequencies with which the descriptor or conformation-independent attribute (A) appears in the training set and the test set, respectively. The statistical SMILES defect (D) is the sum of the defects, d(A), of all the attributes present in a molecule's SMILES notation, as defined in Eq. 5.

$$D=defect\left(SMILES\right)={\sum }_{k=1}^{NA}d\left({A}_{k}\right) \qquad (5)$$

In this AD, a molecule is categorized as an outlier if its D > 2 × D_av, where D_av is the average of D calculated over the set (training set or test set) to which the molecule belongs.
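The applicability domain check can be sketched in Python as follows. The sketch rests on several assumptions that are not stated in the text: the probability P(A) is taken as the fraction of molecules in a set that contain attribute A, the frequency N(A) as the number of such molecules, and the denominator of d(A) as the sum of the two frequencies, which is how the statistical defect is usually defined for CORAL models. The function names and the toy attribute lists are hypothetical.

```python
from collections import Counter


def attribute_defects(train, test):
    """d(A) for every attribute seen in the training or the test set.

    Assumptions: N(A) is the number of molecules containing A, P(A) = N(A) / set size,
    and the denominator of d(A) is N(A_train) + N(A_test).
    """
    n_train = Counter(a for mol in train for a in set(mol))
    n_test = Counter(a for mol in test for a in set(mol))
    defects = {}
    for a in set(n_train) | set(n_test):
        p_train = n_train[a] / len(train)
        p_test = n_test[a] / len(test)
        defects[a] = abs(p_train - p_test) / (n_train[a] + n_test[a])
    return defects


def smiles_defect(mol, defects):
    """D: sum of d(A) over the attributes listed for one molecule."""
    return sum(defects[a] for a in mol)


def outliers(mols, defects):
    """Indices of molecules with D > 2 * D_av within their own set."""
    ds = [smiles_defect(mol, defects) for mol in mols]
    d_av = sum(ds) / len(ds)
    return [i for i, d in enumerate(ds) if d > 2.0 * d_av]


# Hypothetical attribute lists (one list per molecule).
toy_train = [["C", "c"], ["C", "c"], ["N", "c"], ["C", "c"]]
toy_test = [["C", "c"], ["N", "c"]]

toy_defects = attribute_defects(toy_train, toy_test)
train_outliers = outliers(toy_train, toy_defects)
```

Attributes that are equally probable in both sets contribute nothing to D, so a molecule is flagged only when it is built from attributes that are distributed very differently between the training and the test set.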

2.4. Molecular Docking

Molegro Virtual Docker (MVD) software was used for the docking studies, which involved the geometrically optimized ligands employed in the development of the QSAR model. The target of the docking studies was the human voltage-gated sodium channel, brain isoform (Nav1.2) (PDB: 2KAV). MVD treats the receptor structure as rigid and the ligand structures as flexible. It evaluates both hydrophilic and hydrophobic interactions (the latter mostly as Van der Waals and steric interactions), including the identification of hydrogen bonds between the studied ligands and the amino acids of the active site. Such interactions can be quantified with “scoring” functions, calculated numerical values that are related to the relevant binding energies [44]. As a rule of thumb for most enzymes, the stronger the interaction between the receptor and the ligand, the stronger the inhibition. Thus, the numerical values obtained for the “scoring” functions can be used to evaluate the prospective inhibitory effect of the studied ligands [34]. In order to estimate the inhibitory potential, the following “scoring” functions were calculated and used: Hbond, VdW, Steric, Pose energy, NoHbond, MolDock and Rerank Score. The published methodology was used to validate the full molecular docking protocol [45, 46]. Discovery Studio Client v20.1.0.19 was used to display two-dimensional representations of the interactions between the studied molecules and the amino acids of the target's active site.

The predictive capability of the developed QSAR models was evaluated with the following statistical parameters: r² (correlation coefficient), q² (cross-validated correlation coefficient), CCC (concordance correlation coefficient), IIC (index of ideality of correlation), s (standard error of estimation), MAE (mean absolute error) and F (Fisher ratio). Their numerical values, presented in Table 1, were used to assess the quality of the conformation-independent QSAR models obtained with the Monte Carlo optimization method. The abbreviation Av denotes the average value of a statistical parameter over three independent Monte Carlo optimization runs. According to the calculated values, the Monte Carlo optimization method yielded QSAR models with good reproducibility and high predictive potential. The metrics also indicate that the best QSAR model for antiMES activity was attained for the third split, where the T value was 4 and N_epoch was 11. According to the applied AD methodology, all the molecules were within the calculated AD, and no outliers were detected. Figure 1 shows, for each of the three splits, the best developed QSAR model (the highest obtained r² value) from the best Monte Carlo optimization run. Figure 1 also shows the difference between the experimental and the calculated pED₅₀ values for the best Monte Carlo optimization run, both for the molecules in the training set and for those in the test set.

Table 1

The statistical quality of the developed QSAR models for antiMES activity
		Training set							Test set
		r²	q²	CCC	IIC	s	MAE	F	r²	q²	CCC	IIC	s	MAE	F
Split 1	1 run	0.9439	0.9712	0.9310	0.9377	0.156	0.125	758	0.9196	0.9563	0.9589	0.8945	0.198	0.143	137
	2 run	0.9341	0.9659	0.9262	0.9271	0.169	0.137	638	0.9199	0.9535	0.9591	0.8916	0.200	0.145	138
	3 run	0.9378	0.9679	0.9280	0.9308	0.164	0.129	679	0.9190	0.9411	0.9586	0.8984	0.215	0.163	136
	Av	0.9386	0.9683	0.9284	0.9319	0.163	0.130	692	0.9195	0.9503	0.9589	0.8948	0.204	0.150	137
Split 2	1 run	0.9183	0.9574	0.9583	0.9082	0.189	0.157	494	0.9337	0.9656	0.9654	0.9148	0.185	0.123	183
	2 run	0.9357	0.9668	0.8867	0.9282	0.168	0.141	640	0.9101	0.9537	0.9534	0.8744	0.214	0.150	132
	3 run	0.9303	0.9639	0.9102	0.9227	0.175	0.145	587	0.9114	0.9544	0.9545	0.8724	0.212	0.144	134
	Av	0.9281	0.9627	0.9184	0.9197	0.177	0.148	574	0.9184	0.9579	0.9578	0.8872	0.204	0.139	150
Split 3	1 run	0.9663	0.9829	0.9420	0.9636	0.125	0.102	1292	0.9345	0.9304	0.9666	0.9101	0.276	0.214	171
	2 run	0.9767	0.9882	0.9471	0.9750	0.104	0.083	1885	0.9280	0.9254	0.9633	0.9034	0.283	0.249	155
	3 run	0.9747	0.9872	0.7974	0.9726	0.109	0.085	1736	0.9299	0.9309	0.9643	0.9068	0.270	0.234	159
	Av	0.9726	0.9861	0.8955	0.9704	0.113	0.090	1638	0.9308	0.9289	0.9647	0.9068	0.276	0.232	162
r²: correlation coefficient; q²: cross-validated correlation coefficient; CCC: concordance correlation coefficient; IIC: index of ideality of correlation; s: standard error of estimation; MAE: mean absolute error; F: Fisher ratio; Av: average value of the statistical parameters obtained from three independent Monte Carlo optimization runs.
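For readers who want to reproduce statistics of the kind reported in Table 1, the following Python sketch computes r², CCC, MAE and RMSE from observed and calculated activities. It is a minimal illustration: the function name and the example values are hypothetical, r² is taken as the squared Pearson correlation coefficient, CCC follows Lin's concordance correlation coefficient, and q², IIC, s and F are not covered.

```python
from math import sqrt


def validation_metrics(observed, predicted):
    """Return r2 (squared Pearson r), CCC (Lin's concordance), MAE and RMSE."""
    n = len(observed)
    mo, mp = sum(observed) / n, sum(predicted) / n
    soo = sum((o - mo) ** 2 for o in observed)
    spp = sum((p - mp) ** 2 for p in predicted)
    sop = sum((o - mo) * (p - mp) for o, p in zip(observed, predicted))
    residuals = [o - p for o, p in zip(observed, predicted)]
    return {
        "r2": sop ** 2 / (soo * spp),
        # CCC penalizes both scatter and systematic shift between the two series.
        "ccc": 2.0 * sop / (soo + spp + n * (mo - mp) ** 2),
        "mae": sum(abs(e) for e in residuals) / n,
        "rmse": sqrt(sum(e * e for e in residuals) / n),
    }


# Hypothetical observed and calculated pED50 values.
observed = [4.1, 4.3, 4.5, 4.7]
predicted = [4.2, 4.3, 4.4, 4.8]
metrics = validation_metrics(observed, predicted)
```

Because CCC accounts for deviations from the line of identity as well as for scatter, it can never exceed the absolute value of the Pearson correlation coefficient.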

[Figure 1]

All the obtained QSAR models exhibited high reproducibility according to the concordance correlation coefficient (CCC). The MAE-based metrics provided further validation, classifying all the QSAR models as “GOOD”. The robustness of the developed QSAR models was assessed through Y-randomization (with the Y values scrambled in 1000 trials over ten separate runs); the results presented in Table 2 show that the developed QSAR models are not the result of chance correlations. The final assessment of the adequacy of the QSAR models was performed with the index of ideality of correlation (IIC). The obtained numerical values show that all the developed QSAR models possess high predictive potential.

Table 2

Y-randomization of the best QSAR model (best optimization run) for three independent splits
	Split 1		Split 2		Split 3
	Training	Test	Training	Test	Training	Test
0	0.9341	0.9199	0.9183	0.9337	0.9663	0.9345
1	0.0002	0.2483	0.0545	0.0644	0.0325	0.2481
2	0.0935	0	0.0017	0.1105	0.0214	0.0009
3	0.0002	0.058	0.0031	0.0479	0.0128	0.2371
4	0.0140	0.0447	0.0547	0.0737	0.0995	0.0043
5	0.0054	0.0863	0.0047	0.0203	0.0116	0.0108
6	0.0102	0.051	0.0037	0.0181	0.0028	0.0415
7	0.1258	0	0.0935	0.1775	0.0068	0.0066
8	0.0130	0.0039	0	0.2512	0.0167	0.0842
9	0.0115	0.0148	0.0136	0.2168	0.0022	0.0057
10	0.0023	0.2767	0.0074	0.1475	0.0183	0.0286
R_r²	0.0276	0.0784	0.0237	0.1128	0.0225	0.0668
^CR_p²	0.9202	0.8799	0.9064	0.8755	0.9550	0.9005
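Row 0 gives the r² values of the original (unscrambled) model, rows 1–10 give the r² values of the Y-randomization runs, and R_r² is the mean of rows 1–10. ^CR_p² = R × (R² − R_r²)^(1/2) should be > 0.5 [32].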
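The last two rows of Table 2 can be recomputed from the rows above them. The short Python sketch below does this for the training set of split 1, using only values from Table 2; the variable and function names are hypothetical.

```python
from math import sqrt

# Table 2, split 1, training set: r2 of the original model (row 0)
# and of the ten Y-randomization runs (rows 1-10).
r2_model = 0.9341
r2_scrambled = [0.0002, 0.0935, 0.0002, 0.0140, 0.0054,
                0.0102, 0.1258, 0.0130, 0.0115, 0.0023]


def c_rp2(r2, mean_r2_random):
    """CR_p2 = R * sqrt(R2 - R_r2), with R = sqrt(R2)."""
    return sqrt(r2) * sqrt(r2 - mean_r2_random)


mean_rr2 = sum(r2_scrambled) / len(r2_scrambled)   # R_r2 row of Table 2
crp2 = c_rp2(r2_model, mean_rr2)                   # CR_p2 row of Table 2
is_acceptable = crp2 > 0.5                         # criterion from [32]
```

With these inputs the mean of the scrambled r² values and the resulting ^CR_p² agree with the tabulated 0.0276 and 0.9202 after rounding to four decimals.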

The best QSAR models for antiMES activity, selected on the basis of the test set r² obtained for each split, are given in Eqs. 6–8.

Split 1: pED₅₀ = 1.2600(± 0.0231) + 0.0360(± 0.0003)×DCW(2,14) (6)

Split 2: pED₅₀ = 1.6112(± 0.0246) + 0.0239(± 0.0002)×DCW(4,6) (7)

Split 3: pED₅₀ = 1.0330(± 0.0102) + 0.0342(± 0.0001)×DCW(4,11) (8)

Eqs. 6–8 show that the preferable T and N_epoch values for antiMES activity are 2 and 14 for split 1, 4 and 6 for split 2, and 4 and 11 for split 3.
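Once the DCW of a molecule is known, Eqs. 6–8 can be applied directly. The following Python sketch encodes the three regression equations (the reported uncertainties of the coefficients are ignored); the function names and the DCW value used in the example are hypothetical, since DCW values have to come from the optimized correlation weights of the respective split.

```python
# (C0, C1) of Eqs. 6-8, keyed by split number.
MODELS = {
    1: (1.2600, 0.0360),   # Eq. 6, DCW(2,14)
    2: (1.6112, 0.0239),   # Eq. 7, DCW(4,6)
    3: (1.0330, 0.0342),   # Eq. 8, DCW(4,11)
}


def predict_ped50(split, dcw_value):
    """pED50 = C0 + C1 * DCW for the model of the given split."""
    c0, c1 = MODELS[split]
    return c0 + c1 * dcw_value


def dcw_for_ped50(split, ped50):
    """Inverse relation: the DCW that corresponds to a given pED50."""
    c0, c1 = MODELS[split]
    return (ped50 - c0) / c1


# Hypothetical DCW value of 100 evaluated with the split 3 model (Eq. 8).
example_prediction = predict_ped50(3, 100.0)
```

Because each split has its own set of correlation weights, a DCW value is only meaningful together with the model of the split for which it was calculated.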

One of the chief goals of this research was to determine molecular fragments, defined as SMILES notation optimal descriptors, with a positive or negative influence on the examined activity [34, 35, 47–50]. The full list of molecular descriptors, based on the molecular graph and the SMILES notation, can be found in Table S2 (Supplementary Material). Table 3 displays an example of the calculation of a molecule's summarized correlation weight (DCW) and the studied activity (pED₅₀). Here, the molecular graph-based descriptors were excluded for easier interpretation. Figure 2 summarizes the computer-aided design (CAD) of compounds with higher or lower activity, which is one of the key goals of this research. Based on the results of the conformation-independent QSAR modeling, the CAD process led to the design of nine novel potential inhibitors (structures presented in Fig. 2).

[Figure 2]

Molecule A was chosen as the template molecule, because it is one of the least chemically exploited molecules. Figure 2 highlights the part of molecule A that is favourable for chemical modification with the SMILES notation fragments that have a positive effect on the studied activity, as obtained from the conformation-independent QSAR studies. Table 4 lists all the designed molecules together with their calculated pED₅₀ values.

Table 4

The list of all the designed molecules with their SMILES notation and calculated activities
Molecule	SMILES notation	pED₅₀
A	Nc1ncnc2c1ncn2Cc1ccccc1	4.1415
A1	Cc1ccc(cc1)Cn1cnc2c1ncnc2N	4.1856
A2	CCc1ccc(cc1)Cn1cnc2c1ncnc2N	4.2763
A3	CC(c1ccc(cc1)Cn1cnc2c1ncnc2N)C	4.5860
A4	Nc1ccc(cc1)Cn1cnc2c1ncnc2N	4.4017
A5	CNc1ccc(cc1)Cn1cnc2c1ncnc2N	4.4697
A6	CN(c1ccc(cc1)Cn1cnc2c1ncnc2N)C	4.6939
A7	Oc1ccc(cc1)Cn1cnc2c1ncnc2N	4.1865
A8	COc1ccc(cc1)Cn1cnc2c1ncnc2N	4.2237
A9	Fc1ccc(cc1)Cn1cnc2c1ncnc2N	4.1983
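Since pED₅₀ is a logarithmic quantity, a difference of Δ between two pED₅₀ values corresponds to a 10^Δ-fold difference in the molar ED₅₀. The following Python sketch applies this to the calculated values of Table 4; the names `PED50`, `gain_over_template` and `ranked` are hypothetical, and the values are calculated (not measured) activities.

```python
# Calculated pED50 values from Table 4.
PED50 = {
    "A": 4.1415, "A1": 4.1856, "A2": 4.2763, "A3": 4.5860, "A4": 4.4017,
    "A5": 4.4697, "A6": 4.6939, "A7": 4.1865, "A8": 4.2237, "A9": 4.1983,
}
TEMPLATE = "A"


def gain_over_template(name):
    """Return (delta pED50, fold decrease of the molar ED50) relative to molecule A."""
    delta = PED50[name] - PED50[TEMPLATE]
    return delta, 10.0 ** delta


# Designed molecules ordered from the highest to the lowest calculated pED50.
ranked = sorted((n for n in PED50 if n != TEMPLATE), key=PED50.get, reverse=True)
delta_a6, fold_a6 = gain_over_template("A6")
```

For the top-ranked molecule A6, the difference of 0.5524 pED₅₀ units relative to molecule A corresponds to a calculated molar ED₅₀ that is roughly 3.6 times lower.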

On the basis of the QSAR modelling results, the following SMILES notation descriptors are associated with molecular fragments that have a positive impact on pED₅₀, that is, fragments whose addition increases the studied activity. “C............” (carbon atom or methyl group) is a fragment with a positive effect, whose addition led to an increase in the calculated pED₅₀ value of molecule A1 compared to template molecule A. “C...C.......” (two connected carbon atoms or ethyl group) is a fragment with a positive effect, whose addition led to an increase in the calculated pED₅₀ value of molecule A2 compared to the template molecule. “C…(…c…”, “C…(…”, “c...1...(...” and “c...C.......” are associated with the addition of at least one methyl group to benzene, leading to branching of the molecule. “N...........” (nitrogen atom), “N...C.......” (amino group), “N…c…” (nitrogen atom bonded to benzene) and “C…N…(…” (branching on the amino group) are fragments found in molecules A4, A5 and A6, which have a primary, secondary and tertiary amino group, respectively, all with higher calculated pED₅₀ values than template molecule A. “O...........” (oxygen atom), “O...c...1...”, “c...O.......” and “c...O...C...” are fragments indicating that a hydroxyl/methoxy group is bound to benzene; molecules A7 and A8 have an added hydroxyl/methoxy group in the para position, and their calculated pED₅₀ values are higher than that of molecule A. “F..........”, “HALO10000000” and “++++F–N===” are SMILES descriptors indicating that the molecule has a fluorine atom, as well as a combination of a fluorine and a nitrogen atom, and “c...F......” indicates that the fluorine atom is bound to the benzene ring; molecule A9 has an added fluorine atom in the para position, and its calculated pED₅₀ value is higher than that of molecule A.

Molecular docking studies were conducted on all the designed molecules and template molecule A with the studied target in order to evaluate the predictability of the developed QSAR models and to further validate them. Table 5 lists the numerical values of all the calculated “scoring” functions. Different scoring functions can be associated with different ligand-amino acid interactions, so all of them need to be taken into consideration when the inhibitory potency is assessed. According to the MolDock and ReRank “score” functions, the molecules that potentially possess the highest inhibitory activity are molecules A6 and A7, and the QSAR modelling results support this. Template molecule A has the weakest (least negative) MolDock and ReRank “score” values, which correlates well with the QSAR modelling results. The literature [34, 51] gives detailed definitions of the other “scoring” functions, as well as the type of impact they may have on inhibitory activity, and their values could be used to perform similar analyses. The figures in the Supplementary Information show all the interactions between the selected molecules and the amino acids of the target's active site, as 2D representations of the hydrogen bonds and the hydrophilic and hydrophobic interactions within the binding pocket. Figure 3 shows the best-calculated poses of all the designed molecules within the active site of the human voltage-gated sodium channel, brain isoform.

Table 5

Score values (kcal/mol) for all computer-aided designed compounds
Molecule	HBond	NoHBond	Steric	VdW	Energy	MolDock Score	Rerank Score
A	-6.73837	-6.81011	-94.4714	-23.5086	-93.8656	-95.6522	-69.0745
A1	-4.67594	-5.87713	-98.0504	-15.3582	-94.2286	-97.2073	-69.9425
A2	-5.57927	-6.41380	-102.057	-16.1269	-100.603	-102.689	-74.6128
A3	-5.25352	-6.33121	-106.478	-12.9107	-105.119	-107.437	-75.1959
A4	-10.0777	-10.3034	-96.2437	-22.5016	-100.270	-101.815	-78.7717
A5	-9.4656	-10.4304	-101.065	-24.8663	-104.107	-106.331	-82.3626
A6	-4.89265	-6.69276	-106.761	2.68538	-113.635	-114.811	-75.2484
A7	-7.9973	-8.71823	-96.8022	-23.2328	-98.1076	-100.258	-77.3707
A8	-5.21338	-6.15364	-101.998	-17.6925	-98.5957	-100.900	-74.2967
A9	-4.55001	-6.17604	-99.1365	-16.7948	-95.4206	-98.0054	-71.3171
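
The comparison of the MolDock and Rerank scores in Table 5 can be reproduced with a few lines of code. The following is a minimal sketch, not part of the original workflow: the values are copied from Table 5, the names `SCORES` and `rank_by` are illustrative, and a more negative score is assumed to indicate a more favorable predicted binding.

```python
# Table 5 scores (kcal/mol), copied from the table: (MolDock Score, Rerank Score).
SCORES = {
    "A":  (-95.6522, -69.0745),
    "A1": (-97.2073, -69.9425),
    "A2": (-102.689, -74.6128),
    "A3": (-107.437, -75.1959),
    "A4": (-101.815, -78.7717),
    "A5": (-106.331, -82.3626),
    "A6": (-114.811, -75.2484),
    "A7": (-100.258, -77.3707),
    "A8": (-100.900, -74.2967),
    "A9": (-98.0054, -71.3171),
}


def rank_by(column):
    """Return molecule names ordered from most to least negative score.

    column 0 = MolDock Score, column 1 = Rerank Score.
    Assumption: a more negative value means a more favorable predicted binding.
    """
    return sorted(SCORES, key=lambda name: SCORES[name][column])


moldock_ranking = rank_by(0)
rerank_ranking = rank_by(1)

if __name__ == "__main__":
    print("MolDock:", ", ".join(moldock_ranking))
    print("Rerank: ", ", ".join(rerank_ranking))
```

Ranking each scoring function separately, rather than merging them into one number, keeps with the point made above that the individual functions reflect different interaction types.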

[Figure 3]

Ensuring that new compounds possess the physical and chemical features required of potential therapeutics is one of the first steps in early drug development. To be classified as drug-like, a molecule needs to exhibit good absorption and permeation (including gastrointestinal absorption and brain penetration), good oral bioavailability, and effective binding to its receptors or channels. These features can be predicted from the molecular structure. In this research, the SwissADME web service was used to calculate the physicochemical descriptors and to predict the ADME parameters, pharmacokinetic properties, drug-like nature, and medicinal chemistry friendliness of the designed molecules in order to support drug discovery [51]. The results are presented in Table S3 (Supplementary Material) and indicate that all the designed molecules have high drug-likeness.
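
To illustrate how drug-likeness can be judged from physicochemical descriptors, the sketch below applies the classic thresholds of Lipinski's rule of five, one of the filters reported by SwissADME. It is an illustration only and not the SwissADME implementation: the class and function names are hypothetical, and the descriptor values are invented for the example and are not taken from Table S3.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    mol_weight: float       # molecular weight, g/mol
    log_p: float            # octanol-water partition coefficient
    h_bond_donors: int      # number of hydrogen-bond donors
    h_bond_acceptors: int   # number of hydrogen-bond acceptors


def lipinski_violations(d: Descriptors) -> int:
    """Count violations of Lipinski's rule of five (classic thresholds)."""
    checks = [
        d.mol_weight > 500,
        d.log_p > 5,
        d.h_bond_donors > 5,
        d.h_bond_acceptors > 10,
    ]
    return sum(checks)


def is_drug_like(d: Descriptors, max_violations: int = 1) -> bool:
    """Treat a molecule as drug-like if it has at most `max_violations` violations."""
    return lipinski_violations(d) <= max_violations


# Hypothetical descriptor values for illustration only (not from Table S3).
example = Descriptors(mol_weight=320.4, log_p=2.8, h_bond_donors=1, h_bond_acceptors=4)
bulky = Descriptors(mol_weight=620.0, log_p=6.1, h_bond_donors=2, h_bond_acceptors=8)

if __name__ == "__main__":
    for name, d in [("example", example), ("bulky", bulky)]:
        print(name, lipinski_violations(d), is_drug_like(d))
```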

The chief aim of this research was to develop robust QSAR models for dopamine transport inhibition with good predictability, as assessed with various statistical parameters. The Monte Carlo optimization method was used to build conformation-independent models based on optimal descriptors derived from SMILES notation invariants and the local molecular graph. A QSAR model was generated from a large pool of 0D, 1D, and 2D molecular descriptors using multiple linear regression and a genetic algorithm. The predictive potential and robustness of the developed QSAR models were assessed with a range of statistical techniques, and the numerical values obtained in this validation show the high applicability of the models. The Monte Carlo optimization method identified the SMILES notation fragments that have a positive or negative effect on dopamine transport inhibition. These fragments were then used in the computer-aided design of new compounds with higher pIC₅₀ values. Molecular docking studies served as the final validation of the developed QSAR model and of the potential inhibitory effect of the designed molecules, and the results indicate good inter-correlation. This conclusion is based on a comparison between the pIC₅₀ values calculated with the best developed QSAR model and the calculated energies of the interactions between the selected molecules and the amino acids in the dopamine transporter's active site. The physicochemical descriptors of the designed molecules were calculated in order to predict their ADME parameters, pharmacokinetic properties, medicinal chemistry friendliness, and drug-like nature, with a view to supporting drug discovery. The results show that all the designed molecules exhibit high drug-likeness and high bioavailability.
In brief, the methodology presented in this research can be utilized for the discovery of new therapeutics to treat epilepsy.

Ethical Approval

Not applicable

Competing interests

We have no conflict of interest to disclose.

Authors' contributions

B.Ž. conceived of the presented idea. B.Ž., J.S. and J.Ž. devised the project, the main conceptual ideas and proof outline. B.Ž. and A.M.V. conceived the study and were in charge of overall direction and planning. A.M.V. and D.S. designed the model and the computational framework and analysed the data. J.S., J.Ž., L.Ž. and A.Ž. carried out the implementation. L.Ž., A.Ž., M.S. and M.L. performed the numerical calculations for the suggested experiment. M.S. and M.L. prepared Figures 1-3. J.S., J.Ž., L.Ž. and A.Ž. wrote the manuscript with input from all authors. All authors discussed the results and contributed to the final manuscript.

Funding

This research was partially funded by the Ministry of Education, Science and Technological Development of the Republic of Serbia (Grant No: 451-03-47/2023-01/200113).

Availability of data and materials

The data supporting the findings of this study are available within the article and its supplementary materials.

Acknowledgments

This work is supported by the Ministry of Education and Science, the Republic of Serbia and the Faculty of Medicine, University of Niš, Republic of Serbia (Internal project No. 70). The authors would like to thank the Ministry of Education, Science and Technological Development of Republic of Serbia (Grant No: 451-03-47/2023-01/200113) for financial support.

Marini C, Giardino M (2022) Novel treatments in epilepsy guided by genetic diagnosis. Brit J Clin Pharmaco 88:2539-2551.
McCormick DA, Contreras D (2001) On the cellular and network bases of epileptic seizures. Annu Rev Physiol 63:815-846.
Taylor I, Scheffer IE, Berkovic SF (2003) Occipital epilepsies: identification of specific and newly recognized syndromes. Brain 126:753-769.
Duncan JS, Sander JW, Sisodiya SM, Walker MC (2006) Adult epilepsy. Lancet 367:1087-1100.
Miller JM, Kustra RP, Vuong A, Hammer AE, Messenheimer JA (2008) Depressive symptoms in epilepsy: prevalence, impact, aetiology, biological correlates and effect of treatment with antiepileptic drugs. Drugs 68:1493-1509.
Thijs RD, Surges R, O'Brien TJ, Sander JW (2019) Epilepsy in adults. Lancet 393:689-701.
Fisher RS, van Emde Boas W, Blume W, Elger C, Genton P, Lee P, Engel Jr J (2005) Epileptic seizures and epilepsy: definitions proposed by the International League Against Epilepsy (ILAE) and the International Bureau for Epilepsy (IBE). Epilepsia 46:470-472.
Litt B, Echauz J (2002) Prediction of epileptic seizures. Lancet Neurol 1:22-30.
Marson AG, Kadir ZA, Hutton JL, Chadwick DW (1997) The new antiepileptic drugs: a systematic review of their efficacy and tolerability. Epilepsia 38:859-880.
Rogawski MA, Porter RJ (1990) Antiepileptic drugs: pharmacological mechanisms and clinical efficacy with consideration of promising developmental stage compounds. Pharmacol Rev 42:223-286.
Moshé SL, Perucca E, Ryvlin P, Tomson T (2015) Epilepsy: new advances. Lancet 385:884-898.
Lou S, Cui S (2022) Drug Treatment of Epilepsy: From Serendipitous Discovery to Evolutionary Mechanisms. Curr Med Chem 29:3366-3391.
Łuszczki JJ (2009) Third-generation antiepileptic drugs: mechanisms of action, pharmacokinetics and interactions. Pharmacol Rep 61:197-216.
Perucca P, Gilliam FG (2012) Adverse effects of antiepileptic drugs. Lancet Neurol 11:792-802.
Kwan P, Brodie MJ (2001) Neuropsychological effects of epilepsy and antiepileptic drugs. Lancet 357:216-222.
Bialer M, Johannessen SI, Kupferberg HJ, Levy RH, Perucca E, Tomson T (2004) Progress report on new antiepileptic drugs: a summary of the Seventh Eilat Conference (EILAT VII). Epilepsy Res 61:1-48.
Alachkar A, Ojha SK, Sadeq A, Adem A, Frank A, Stark H, Sadek B (2020) Experimental Models for the Discovery of Novel Anticonvulsant Drugs: Focus on Pentylenetetrazole-Induced Seizures and Associated Memory Deficits. Curr Pharm Des 26:1693-1711.
White HS (1999) Comparative anticonvulsant and mechanistic profile of the established and newer antiepileptic drugs. Epilepsia 40(Suppl 5):S2-10.
Rogawski MA, Porter RJ (1990) Antiepileptic drugs: pharmacological mechanisms and clinical efficacy with consideration of promising developmental stage compounds. Pharmacol Rev 42:223-286.
Staley K (2015) Molecular mechanisms of epilepsy. Nat Neurosci 18:367-372.
Perucca E (1996) The new generation of antiepileptic drugs: advantages and disadvantages. Br J Clin Pharmacol 42:531-543.
Loscher W, Schmidt D (1994) Strategies in antiepileptic drug development: is rational drug design superior to random screening and structural variation? Epilepsy Res 17:95-134.
Ekins S, Mestres J, Testa B (2007) In silico pharmacology for drug discovery: methods for virtual ligand screening and profiling. Br J Pharmacol 152:9-20.
Tabeshpour J, Sahebkar A, Zirak MR, Zeinali M, Hashemzaei M, Rakhshani S (2018) Computer-aided Drug Design and Drug Pharmacokinetic Prediction: A Mini-review. Curr Pharm Design 24:3014-3019.
Bhutoria S, Ghoshal N (2008) A Novel Approach for the Identification of Selective Anticonvulsants Based on Differential Molecular Properties for TBPS Displacement and Anticonvulsant Activity: An Integrated QSAR Modelling. QSAR Comb Sci 27:876-889.
Macchiarulo A, De luca L, Costantino G, Barreca ML, Gitto R, Pellicciari R, Chimirri A (2004) QSAR study of anticonvulsant negative allosteric modulators of the AMPA receptor. J Med Chem 47:1860-1863.
Yousefi J, Sajjadi SM, Bagheri A (2022) Predicting the Anticonvulsant Activities of Phenylacetanilides Using Quantitative-structure-activity-relationship and Artificial Neural Network Methods. Anal Bioanal Chem Res 9:331-339.
Garro Martinez JC, Vega-Hissi EG, Andrada MF, Estrada MR (2015) QSAR and 3D-QSAR studies applied to compounds with anticonvulsant activity. Expert Opin Drug Dis 10:37-51.
Bellera CL, Talevi A (2019) Quantitative structure-activity relationship models for compounds with anticonvulsant activity. Expert Opin Drug Dis 14:653-665.
Ballester PJ, Mitchell JBO (2010) A machine learning approach to predicting protein-ligand binding affinity with applications to molecular docking. Bioinformatics 26:1169-1175.
Oluwaseye A, Uzairu A, Shallangwa GA, Abechi SE (2017) A novel QSAR model for designing, evaluating, and predicting the antiMES activity of new 1H-pyrazole-5-carboxylic acid derivatives. JOTCSA 4:739-774.
Ojha PK, Roy K (2011) Comparative QSARs for antimalarial endochins: importance of descriptor-thinning and noise reduction prior to feature selection. Chemometr Intell Lab 109:146-161.
Toropov AA, Duchowicz P, Castro EA (2003) Structure–Toxicity Relationships for Aliphatic Compounds Based on Correlation Weighting of Local Graph Invariants. Int J Mol Sci 4:s272-283.
Veselinović AM, Veselinović JB, Živković JV, Nikolić GM (2015) Application of SMILES Notation Based Optimal Descriptors in Drug Discovery and Design. Curr Top Med Chem 15:1768-1779.
Zivkovic M, Zlatanovic M, Zlatanovic N, Golubović M, Veselinović AM (2020) The Application of the Combination of Monte Carlo Optimization Method based QSAR Modeling and Molecular Docking in Drug Design and Development. Mini-Rev Med Chem 20:1389-1402.
Golbraikh A, Tropsha A (2002) Beware of q2! J Mol Graph Model 20:269-276.
Roy PP, Leonard JT, Roy K (2008) Exploring the impact of size of training sets for the development of predictive QSAR models. Chemometr Intell Lab 90:31-42.
Ojha PK, Mitra I, Das RN, Roy K (2011) Further exploring rm2 metrics for validation of QSPR models. Chemometr Intell Lab 107:194-205.
Roy K, Das RN, Ambure P, Aher RB (2016) Be aware of error measures. Further studies on validation of predictive QSAR models. Chemometr Intell Lab 152:18-33.
Toropova AP, Toropov AA (2017) The index of ideality of correlation: A criterion of predictability of QSAR models for skin permeability? Sci Total Environ 586:466-472.
Gadaleta D, Mangiatordi GF, Catto M, Carotti A, Nicolotti O (2016) Applicability domain for QSAR models: where theory meets reality. IJQSPR 1:45-63.
Gramatica P (2007) Principles of QSAR models validation: internal and external. QSAR Comb Sci 26:694-701.
Toropov AA, Toropova AP, Lombardo A, Roncaglioni A, Benfenati E, Gini G (2011) CORAL: Building up the model for bioconcentration factor and defining it’s applicability domain. Eur J Med Chem 46:1400-1403.
Thomsen R, Christensen MH (2006) MolDock: a new technique for high-accuracy molecular docking. J Med Chem 49:3315-3321.
Manisha, Chauhan S, Kumar P, Kumar A (2019) Development of prediction model for fructose-1,6-bisphosphatase inhibitors using the Monte Carlo method. SAR QSAR Environ Res 30:145-159.
Halder A (2018) Finding the structural requirements of diverse HIV-1 protease inhibitors using multiple QSAR modelling for lead identification. SAR QSAR Environ Res 29:911-933.
Kumar P, Kumar A, Sindhu J (2019) In silico design of diacylglycerol acyltransferase-1 (DGAT1) inhibitors based on SMILES descriptors using Monte-Carlo method. SAR QSAR Environ Res 30:525-541.
Ahmadi S, Lotfi S, Afshari S, Kumar P, Ghasemi E (2021) CORAL: Monte Carlo based global QSAR modelling of Bruton tyrosine kinase inhibitors using hybrid descriptors. SAR QSAR Environ Res 32:1013-1031.
Ahmadi S, Lotfi S, Kumar P (2022) Quantitative structure-toxicity relationship models for predication of toxicity of ionic liquids toward leukemia rat cell line IPC-81 based on index of ideality of correlation. Toxicol Mech Methods 32:302-312.
Lotfi S, Ahmadi S, Kumar P (2021) The Monte Carlo approach to model and predict the melting point of imidazolium ionic liquids using hybrid optimal descriptors. RSC Adv 11:33849-33857.
Daina A, Michielin O, Zoete V (2017) SwissADME: a free web tool to evaluate pharmacokinetics, drug-likeness and medicinal chemistry friendliness of small molecules. Sci Rep 7:42717.

Table 3 is available in the Supplementary Files section.
