Visualization of chemical space
This study involved high-dimensional datasets containing hundreds of molecular fingerprints and descriptors. The PCA algorithm was applied to reduce the chemical space to two dimensions. The chemical space representations for the ECFP, RDKit fingerprints, molecular descriptors and the combination of ECFP with molecular descriptors produced using the PCA algorithm are shown in Fig. 1.
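As an illustration of the projection step, the following dependency-free Python sketch centres a feature matrix and projects it onto its first two principal components using power iteration with deflation. It is not the implementation used in this study; in practice a library routine would normally be preferred. The function name pca_2d, the iteration count and the toy matrix are assumptions made for the example.

```python
import math

def pca_2d(rows, n_iter=200):
    """Project rows of a feature matrix onto the first two principal components."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centred = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix of the centred features (d x d).
    cov = [[sum(c[i] * c[j] for c in centred) / (n - 1) for j in range(d)]
           for i in range(d)]

    components = []
    for _ in range(2):
        # Deterministic start vector (an arbitrary choice for this sketch).
        v = [float(j + 1) for j in range(d)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(n_iter):
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:  # no variance left in any direction
                break
            v = [x / norm for x in w]
        # The Rayleigh quotient gives the variance captured along v.
        cov_v = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        eigenvalue = sum(v[i] * cov_v[i] for i in range(d))
        components.append(v)
        # Deflation: remove the captured direction before finding the next one.
        cov = [[cov[i][j] - eigenvalue * v[i] * v[j] for j in range(d)]
               for i in range(d)]

    return [[sum(c[j] * comp[j] for j in range(d)) for comp in components]
            for c in centred]

# Toy feature matrix (hypothetical values: four compounds, three features).
toy = [[2.0, 0.0, 0.0], [-2.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, -1.0, 0.0]]
coords = pca_2d(toy)
print(coords)
```

Each compound is reduced to a pair of coordinates that can be plotted directly; the sign of each axis is arbitrary, as is usual for principal components.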
In chemical space visualisation, structural analogues are positioned nearer to each other than to unrelated compounds [26]. This allows dimensionality reduction techniques, such as PCA, to reveal neighbourhoods of similarly structured molecules [26]. Thus, some degree of clustering was expected to be observed between active compounds.
Among the single descriptor types, Fig. 1(a-d), the highest degree of clustering between active molecules was observed in the chemical space visualisation of the molecular descriptors. One explanation is that the chemical fingerprints used in this study were hashed fingerprints. Hashed fingerprints often involve loss of information due to bit collisions; thus, the distances between the fingerprints may not perfectly correlate with the similarity of the compounds [27]. Interestingly, the chemical space visualisation of the combined feature type, Fig. 1e, is almost identical to that of the molecular descriptors shown in Fig. 1d. This indicates that the molecular descriptors have a stronger expressive power than the ECFPs of 1,024-bit length for the chemical space analysis of the DrugAge database.
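The effect of bit collisions can be sketched with a simplified folding scheme. The example assumes that each substructure is represented by an integer identifier and is folded into the fingerprint by taking the identifier modulo the fingerprint length. This is a simplification for illustration, not the hashing actually used to generate the fingerprints in this study, and the identifiers are hypothetical.

```python
def fold_features(feature_ids, n_bits=1024):
    """Fold integer feature identifiers into a fixed-length set of on-bits."""
    on_bits = set()
    collisions = 0
    for fid in set(feature_ids):
        bit = fid % n_bits
        if bit in on_bits:
            collisions += 1  # two distinct features now share one bit
        on_bits.add(bit)
    return on_bits, collisions

# Hypothetical identifiers: 7 and 1031 are different features,
# but both map to bit 7 when the fingerprint has 1,024 bits.
features = [7, 1031, 42]
bits_1024, lost_1024 = fold_features(features, n_bits=1024)
bits_2048, lost_2048 = fold_features(features, n_bits=2048)
print(sorted(bits_1024), lost_1024)
print(sorted(bits_2048), lost_2048)
```

With the shorter fingerprint, two distinct features become indistinguishable, so two molecules that differ only in those features would appear identical in fingerprint space.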
Feature selection
Feature selection was employed to select the most relevant features for predicting the activity of a molecule in the database. This was performed only on the training set, which contained 80% of the compounds in the dataset. Feature selection was achieved by applying variance and mutual information-based pre-selection methods. This reduced the number of features used by each model, making the computations less expensive. The median AUC scores and standard deviations from 10-fold cross-validation obtained by random forest classification for each feature combination can be found in Supplementary Table 1, Additional File 1. For each descriptor type, the feature combination with the highest AUC score in 10-fold cross-validation was selected for classifying the compounds in the test set. In cases where two feature combinations achieved the same AUC score, the combination that had the smallest standard deviation was used.
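The two pre-selection steps can be sketched as follows for discrete features. This is a minimal illustration under stated assumptions: the variance threshold, the number of features kept (top_k), the function names and the toy data are all hypothetical, and the exact settings and implementation used in this study are not reproduced here.

```python
import math
from collections import Counter

def variance(column):
    """Population variance of one feature column."""
    mean = sum(column) / len(column)
    return sum((x - mean) ** 2 for x in column) / len(column)

def mutual_information(feature, labels):
    """Mutual information (in nats) between two discrete sequences."""
    n = len(labels)
    joint = Counter(zip(feature, labels))
    p_feature = Counter(feature)
    p_label = Counter(labels)
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        mi += p_xy * math.log(p_xy * n * n / (p_feature[x] * p_label[y]))
    return mi

def preselect(columns, labels, min_variance=0.0, top_k=2):
    """Drop near-constant features, then keep the top_k by mutual information."""
    kept = {name: col for name, col in columns.items()
            if variance(col) > min_variance}
    ranked = sorted(kept, key=lambda name: mutual_information(kept[name], labels),
                    reverse=True)
    return ranked[:top_k]

# Toy training data (hypothetical): three binary features, four compounds.
train_labels = [1, 1, 0, 0]
train_columns = {
    "bit_a": [1, 1, 0, 0],  # tracks the label exactly
    "bit_b": [1, 0, 1, 0],  # independent of the label
    "bit_c": [1, 1, 1, 1],  # constant, removed by the variance filter
}
print(preselect(train_columns, train_labels, top_k=1))
```

Computing both statistics on the training compounds only, as described above, keeps information from the test set out of the selection step.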
Model Selection
The test set contained 20% of the data not used in training the models. The performances of the random forest classifiers on 10-fold cross-validation and on classifying the compounds in the test set are shown in Table 1.
Table 1
Model | Number of Selected Features | Cross-Validation (AUC \(\pm\) stdev) | Test Set (AUC) |
ECFP_1024 | 55 | 0.794 \(\pm\) 0.048 | 0.793 |
ECFP_2048 | 504 | 0.789 \(\pm\) 0.042 | 0.776 |
RDKit5 | 654 | 0.836 \(\pm\) 0.053 | 0.777 |
MD | 69 | 0.823 \(\pm\) 0.041 | 0.815 |
ECFP_1024_MD | 33 | 0.828 \(\pm\) 0.040 | 0.806 |
As illustrated in Fig. 2, the predictive performance of the random forest models did not drop significantly when classifying the compounds in the test set and remained consistent with the spread of the AUC scores from cross-validation. This indicated that overfitting was minimised.
The receiver operating characteristic (ROC) curve is the plot of the True Positive Rate (TPR) against the False Positive Rate (FPR) at varying classification thresholds. The ROC curves, displayed in Fig. 3, compare the performances of the descriptor types for classifying the samples of the test set. Analysis of the ROC curves indicates that the five random forest models performed better than a random prediction.
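The construction of a ROC curve and its AUC can be illustrated with a short sketch. The labels and scores below are hypothetical and are not taken from the test set of this study; the function names are likewise chosen for the example.

```python
def roc_points(labels, scores):
    """Return (FPR, TPR) pairs obtained by lowering the threshold over the scores."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points

def auc(points):
    """Area under the curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

# Hypothetical labels (1 = active) and predicted probabilities.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
curve = roc_points(labels, scores)
print(curve)
print(auc(curve))  # 0.75
```

An AUC of 0.5 corresponds to the diagonal of a random prediction, so values above 0.5 indicate that active compounds tend to receive higher scores than inactive ones.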
The best performing model, selected by its ability to correctly classify the compounds in the test set, was used for predicting the class of the compounds in the screening dataset. In general, the random forest models with a smaller number of selected features, such as ECFP_1024, MD and ECFP_1024_MD, performed better on the test set. The classifier built using only molecular descriptors, the MD model, had the greatest ability to correctly predict the class of the compounds in the test set. Combining MD with ECFP_1024, the single-descriptor model with the second-highest test set AUC, did not result in higher performance. The ECFP_1024 features could have provided additional information that was not useful to the random forest classifier, making the predictions more difficult. Therefore, the MD model, which had an AUC score of 0.815 for classifying the compounds in the test set, was selected for further analysis.
Confusion matrix
The confusion matrix of the MD random forest model for predicting the class of the molecules in the test set is shown in Fig. 4. The classification accuracy of the model was 0.853 and the AUC score was 0.815.
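The Positive Predictive Value (PPV), Eq. 1, and Negative Predictive Value (NPV), Eq. 2, are calculated as follows, where TP, FP, TN and FN denote the numbers of true positives, false positives, true negatives and false negatives in the confusion matrix:
\[\mathrm{PPV} = \frac{TP}{TP + FP} \qquad (1)\]
\[\mathrm{NPV} = \frac{TN}{TN + FN} \qquad (2)\]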
In binary classification, the PPV and NPV are the percentages of positive and negative predictions, respectively, that are correct. Herein, the PPV and NPV indicate that the random forest model performed better at correctly classifying inactive compounds than active ones. The data used in this study were imbalanced, as approximately 79% of the samples were negative entries. Thus, a random prediction that a compound is inactive had a much higher initial probability of being correct. To handle the imbalanced data, the “class_weight” argument of the random forest algorithm was set to “balanced”, which penalises misclassification of the minority class more heavily [29]. This improved the performance of the model, as the PPV for classifying the compounds of the test set increased from 61.1% (without balancing the class weights) to 65.6% (after balancing the class weights).
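The two predictive values and the effect of balanced class weights can be sketched as follows. The confusion matrix counts are hypothetical and do not correspond to Fig. 4. The weighting formula, n_samples / (n_classes × class count), is the one documented for scikit-learn's "balanced" mode; the 100-compound label vector is an assumption that simply mirrors the approximate 79% share of negative entries.

```python
from collections import Counter

def ppv_npv(tp, fp, tn, fn):
    """Positive and negative predictive values from confusion matrix counts."""
    ppv = tp / (tp + fp)  # fraction of predicted actives that are truly active
    npv = tn / (tn + fn)  # fraction of predicted inactives that are truly inactive
    return ppv, npv

def balanced_class_weights(labels):
    """Class weights computed as n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    return {cls: n_samples / (len(counts) * count) for cls, count in counts.items()}

# Hypothetical confusion matrix counts (not those of Fig. 4).
print(ppv_npv(tp=8, fp=2, tn=30, fn=10))

# Hypothetical dataset of 100 compounds with 79 inactive (0) and 21 active (1).
toy_labels = [0] * 79 + [1] * 21
print(balanced_class_weights(toy_labels))
```

In this toy case each active compound carries roughly 3.8 times the weight of an inactive one, which is how the balanced setting counteracts the majority class.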
Feature importance
In this experiment, the feature relevance was measured using the “Gini importance” of the random forest algorithm. The selected model, MD, was composed of 69 molecular descriptors calculated by the MOE™ software [30]. The table containing the full feature ranking can be found in Additional File 2. The analysis was focused on the top 30 features with the highest Gini importance (Table 2), which contained both 2D and 3D molecular descriptors.
Table 2
Top ranking MD descriptors.
Gini importance | Feature | Description |
0.062 | a_nN | Number of nitrogen atoms |
0.029 | PEOE_VSA+2 | Total positive van der Waals surface area of atoms with a partial charge in the range of 0.10 to 0.15 |
0.026 | vsurf_D8 | Hydrophobic volume |
0.024 | h_pKa | The pKa of the reaction that removes a proton |
0.023 | SMR_VSA6 | Sum of van der Waals surface areas such that the molar refractivity contribution is in the range of 0.485 to 0.560 |
0.023 | rsynth | A value in [0,1] indicating the synthetic reasonableness, or feasibility, of the chemical structure. A value of 0 means it is unlikely that the molecule can be synthesized while a value of 1 means that it is likely that the molecule can be synthesized. The value reflects the fraction of heavy atoms in the molecule that can be traced back to starting materials fragments resulting from retrosynthetic disconnection rules. |
0.022 | PEOE_VSA-4 | Total positive van der Waals surface area of atoms with a partial charge in the range of -0.25 to -0.20 |
0.021 | PEOE_VSA+4 | Total positive van der Waals surface area of atoms with a partial charge in the range of 0.20 to 0.25 |
0.021 | PEOE_VSA-6 | Total positive van der Waals surface area of atoms with a partial charge that is less than −0.30 |
0.021 | PEOE_VSA_PPOS | Total positive van der Waals surface area of atoms with a partial charge that is greater than 0.20 |
0.020 | chi0_C | Carbon connectivity index (order 0) |
0.020 | Q_VSA_PNEG | Total negative polar van der Waals surface area of atoms with a partial charge that is less than −0.20 |
0.020 | PEOE_VSA_POL | Total polar van der Waals surface area of atoms whose partial charge has an absolute value greater than 0.20 |
0.020 | chi0v_C | Carbon valence connectivity index (order 0) |
0.019 | SMR_VSA3 | Sum of van der Waals surface areas such that the molar refractivity contribution is in the range of 0.35 to 0.39 |
0.019 | Q_VSA_PPOS | Total positive van der Waals surface area of atoms with a partial charge that is greater than 0.20 |
0.018 | b_single | Number of single bonds |
0.018 | a_count | Number of atoms |
0.018 | SlogP_VSA3 | Sum of van der Waals surface areas such that the logP(o/w) is in the range of 0.0 to 0.1 |
0.018 | PEOE_VSA_PNEG | Total negative polar van der Waals surface area of atoms with a partial charge that is less than −0.20 |
0.017 | TPSA | Topological polar surface area |
0.017 | zagreb | Zagreb index |
0.017 | weinerPol | Wiener polarity number |
0.017 | opr_brigid | The number of rigid bonds |
0.017 | Kier3 | Third kappa shape index |
0.016 | PEOE_VSA-1 | Total positive van der Waals surface area of atoms with a partial charge in the range of -0.10 to -0.05 |
0.016 | chi0 | Atomic connectivity index (order 0) |
0.016 | Kier2 | Second kappa shape index |
0.016 | SlogP_VSA2 | Sum of van der Waals surface areas such that the logP(o/w) is in the range of -0.2 to 0.0 |
0.015 | a_nH | Number of hydrogen atoms |
Top 30 features ranked by Gini importance for the MD random forest model. The description of the features was taken from the MOE™ software documentation [30].
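The Gini importance values in Table 2 are built from decreases in Gini impurity. As a minimal sketch of the underlying quantity, the code below computes the impurity of a node and the decrease achieved by one split; in a random forest, a feature's importance accumulates such decreases, weighted by node size, over every split that uses the feature and is then averaged over the trees. The labels are hypothetical and the function names are chosen for the example.

```python
from collections import Counter

def gini_impurity(labels):
    """Gini impurity of a set of class labels: 1 - sum of squared class fractions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def impurity_decrease(parent, left, right):
    """Decrease in Gini impurity when a parent node is split into two children."""
    n = len(parent)
    weighted_children = (len(left) / n) * gini_impurity(left) \
        + (len(right) / n) * gini_impurity(right)
    return gini_impurity(parent) - weighted_children

# Hypothetical node with two active (1) and two inactive (0) compounds.
node = [0, 0, 1, 1]
print(impurity_decrease(node, [0, 0], [1, 1]))  # 0.5: the split separates the classes
print(impurity_decrease(node, [0, 1], [0, 1]))  # 0.0: the split is uninformative
```

A feature such as a_nN ranks highly when splits on it repeatedly produce large decreases of this kind across the forest.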
The highest-ranking features were broadly separated into the following categories: (i) atom and bond counts, (ii) topological descriptors and (iii) partial charge descriptors.
Atom and bond counts are simple descriptors that do not provide any information on molecular geometry or atom connectivity. The highest-ranking atom and bond count descriptors were a_nN, b_single, a_count, opr_brigid, and a_nH. While very simplistic, the atom and bond counts outperformed other more complex molecular descriptors. This is because atom and bond counts can partially capture the overall properties of a compound such as size, hydrogen bonding and polarity, which often impact the activity of a drug [31]. The number of nitrogen atoms, a_nN, was the top-ranking feature of the MD random forest model with a Gini importance score of 0.062. This is consistent with the results of Barardo et al. (2017) where a_nN was also ranked highest for predicting the class of the compounds in the DrugAge database [4]. Nitrogen atoms could have affected the physicochemical properties of the drugs as well as the interactions and binding of the molecules with target residues.
The highest-ranking topological descriptors included chi0_C, chi0v_C, zagreb, weinerPol, Kier3, chi0 and Kier2. Topological descriptors take into account atom connectivity. The descriptors are computed from molecular graphs, where atoms are represented by vertices and bonds by edges [32]. These descriptors can provide information on the degree of branching of the structure as well as molecular size and shape [32]. Although topological descriptors are extensively used in predictive modelling, they are usually hard to interpret [33]. Topological descriptors may have provided information on how well a molecule fits in the binding site and, along with atom counts, on its interactions with the binding residues.
The top-ranking partial charge descriptors were PEOE_VSA+2, PEOE_VSA-4, PEOE_VSA+4, PEOE_VSA-6, PEOE_VSA_PPOS, Q_VSA_PNEG, PEOE_VSA_POL, Q_VSA_PPOS and PEOE_VSA_PNEG. The “PEOE_” prefix denotes descriptors calculated using the partial equalization of orbital electronegativity (PEOE) algorithm for quantification of partial charges in the \(\sigma\)-system [34, 35]. On the other hand, descriptors prefixed with “Q_” were calculated using the Amber10:EHT force field [30]. In a ligand-receptor system, partial charges can play a key role in the binding properties of the molecule as well as molecular recognition.
Predicting potential lifespan-extending compounds
The MD random forest model was applied to predict the class of the compounds in an external database, consisting of 1,738 small molecules obtained from the DrugBank database [36]. The top-ranking compounds with a predictive probability of \(\ge 0.80\) for increasing the lifespan of C. elegans are shown in Table 3. The full ranking of the molecules in the screening database can be found in Additional File 2. The compounds were broadly separated into the following categories: (i) flavonoids, (ii) fatty acids and conjugates, and (iii) organooxygen compounds. The compound classification was taken from the category “Class” in the chemical taxonomy section of the DrugBank database (provided by Classyfire) or assigned manually if not available [37].
Table 3
Top-hit compounds from external database.
Compound name | Predictive probability |
Diosmin | 0.96 |
Gamolenic acid | 0.95 |
Rutin | 0.95 |
Hesperidin | 0.94 |
Lactose | 0.89 |
6''-O-Malonyldaidzin | 0.84 |
Fidaxomicin | 0.84 |
Sucrose | 0.83 |
Lactulose | 0.83 |
Sodium aurothiomalate | 0.82 |
Aloin | 0.81 |
Rifapentine | 0.81 |
Plecanatide | 0.80 |
Calcifediol | 0.80 |
Chlortetracycline | 0.80 |
Chemical compounds from the screening database with a predictive probability of 0.80 or above for increasing the lifespan of C. elegans.
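The ranking step behind Table 3 can be pictured with a short sketch. It assumes that the predictive probability of a forest is the mean of the per-tree probabilities for the active class, which is how scikit-learn's random forest reports class probabilities; whether every detail matches the pipeline of this study is an assumption. The compound names and per-tree values are hypothetical placeholders.

```python
def forest_probability(tree_probabilities):
    """Mean of the per-tree probabilities for the active class."""
    return sum(tree_probabilities) / len(tree_probabilities)

def top_hits(per_tree, threshold=0.80):
    """Keep compounds at or above the threshold, ranked by decreasing probability."""
    scored = {name: forest_probability(p) for name, p in per_tree.items()}
    hits = [(name, prob) for name, prob in scored.items() if prob >= threshold]
    return sorted(hits, key=lambda item: item[1], reverse=True)

# Hypothetical per-tree outputs for three screening compounds and four trees.
per_tree = {
    "compound_A": [1.0, 1.0, 0.5, 1.0],
    "compound_B": [0.5, 0.5, 1.0, 0.0],
    "compound_C": [1.0, 1.0, 1.0, 1.0],
}
print(top_hits(per_tree))
```

A high probability therefore reflects agreement among the trees, not an experimentally confirmed effect on lifespan.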
Flavonoids
Flavonoids are a group of secondary metabolites in plants that are common polyphenols in the human diet [38]. Major nutritional sources include tea, soy, fruits, vegetables, wine and nuts [38, 39]. Flavonoids are separated into subclasses based on their chemical structure, including flavones, flavonols, flavanones, and isoflavones [38]. Isoflavones differ from other flavonoids by having ring B attached to the C-3 position of ring C, rather than the C-2 position, as shown in Fig. 5 [38].
Flavonoids have been associated with health benefits for age-related conditions such as metabolic diseases, cancer, inflammation and cognitive decline [38, 39]. Possible mechanisms of action include antioxidant activity, scavenging of radicals, central nervous system effects, alteration of the intestinal transport, sequestration and processing of fatty acids, PPAR activation and increase of insulin sensitivity [38].
Diosmin was the top-hit molecule in the screening database, with a predictive probability of 0.96. Diosmin is a flavone glycoside that is either extracted from plants such as Rutaceae or obtained synthetically [40]. It has anti-inflammatory, free radical scavenging, and anti-mutagenic properties and has been used medically to treat pain and bleeding of haemorrhoids, chronic venous disease and lymphedema [41]. Nevertheless, diosmin has poor aqueous solubility, which is a challenge for oral administration [42]. Kamel et al. (2017) found that a combination of diosmin with essential oils showed skin antioxidant, anti-ageing and sun-blocking effects on mice [42]. The underlying mechanisms for diosmin’s anti-ageing and photo-protective effects include enhancing lymphatic drainage, ameliorating capillary microcirculation inflammation and preventing leukocyte activation, trapping, and migration [42, 43].
Other flavonoids that ranked high for increasing the lifespan of C. elegans were rutin and hesperidin, with predictive probabilities of 0.95 and 0.94, respectively. Rutin (or quercetin-3-rutinoside) is a flavonol glycoside that is abundant in many plants such as passionflower, apple, tea, buckwheat seeds and citrus fruits [44, 45]. It possesses a range of biological properties including antioxidant, anticancer, neuroprotective, cardio-protective and skin-regenerative activities [44, 45]. Rutin had a high structural similarity to other flavonoids in the DrugAge database, particularly to quercetin 3-O-β-d-glucopyranoside-(4→1)-β-d-glucopyranoside (Q3M). The Tanimoto coefficient between the RDKit fingerprints of Q3M and rutin was 0.99. The similarity map between the two compounds is shown in Fig. 6.
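For reference, the Tanimoto coefficient between two binary fingerprints is the number of bits set in both divided by the number of bits set in either. The sketch below represents each fingerprint by the set of its on-bit positions; the bit positions are hypothetical and are not the RDKit fingerprints of rutin or Q3M. Returning 0.0 for two empty fingerprints is a convention chosen for this example.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient between two fingerprints given as on-bit positions."""
    a, b = set(bits_a), set(bits_b)
    union = len(a | b)
    if union == 0:
        return 0.0  # convention assumed here for two empty fingerprints
    return len(a & b) / union

# Hypothetical on-bit positions for two molecules.
fp_1 = {1, 2, 3}
fp_2 = {2, 3, 4}
print(tanimoto(fp_1, fp_2))  # 0.5: two shared bits out of four distinct bits
```

A coefficient of 0.99 therefore means that almost every bit set in one fingerprint is also set in the other.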
Q3M is a flavonoid abundant in onion peel that was found to extend the lifespan of C. elegans [47]. In the same study, although rutin was found to improve the tolerance of C. elegans to oxidative stress, which is desirable for longevity, it did not affect the worm's lifespan [47]. Davalli et al. (2016) also reported that rutin did not improve the longevity of C. elegans [48]. On the other hand, Chattopadhyay et al. (2017) showed that rutin promoted longevity in the fruit fly, D. melanogaster [45].
Hesperidin has shown reactive oxygen species (ROS) inhibition and anti-ageing effects in the yeast species Saccharomyces cerevisiae [49]. Fernández-Bedmar et al. (2011) found that hesperidin extracted from orange juice had a positive influence on the lifespan of D. melanogaster [50]. Wang et al. (2020) showed that orange extracts, where hesperidin was the predominant phenolic compound, increased the mean lifespan of C. elegans [51]. In the same study, orange extracts were also found to promote longevity by enhancing motility and reducing the accumulation of age pigment and ROS levels [51].
Soy isoflavones include genistein, glycitein, and daidzein. Genistein, a compound in the DrugAge database, has been found to prolong the lifespan of C. elegans and increase its tolerance to oxidative stress [52]. Gutierrez-Zepeda et al. (2005) found that C. elegans fed with the soy isoflavone glycitein had an improved resistance towards oxidative stress [53]. However, in comparison to control worms, the lifespan of C. elegans fed with glycitein was not significantly affected [53]. The effect of daidzein on the lifespan of C. elegans in the presence of pathogenic bacteria was investigated by Fischer et al. (2012) [54]. The study found that daidzein had an estrogenic effect that extended the worm’s lifespan in the presence of pathogenic bacteria and heat [54]. Herein, we applied the MD random forest model to predict the effect of 6''-O-malonyldaidzin on the lifespan of C. elegans. 6''-O-Malonyldaidzin is an O-glycoside derivative of daidzein found in food products such as soybean, miso, soy milk and soy yoghurt [55]. Its predicted probability for extending the lifespan of the worm was 0.84.
Fatty acids and conjugates
Lipid metabolism has an essential role in many biological processes of an organism. Lipids are used as energy storage in the form of triglycerides and can therefore aid survival under severe conditions [56]. Additionally, lipids have a key role in intercellular and intracellular signalling as well as organelle homeostasis [57]. Research on both invertebrates and mammals suggests that alterations in lipid levels and composition are associated with ageing and longevity [56, 57].
A recent review by Johnson and Stolzing (2019), on lipid metabolism and its role in ageing, lifespan extension and age-related conditions, summarised key lipid-related interventions that promote longevity in C. elegans [58]. Some of the studies presented in that review are reported here. In a study of the response to fasting, O’Rourke et al. (2013) showed that supplementing C. elegans with the \(\omega\)-6 polyunsaturated fatty acids (PUFAs) arachidonic acid and dihomo-γ-linolenic acid increased the worm’s starvation resistance and prolonged its lifespan by stimulating autophagy [59]. Similarly, Qi et al. (2017) found that treating C. elegans with the \(\omega\)-3 PUFA \(\alpha\)-linolenic acid extended the worm’s lifespan in a dose-dependent manner [60]. The study indicated that the \(\omega\)-3 fatty acid underwent oxidation to generate a group of molecules known as oxylipins. The findings suggested that the increase in the worm’s lifespan could be a result of the combined effects of \(\alpha\)-linolenic acid and its oxylipin metabolites [60]. Sugawara et al. (2013) found that a low dose of fish oils, which contained the PUFAs eicosapentaenoic acid and docosahexaenoic acid, significantly increased the lifespan of C. elegans [61]. The authors proposed that a low dose of fish oils induces moderate oxidative stress that extends the lifespan of the organism. In contrast, large amounts of fish oils shortened the worm’s lifespan [61].
Gamolenic acid, or \(\gamma\)-linolenic acid (GLA), was the second top-hit molecule of the screening database with a predictive probability of 0.95. GLA is an \(\omega\)-6 PUFA, composed of an 18-carbon chain with three double bonds at the 6th, 9th and 12th positions [62]. Rich sources of GLA include evening primrose oil (EPO), black currant oil, and borage oil [63]. In mammals, GLA is synthesized from dietary linoleic acid via the action of the enzyme \(\Delta\)-6 desaturase [62, 63]. GLA is a precursor for other essential fatty acids such as arachidonic acid [62, 63]. Conditions such as hypertension and diabetes, as well as stress and various aspects of ageing, reduce the capacity of \(\Delta\)-6 desaturase to convert linoleic acid to GLA [64]. This may lead to a deficiency of long-chain fatty acid derivatives and metabolites of GLA. GLA has been used as a constituent of anti-ageing supplements and has been shown to possess various therapeutic effects in humans, including improvement of age-related anomalies [62].
Sodium aurothiomalate, with a lifespan increase probability of 0.82, is a thia short-chain fatty acid used for the treatment of rheumatoid arthritis and has potential antineoplastic activities [37, 65]. In preclinical models, sodium aurothiomalate inhibited protein kinase C iota (PKCι) signalling, which is overexpressed in non-small cell lung, ovarian and pancreatic cancers [65]. The chemical structure of sodium aurothiomalate is shown in Fig. 7.
Organooxygen compounds
Lactose, with a lifespan increase probability of 0.89, is a disaccharide found in milk and other dairy products. In the human intestine, lactose is hydrolysed to glucose and galactose by the enzyme lactase. Out of the compounds in the DrugAge database, lactose had the highest structural similarity with trehalose. Trehalose has been found to increase the mean lifespan of C. elegans by over 30%, without showing any side effects [66]. The Tanimoto coefficient between the RDKit fingerprint representations of trehalose and lactose was 0.85. The similarity map generated using ECFP fingerprints is shown in Fig. 8.
Even though lactose has a high (Tanimoto) similarity to trehalose, Xing et al. (2019) found that lactose treatment shortened the lifespan of C. elegans [67].
Sucrose, with a lifespan increase probability of 0.83, is a disaccharide composed of glucose and fructose [68]. It is used as the main form of transporting carbohydrates in fruits and vegetables [68]. Other sugars such as trehalose, galactose and fructose have been found to extend the lifespan of C. elegans [66, 69, 70]. However, Zheng et al. (2017) found that treating C. elegans with sucrose had no significant effect on the organism’s mean lifespan [70]. In rats, sucrose has been found to shorten the mean lifespan and elevate blood pressure [71]. Rovenko et al. (2015) showed that in D. melanogaster, high sucrose consumption decelerated pupation, increased pupa mortality and promoted obesity [72].
Lactulose, with a lifespan increase probability of 0.83, is a synthetic disaccharide composed of the monosaccharides fructose and galactose [72]. Lactulose has been shown to be an effective treatment for chronic constipation in elderly patients and to improve cognitive function in patients with hepatic encephalopathy [72, 73].
Other classes of compounds
Other compounds with a predictive probability ≥ 0.80 for increasing the lifespan of C. elegans included aloin, a constituent of aloe vera with a predictive probability of 0.81, as well as the antibiotics fidaxomicin (predictive probability = 0.84), rifapentine (predictive probability = 0.81) and chlortetracycline (predictive probability = 0.80).
Aloe vera is a well-known plant used in medicine, cosmetics and beverages. It possesses a wide range of biological properties including anti-inflammatory, anticancer, laxative and antioxidant activities as well as promoting the healing process of dermal injuries [74, 75]. Additionally, aloe vera has been associated with improving disorders such as diabetes, microbial diseases, cardiovascular and liver problems [75]. Its biological activities have been attributed to the plethora of phytochemicals present in the aloe vera sap and gel. Various studies have demonstrated that the anthraquinones and glycosides present in the sap have a key role in its anticancer, anti-inflammatory, laxative effects, tyrosinase inhibition, free radical and proliferative activities [48]. Chandrashekara et al. (2011) found that aloe vera supplementation extended the lifespan of D. melanogaster larvae [76]. This effect was attributed to the plethora of chemicals present in aloe vera including proteins, lipids, amino acids and small molecules. The authors proposed that the aloe vera extract had an effect on the fly’s lifespan similar to that of resveratrol, including neuroprotection and stimulation of regrowth or repair of nerve fibres [76].
Aloin is a bioactive compound in various Aloe species. It is composed of two diastereoisomers, aloin A, or barbaloin, and aloin B, or isobarbaloin, which have similar chemical properties [55]. Aloin is an anthraquinone glycoside, that is, an anthraquinone bearing a sugar moiety. Aloin has been used medically as a stimulant laxative, alleviating constipation by triggering bowel movements [55]. In this study, the MD random forest model was applied to predict the effect of aloin A on the lifespan of C. elegans, which had a predictive probability of 0.81. Aloin has been found to possess anti-inflammatory, antiproliferative and anticancer activities as well as to protect dermal fibroblasts against oxidative stress damage [77–80]. Experimental testing would be required to further investigate the effect of aloin A on the lifespan of C. elegans.
Rifapentine is a macrolactam antibiotic approved for the treatment of tuberculosis [81]. Macrolactams are a small class of compounds which consist of cyclic amides having unsaturation or heteroatoms replacing one or more carbon atoms in the ring [37]. Macrolactams such as rifampicin and rifamycin have been found to increase the lifespan of C. elegans [82].
Advanced glycation end (AGE) products are formed from the non-enzymatic reaction of sugars, such as glucose, with proteins, lipids or nucleic acids [82]. AGE products have been implicated in ageing and age-related diseases such as diabetes, atherosclerosis, and neurodegeneration [82]. Golegaonkar et al. (2015) showed that rifampicin reduced AGE products and extended the mean lifespan of C. elegans by 60% [82]. The effect of two other macrolactams, rifamycin SV and rifaximin, on the worm’s lifespan was also investigated. Rifamycin SV was found to exhibit similar activity to rifampicin, while rifaximin lacked anti-glycating activity and did not extend the lifespan of C. elegans. The authors suggested that the anti-glycation properties of rifampicin and rifamycin could be attributed to the presence of a para-dihydroxyl moiety, which was not present in rifaximin [82]. As shown in Fig. 9, this functional group is also present in rifapentine. Experimental testing would be required to investigate whether rifapentine possesses similar properties to rifampicin and rifamycin.
Evaluation of the chemical similarity principle
Several of the compounds identified by the random forest model had already been experimentally evaluated for increasing the lifespan of C. elegans and other model organisms. In particular, the Tanimoto coefficient between the RDKit fingerprints of rutin and Q3M, an active compound, is 0.99. However, experimental studies found that, although it is structurally similar to active compounds, rutin does not extend the lifespan of C. elegans [47, 48]. Additionally, the Tanimoto coefficient between the RDKit fingerprint representations of lactose and trehalose, an active compound, is 0.85. Nevertheless, in vivo studies showed that treatment with lactose reduced the lifespan of C. elegans [67]. In these cases, the chemical similarity principle, which states that chemically similar compounds tend to have similar bioactivities, does not appear to hold. An explanation presented by Martin et al. (2002) is that protein structures are complex and flexible systems [83]. Thus, structurally similar chemicals may bind in different orientations to the active site, interact with a different conformation of the protein or even bind to completely different proteins [83].