Investigating which features are considered “more important” by black-box models such as random forest can aid understanding of how these models make predictions. In this experiment, the feature relevance was measured using the “Gini importance” of the random forest algorithm. The selected model, MD, was composed of 69 molecular descriptors calculated by the MOE™ software22. The table containing the full feature ranking can be found in Additional File 2. The analysis was focused on the top 30 features with the highest Gini importance (Table 1), which contained both 2D and 3D molecular descriptors.
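To make the Gini importance measure concrete, the following minimal sketch computes the Gini impurity decrease for a single split. It assumes the common definition (used, for example, by scikit-learn) in which a feature's importance is the sum of such decreases over every split on that feature, weighted by the fraction of samples reaching the node, averaged over the trees and normalised to sum to one. The labels and the split are hypothetical toy values, not data from this study.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity 1 - sum_k p_k^2 of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def impurity_decrease(parent, left, right):
    """Drop in Gini impurity when the samples in `parent` are split into `left` and `right`."""
    n = len(parent)
    weighted_children = (
        (len(left) / n) * gini_impurity(left)
        + (len(right) / n) * gini_impurity(right)
    )
    return gini_impurity(parent) - weighted_children


# Hypothetical node: 1 = lifespan-extending, 0 = inactive.
parent = [1, 1, 1, 0, 0, 0, 0, 0]
left = [1, 1, 1, 0]   # samples on one side of an assumed descriptor threshold
right = [0, 0, 0, 0]  # samples on the other side
decrease = impurity_decrease(parent, left, right)
print(f"Gini impurity decrease for this split: {decrease:.5f}")
```

A descriptor that repeatedly produces large decreases of this kind across the forest receives a high importance score. Scores of this type are known to favour features with many distinct values, so they are best read as a ranking aid rather than a causal statement.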
The highest-ranking features were broadly separated into the following categories: (i) atom and bond counts, (ii) topological descriptors, and (iii) partial charge descriptors.
Atom and bond counts are simple descriptors that do not provide any information on molecular geometry or atom connectivity. The highest-ranking atom and bond count descriptors were a_nN, b_single, a_count, opr_brigid, and a_nH. While very simplistic, the atom and bond counts outperformed more complex 2D and 3D molecular descriptors. This is because atom and bond counts can partially capture the overall properties of a compound such as size, hydrogen bonding and polarity, which often impact the activity of a drug23. The number of nitrogen atoms, a_nN, was the top-ranking feature of the MD random forest model with a Gini importance score of 0.062. This is consistent with the results of Barardo et al. (2017) where a_nN was also ranked highest for predicting the class of the compounds in the DrugAge database4. Nitrogen atoms could have affected the physicochemical properties of the drugs as well as the interactions and binding of the molecules with target residues.
The highest-ranking topological descriptors included chi0_C, chi0v_C, zagreb, weinerPol, Kier3, chi0 and Kier2. Topological descriptors take into account atom connectivity. The descriptors are computed from molecular graphs, where atoms are represented by vertices and bonds by edges24. These descriptors can provide information on the degree of branching of the structure as well as molecular size and shape24. Although topological descriptors are extensively used in predictive modelling, they are usually hard to interpret25. Topological descriptors may have provided information on how well a molecule fits in the binding site and, together with atom counts, on its interactions with the binding residues.
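As an illustration of how such descriptors follow from a molecular graph, the sketch below computes two of them for a hydrogen-suppressed graph. It assumes the usual textbook definitions: the first Zagreb index as the sum of squared heavy-atom degrees, and the Wiener polarity number as the number of atom pairs separated by exactly three bonds. The adjacency-list representation and the example molecule (2-methylbutane) are illustrative choices; the values reported in this study came from MOE, not from this code.

```python
from collections import deque


def zagreb_index(adjacency):
    """First Zagreb index: sum of squared vertex degrees."""
    return sum(len(neighbours) ** 2 for neighbours in adjacency.values())


def shortest_path_lengths(adjacency, source):
    """Breadth-first search giving the bond count from `source` to every atom."""
    distance = {source: 0}
    queue = deque([source])
    while queue:
        atom = queue.popleft()
        for neighbour in adjacency[atom]:
            if neighbour not in distance:
                distance[neighbour] = distance[atom] + 1
                queue.append(neighbour)
    return distance


def wiener_polarity(adjacency):
    """Number of unordered atom pairs exactly three bonds apart."""
    ordered_pairs = 0
    for atom in adjacency:
        lengths = shortest_path_lengths(adjacency, atom)
        ordered_pairs += sum(1 for d in lengths.values() if d == 3)
    return ordered_pairs // 2  # each pair was counted from both ends


# Hydrogen-suppressed graph of 2-methylbutane: C1-C2(-C5)-C3-C4.
two_methylbutane = {
    "C1": ["C2"],
    "C2": ["C1", "C3", "C5"],
    "C3": ["C2", "C4"],
    "C4": ["C3"],
    "C5": ["C2"],
}
print(zagreb_index(two_methylbutane), wiener_polarity(two_methylbutane))
```

Both values grow with size and branching, which is consistent with the interpretation above that these descriptors encode molecular size and shape rather than any specific functional group.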
The top-ranking partial charge descriptors were PEOE_VSA+2, PEOE_VSA-4, PEOE_VSA+4, PEOE_VSA-6, PEOE_VSA_PPOS, Q_VSA_PNEG, PEOE_VSA_POL, Q_VSA_PPOS and PEOE_VSA_PNEG. The “PEOE_” prefix denotes descriptors calculated using the partial equalization of orbital electronegativity (PEOE) algorithm for quantification of partial charges in the σ-system26,27. On the other hand, descriptors prefixed with “Q_” were calculated using the Amber10:EHT force field22. In a ligand-receptor system, partial charges can play a key role in the binding properties of the molecule as well as in molecular recognition.
Predicting potential lifespan-extending compounds
The MD random forest model was applied to predict the class of compounds in an external database, consisting of 1,738 small molecules obtained from the DrugBank database28. The top-ranking compounds with a predictive probability of ≥ 0.80 for increasing the lifespan of C. elegans are shown in Table 2. The full ranking of the molecules in the screening database can be found in Additional File 2. The compounds were broadly separated into the following categories: (i) flavonoids, (ii) fatty acids and conjugates, and (iii) organooxygen compounds. The compound classification was taken from the category “Class” in the chemical taxonomy section of the DrugBank database (provided by Classyfire) or assigned manually if not available29.
Table 2 Chemical compounds from the screening database with a predictive probability of 0.80 or above for increasing the lifespan of C. elegans.
Compound name | Predictive probability
Diosmin | 0.96
Gamolenic acid | 0.95
Rutin | 0.95
Hesperidin | 0.94
Lactose | 0.89
6''-O-Malonyldaidzin | 0.84
Fidaxomicin | 0.84
Sucrose | 0.83
Lactulose | 0.83
Sodium aurothiomalate | 0.82
Aloin | 0.81
Rifapentine | 0.81
Plecanatide | 0.80
Calcifediol | 0.80
Chlortetracycline | 0.80
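The selection step behind Table 2 amounts to ranking the screened compounds by predicted probability and keeping those at or above the cut-off. A minimal sketch is given below. It assumes the model output is available as a mapping from compound name to probability; the function name `select_hits` and the below-threshold entry are hypothetical and included only to show the filter at work.

```python
def select_hits(predictions, threshold=0.80):
    """Return (name, probability) pairs at or above `threshold`, highest first."""
    hits = [(name, p) for name, p in predictions.items() if p >= threshold]
    return sorted(hits, key=lambda item: item[1], reverse=True)


predictions = {
    "Lactose": 0.89,
    "Diosmin": 0.96,
    "Chlortetracycline": 0.80,
    "Gamolenic acid": 0.95,
    "hypothetical_compound_X": 0.42,  # placeholder, not a result from the study
}

for name, probability in select_hits(predictions):
    print(f"{name}\t{probability:.2f}")
```

Because the tabulated probabilities are rounded to two decimals, applying the cut-off to rounded rather than raw model outputs could change which borderline compounds are retained.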
Flavonoids
Flavonoids are a group of secondary metabolites in plants that are common polyphenols in the human diet30. Major nutritional sources include tea, soy, fruits, vegetables, wine and nuts30,31. Flavonoids are separated into subclasses based on their chemical structure, including flavones, flavonols, flavanones, and isoflavones30.
Flavonoids have been associated with health benefits for age-related conditions such as metabolic diseases, cancer, inflammation and cognitive decline 30,31. Possible mechanisms of action include antioxidant activity, scavenging of radicals, central nervous system effects, alteration of the intestinal transport, sequestration and processing of fatty acids, PPAR activation and increase of insulin sensitivity 30.
Diosmin was the top-hit molecule in the screening database, with a predictive probability of 0.96. Diosmin is a flavone glycoside that is either extracted from plants such as those of the Rutaceae family or obtained synthetically32. It has anti-inflammatory, free radical scavenging, and anti-mutagenic properties and has been used medically to treat pain and bleeding of haemorrhoids, chronic venous disease and lymphedema33. Nevertheless, diosmin has poor aqueous solubility, which is a challenge for oral administration34. Kamel et al. (2017) found that a combination of diosmin with essential oils showed skin antioxidant, anti-ageing and sun-blocking effects in mice34. The underlying mechanisms for diosmin’s anti-ageing and photo-protective effects include enhancing lymphatic drainage, ameliorating capillary microcirculation inflammation and preventing leukocyte activation, trapping, and migration34,35.
Other flavonoids that ranked high for increasing the lifespan of C. elegans were rutin and hesperidin, with predictive probabilities of 0.95 and 0.94, respectively. Rutin (or quercetin-3-rutinoside) is a flavonol glycoside that is abundant in many plants such as passionflower, apple, tea, buckwheat seeds and citrus fruits36,37. It possesses a range of biological properties including antioxidant, anticancer, neuroprotective, cardio-protective and skin-regenerative activities36,37. Rutin had a high structural similarity to other flavonoids in the DrugAge database, particularly to quercetin 3-O-β-d-glucopyranoside-(4→1)-β-d-glucopyranoside (Q3M). The Tanimoto coefficient between the RDKit fingerprints of Q3M and rutin was 0.99. The similarity map between the two compounds is shown in Figure 5.
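For readers unfamiliar with the Tanimoto coefficient, the sketch below computes it for two binary fingerprints represented as sets of "on" bit positions: the number of shared bits divided by the number of bits set in either fingerprint. The bit positions are hypothetical toy values, not the fingerprints of rutin or Q3M, and the code is a plain-Python illustration rather than the RDKit routine used in the study.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient |A ∩ B| / |A ∪ B| of two sets of on-bit positions."""
    a, b = set(bits_a), set(bits_b)
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / len(union)


# Hypothetical fingerprints sharing four of six distinct on-bits.
fingerprint_a = {1, 5, 9, 12, 20}
fingerprint_b = {1, 5, 9, 12, 33}
similarity = tanimoto(fingerprint_a, fingerprint_b)
print(f"Tanimoto similarity: {similarity:.3f}")
```

With RDKit itself, the equivalent calculation would typically use `Chem.RDKFingerprint` to build each fingerprint and `DataStructs.TanimotoSimilarity` to compare them. A value close to 1 means the two molecules set almost the same fingerprint bits; it does not by itself guarantee the same biological activity.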
Q3M is a flavonoid abundant in onion peel that was found to extend the lifespan of C. elegans39. In the same study, although rutin was found to improve the tolerance of C. elegans to oxidative stress, which is desirable for longevity, it had no effect on the worm's lifespan39. Davalli et al. (2016) also reported that rutin did not improve the longevity of C. elegans40. On the other hand, Chattopadhyay et al. (2017) showed that rutin promoted longevity in a species of fly, Drosophila melanogaster (D. melanogaster)37.
Hesperidin has shown reactive oxygen species (ROS) inhibition and anti-ageing effects in the yeast species Saccharomyces cerevisiae41. Fernández-Bedmar et al. (2011) found that hesperidin extracted from orange juice had a positive influence on the lifespan of D. melanogaster 42. Wang et al. (2020) showed that orange extracts, where hesperidin was the predominant phenolic compound, increased the mean lifespan of C. elegans 43. In the same study, orange extracts were also found to promote longevity by enhancing motility and reducing the accumulation of age pigment and ROS levels43.
Soy isoflavones include genistein, glycitein, and daidzein. Genistein, a compound in the DrugAge database, has been found to prolong the lifespan of C. elegans and increase its tolerance to oxidative stress44. Gutierrez-Zepeda et al. (2005) found that C. elegans fed with the soy isoflavone glycitein had an improved resistance towards oxidative stress45. However, in comparison to control worms, the lifespan of C. elegans fed with glycitein was not significantly affected45. The effect of daidzein on the lifespan of C. elegans in the presence of pathogenic bacteria was investigated by Fischer et al. (2012)46. The study found that daidzein had an estrogenic effect that extended the worm’s lifespan in the presence of pathogenic bacteria and heat46. Herein, we applied the MD random forest model to predict the effect of 6''-O-malonyldaidzin on the lifespan of C. elegans. 6''-O-Malonyldaidzin is an O-glycoside derivative of daidzein found in food products such as soybean, miso, soy milk and soy yoghurt47. Its predicted probability for extending the lifespan of the worm was 0.84.
Fatty acids and conjugates
Lipid metabolism has an essential role in many biological processes of an organism. Lipids are used as energy storage in the form of triglycerides and can therefore aid survival under severe conditions48. Additionally, lipids have a key role in intercellular and intracellular signalling as well as organelle homeostasis49. Research on both invertebrates and mammals suggests that alterations in lipid levels and composition are associated with ageing and longevity48,49.
A recent review by Johnson and Stolzing (2019) on lipid metabolism and its role in ageing summarised key lipid-related interventions that promote longevity in C. elegans50. Some of the studies presented in the review are reported here. In a study of the response to fasting, O’Rourke et al. (2013) showed that supplementing C. elegans with the ω-6 polyunsaturated fatty acids (PUFAs) arachidonic acid and dihomo-γ-linolenic acid increased the worm’s starvation resistance and prolonged its lifespan by stimulating autophagy51. Similarly, Qi et al. (2017) found that treating C. elegans with the ω-3 PUFA α-linolenic acid extended the worm’s lifespan in a dose-dependent manner52. The study indicated that the ω-3 fatty acid underwent oxidation to generate a group of molecules known as oxylipins. The findings suggested that the increase in the worm’s lifespan could be a result of the combined effects of α-linolenic acid and its oxylipin metabolites52. Sugawara et al. (2013) found that a low dose of fish oils, which contained the PUFAs eicosapentaenoic acid and docosahexaenoic acid, significantly increased the lifespan of C. elegans53. The authors proposed that a low dose of fish oils induces moderate oxidative stress that extends the lifespan of the organism. In contrast, large amounts of fish oils shortened the worm’s lifespan53.
Gamolenic acid, or γ-linolenic acid (GLA), was the second top-hit molecule of the screening database, with a predictive probability of 0.95. GLA is an ω-6 PUFA, composed of an 18-carbon chain with three double bonds at the 6th, 9th and 12th positions54. Rich sources of GLA include evening primrose oil (EPO), black currant oil, and borage oil55. In mammals, GLA is synthesized from dietary linoleic acid via the action of the enzyme Δ-6 desaturase54,55. GLA is a precursor for other essential fatty acids such as arachidonic acid54,55. Conditions such as hypertension and diabetes, as well as stress and various aspects of ageing, reduce the capacity of Δ-6 desaturase to convert linoleic acid to GLA56. This may lead to a deficiency of long-chain fatty acid derivatives and metabolites of GLA. GLA has been used as a constituent of anti-ageing supplements and has been shown to possess various therapeutic effects in humans, including improvement of age-related anomalies54.
Sodium aurothiomalate, with a lifespan increase probability of 0.82, is a thia short-chain fatty acid used for the treatment of rheumatoid arthritis and has potential antineoplastic activities29,57. In preclinical models, sodium aurothiomalate inhibited protein kinase C iota (PKCι) signalling, which is overexpressed in non-small cell lung, ovarian and pancreatic cancers57.
Organooxygen compounds
Lactose, with a lifespan increase probability of 0.89, is a disaccharide found in milk and other dairy products. In the human intestine, lactose is hydrolysed to glucose and galactose by the enzyme lactase. Out of the compounds in the DrugAge database, lactose had the highest structural similarity with trehalose. Trehalose has been found to increase the mean lifespan of C. elegans by over 30%, without showing any side effects58. The Tanimoto coefficient between the RDKit fingerprint representations of trehalose and lactose was 0.85. Even though lactose has a high (Tanimoto) similarity to trehalose, Xing et al. (2019) found that lactose treatment shortened the lifespan of C. elegans59.
Sucrose, with a lifespan increase probability of 0.83, is a disaccharide composed of glucose and fructose60. It is the main form in which carbohydrates are transported in fruits and vegetables60. Other sugars such as trehalose, galactose and fructose have been found to extend the lifespan of C. elegans58,61,62. However, Zheng et al. (2017) found that treating C. elegans with sucrose had no significant effect on the organism’s mean lifespan62. In rats, sucrose has been found to shorten the mean lifespan and elevate blood pressure63. Rovenko et al. (2015) showed that in D. melanogaster, high sucrose consumption decelerated pupation, increased pupa mortality and promoted obesity64.
Lactulose, with a lifespan increase probability of 0.83, is a synthetic disaccharide composed of the monosaccharides fructose and galactose64. Lactulose has been shown to be an effective treatment for chronic constipation in elderly patients and to improve cognitive function in patients with hepatic encephalopathy64,65.
Other classes of compounds
Other compounds with a predictive probability ≥ 0.80 for increasing the lifespan of C. elegans included aloin, a constituent of aloe vera with a predictive probability of 0.81, as well as the antibiotics fidaxomicin (predictive probability = 0.84), rifapentine (predictive probability = 0.81) and chlortetracycline (predictive probability = 0.80).
Rifapentine is a macrolactam antibiotic approved for the treatment of tuberculosis66. Macrolactams are a small class of compounds which consist of cyclic amides having unsaturation or heteroatoms replacing one or more carbon atoms in the ring29. Other macrolactams such as rifampicin and rifamycin have been found to increase the lifespan of C. elegans67.
Golegaonkar et al. (2015) showed that rifampicin reduced AGE products and extended the mean lifespan of C. elegans by 60%67. Advanced glycation end (AGE) products are formed from the non-enzymatic reaction of sugars, such as glucose, with proteins, lipids or nucleic acids67. AGE products have been implicated in ageing and age-related diseases such as diabetes, atherosclerosis, and neurodegenerative diseases67. The effect of two other macrolactams, rifamycin SV and rifaximin, on the worm’s lifespan was also investigated. Rifamycin SV was found to exhibit similar activity to rifampicin, while rifaximin lacked anti-glycating activity and did not extend the lifespan of C. elegans. The authors suggested that the anti-glycation properties of rifampicin and rifamycin could be attributed to the presence of a para-dihydroxyl moiety, which was not present in rifaximin67. As shown in Figure 6, this functional group is also present in rifapentine. Experimental testing would be required to investigate whether rifapentine possesses similar properties to rifampicin and rifamycin.
Evaluation of the chemical similarity principle
Several of the compounds identified by the random forest model had already been experimentally evaluated for increasing the lifespan of C. elegans and other model organisms. In particular, the Tanimoto similarity between the RDKit fingerprints of rutin and Q3M, an active compound, is 0.99. However, experimental studies found that rutin, although structurally similar to active compounds, does not extend the lifespan of C. elegans39,40. Additionally, the Tanimoto coefficient between the RDKit fingerprint representations of lactose and trehalose, an active compound, is 0.85. Nevertheless, in vivo studies showed that treatment with lactose reduced the lifespan of C. elegans59. In these cases, the chemical similarity principle, which states that chemically similar compounds tend to have similar bioactivities, appears to fail. An explanation presented by Martin et al. (2002) is that protein structures are complex and flexible systems68. Thus, structurally similar chemicals may bind in different orientations to the active site, interact with a different conformation of the protein or even bind to completely different proteins68.