Visualizing MACCSKeys similarity spectrum of chemical molecules
The 2828 food compounds in the training set are used as templates to generate 2828-dimensional vectors, termed MACCSKeys similarity spectra, that record the MACCSKeys fingerprint similarity between a molecule of interest and every template. The MACCSKeys similarity spectra of three chemical molecules are illustrated in Fig. 1A-C. The first molecule, FDB001388 (25-Methyl-24-methylenecholesterol), curated in FooDB (https://foodb.ca/), is a food constituent of the seeds of Indian mustard (Brassica juncea), sunflower (Helianthus annuus) and French beans (Phaseolus vulgaris). The second molecule, Tithofolinolide, with identifier NPACT00978 in NPACT [43], is a plant natural product of the terpenoid class with therapeutic effects against colon, haematopoietic and lymphoid tissue cancers. The third molecule, Vidarabine, with identifier CHEMBL1090 in ChEMBL [49], is a virus inhibitor that targets human herpesvirus 1 DNA polymerase to treat eye infections. As shown in Fig. 1, the number and amplitudes of spectral peaks decrease in the order of food compounds, natural products and drugs. Nevertheless, the natural product Tithofolinolide shows many medium peaks ranging from 0.4 to 0.8 (see Fig. 1B) and the drug Vidarabine still shows several high peaks above 0.8 (see Fig. 1C). These spectra indicate, to some degree, that knowledge is transferable between food compounds, natural products and drug molecules.
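The construction of a similarity spectrum can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes that each MACCSKeys fingerprint has already been computed and is represented as the set of its on-bit indices, and it assumes the Tanimoto coefficient as the similarity measure, which the text does not state explicitly. The names tanimoto and similarity_spectrum and the toy bit sets are hypothetical.

```python
from typing import FrozenSet, List


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Tanimoto similarity between two fingerprints given as sets of on-bit indices."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def similarity_spectrum(query: FrozenSet[int],
                        templates: List[FrozenSet[int]]) -> List[float]:
    """One similarity value per template; with 2828 templates the vector has 2828 entries."""
    return [tanimoto(query, template) for template in templates]


# Toy data (hypothetical on-bit sets, not real MACCS keys of any compound).
templates = [frozenset({1, 5, 9, 42}), frozenset({5, 9}), frozenset({7, 160})]
query = frozenset({1, 5, 9})

spectrum = similarity_spectrum(query, templates)
print(spectrum)
```

Each entry of the resulting vector corresponds to one peak position in a plot such as Fig. 1; the order of the templates must be fixed once so that spectra of different molecules are comparable.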
Performance of 5-fold cross validation
The results of stratified 5-fold cross validation are illustrated in Fig. 2A-G. For all the binary models, the positive class accounts for a very low percentage of the entire training data, ranging from 0.39% to 5.27% (see Fig. 2A). The numbering of health effects is provided in Supplementary file S1. Nevertheless, most of the RUSBoost learners achieve encouraging performance in terms of accuracy (see Fig. 2B) and AUC scores (see Fig. 2C). The accuracies range from 68.33% to 94.87%, with 60% of learners achieving more than 80% accuracy. The ROC AUC scores range from 0.5991 to 0.9905, with 57% of learners achieving an AUC score above 0.8. In Fig. 3A, the ROC curves for 14 health effects (e.g., antibacterial, anti-inflammatory, anticancer) are illustrated as examples. Under extreme class imbalance, these two metrics can still report optimistic performance even when a model degenerates, for instance when every minority-class example is misclassified into the majority class. For this reason, the recall rates on both classes are more informative about how biased the models are. Fig. 2D shows that the RUSBoost learners achieve satisfactory recall on the minority class (positive class), with 76.24% of models achieving recall above 0.6 and 31.68% of models achieving recall above 0.8. On the majority class (negative class), all the models unsurprisingly achieve high recall, with 100% of models achieving recall above 0.6 and 58.42% of models achieving recall above 0.8. The recall metrics show that the RUSBoost learners retain a certain degree of bias, but the bias is much alleviated.
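The reason accuracy is misleading under class imbalance can be made concrete with a small worked example. The sketch below is illustrative only: the data are synthetic (1% positives, a hypothetical value inside the 0.39% to 5.27% range reported above), and per_class_metrics is a hypothetical helper, not part of the framework.

```python
from typing import Dict, List


def per_class_metrics(y_true: List[int], y_pred: List[int]) -> Dict[str, float]:
    """Accuracy plus recall on the positive (1) and negative (0) classes."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "recall_pos": tp / (tp + fn) if (tp + fn) else 0.0,
        "recall_neg": tn / (tn + fp) if (tn + fp) else 0.0,
    }


# Synthetic labels: 10 positives among 1000 compounds.
y_true = [1] * 10 + [0] * 990

# A degenerate model that assigns every compound to the majority class.
all_negative = [0] * 1000

degenerate = per_class_metrics(y_true, all_negative)
print(degenerate)
```

The degenerate model reaches 99% accuracy while its recall on the positive class is zero, which is why the recall on each class is reported separately.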
In imbalanced-class learning, misclassified majority-class examples heavily depress the precision on the minority class, whereas the reverse effect is slight. As shown in Fig. 2E, the precisions on the minority class are very low, with only 22.77% of models achieving precision above 0.05. The results are not surprising, because even a small fraction of misclassified majority-class examples (i.e., false positives) greatly reduces the ratio of true positives to all predicted positives. By comparison, the precisions on the majority class are very high, with 100% of models achieving precision above 0.9 (see Fig. 2F). The minority-class precisions are distributed similarly to those reported in [52], which uses deep learning to predict drugs targeting specific enzymes or genes, with precisions all lower than 0.2 and many ranging from 0.01 to 0.06. In this framework, 77.23% of models achieve precision above 0.015. Similarly, deep learning for imbalanced drug discovery [52] mostly achieves MCC scores ranging from 0.05 to 0.3. In this framework, 87.13% of models achieve MCC scores above 0.05 (see Fig. 2G), even though the training data are much smaller than the 246,591 to 304,987 training examples fed to the deep learning framework [52].
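The effect of false positives on minority-class precision, and the corresponding MCC, can be checked with simple arithmetic. The following sketch uses a synthetic confusion matrix (10 positives, 990 negatives, recall 0.8 on positives and 0.9 on negatives); the counts and the function names precision and mcc are assumptions made for illustration and are not taken from the reported experiments.

```python
import math


def precision(tp: int, fp: int) -> float:
    """Fraction of predicted positives that are true positives."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Synthetic case: 8 of 10 positives recovered, 99 of 990 negatives misclassified.
tp, fn = 8, 2
fp, tn = 99, 891

minority_precision = precision(tp, fp)   # 8 / 107
majority_precision = precision(tn, fn)   # 891 / 893
score = mcc(tp, tn, fp, fn)

print(round(minority_precision, 4), round(majority_precision, 4), round(score, 4))
```

Although both recalls are high in this toy case, misclassifying 10% of the negatives already pushes the minority-class precision below 0.08, while the majority-class precision stays close to 1.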
Performance of independent test
We could not find other sources of food compounds with known health effects, so natural products and drug molecules are used instead as independent test data. The bioactivity labels of these data are usually not consistent with the health effects adopted in this framework. For instance, anticancer activity in NPACT [43] is categorized according to tissues and potentially corresponds to 8 health effects in FooDB, namely cancer-preventive, antileukemic, anticancer, antitumor, antineoplastic, antiproliferant, antimetastatic and anticarcinogenic. Because of this semantic difference, any compound from NPACT [43] is deemed to be correctly recognized if it is predicted to be positive by any one of the models for these 8 health effects. We adopt a similar strategy for the other independent test data. The recalls on plant natural products from NPACT [43] range from 0.8406 to 0.9040 (see Fig. 3B) and the recalls on drug molecules from ChEMBL [49] range from 0.7895 to 0.9690. These results show that the proposed framework generalizes well to unseen chemical molecules and that the knowledge is transferable between food compounds, natural products and drug molecules.
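The "any one model" matching rule can be expressed compactly. The sketch below is a hedged illustration of that rule only: the 8 effect names come from the text, while the function any_model_recall, the prediction dictionaries and the toy compounds are hypothetical.

```python
from typing import Dict, Iterable, List

# The 8 FooDB health effects that the text maps to "anticancer" in NPACT.
ANTICANCER_EFFECTS = [
    "cancer-preventive", "antileukemic", "anticancer", "antitumor",
    "antineoplastic", "antiproliferant", "antimetastatic", "anticarcinogenic",
]


def any_model_recall(predictions: List[Dict[str, bool]],
                     mapped_effects: Iterable[str]) -> float:
    """Recall over known-positive compounds, counting a compound as recognized
    if at least one of the mapped binary models predicts it positive."""
    effects = list(mapped_effects)
    if not predictions:
        return 0.0
    hits = sum(1 for pred in predictions
               if any(pred.get(effect, False) for effect in effects))
    return hits / len(predictions)


# Toy predictions for four known anticancer compounds (hypothetical values).
toy_predictions = [
    {"anticancer": True, "antitumor": False},
    {"antileukemic": True},
    {"antibacterial": True},              # positive only for an unmapped effect
    {"antineoplastic": True, "anticancer": True},
]

recall = any_model_recall(toy_predictions, ANTICANCER_EFFECTS)
print(recall)
```

Because a hit on any of the mapped models counts, this recall is an upper bound on the recall of each individual model and should be read with that leniency in mind.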
Repurposing natural products and drugs with supporting evidences from recent literature
As demonstrated in the independent test, the models trained on food compounds are useful for predicting novel activities of natural products and drugs. On this basis, we use the proposed framework to find new purposes for the natural products in NPACT [43] and the drugs in ChEMBL [49]. As shown in Fig. 4, most natural product and drug molecules exhibit multiple bioactivities. For clarity, only the health effects predicted for more than 30% of molecules are illustrated. The entire predictions are provided in Supplementary file S2 (NPACT natural products) and Supplementary file S3 (ChEMBL drugs). The anticancer natural products exhibit antiedemic (68.61%), anti-inflammatory (57.72%) and antiviral (46.53% anti-HIV) health effects (see Fig. 4A). Meanwhile, 32.87% of the molecules also exhibit antibacterial effects (see Fig. 4B). Among the antiviral drug molecules, most exhibit anticancer effects, e.g., anti-carcinomic (94.74%) and anticancer (68.42%) (see Fig. 4C). In addition, 42.11% of the drugs also exhibit cytotoxic adverse effects (see Fig. 4D). These results reveal, to some degree, the associations between carcinogenesis, viruses and inflammation, as well as opportunities for drug repurposing. The predicted effects suggest how these molecules could be turned to other therapeutic purposes. The independent test shows what percentage of known labels is correctly recognized; for novel health effects, we resort to recent literature for supporting evidence. In this section, we conduct only dozens of case studies via manual literature validation. Automatic literature search via recurrent neural networks (RNN) and natural language processing (NLP) promises to enlarge the coverage of validation and could be treated as an independent research topic.
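The percentages reported for Fig. 4 are per-effect prevalences over a set of molecules, filtered at 30%. A minimal sketch of that bookkeeping is given below; the function effect_prevalence and the toy prediction sets are hypothetical, and only the 30% cut-off is taken from the text.

```python
from collections import Counter
from typing import Dict, List, Set


def effect_prevalence(predicted_effects: List[Set[str]],
                      min_fraction: float = 0.30) -> Dict[str, float]:
    """Fraction of molecules predicted positive for each effect,
    keeping only effects predicted for more than min_fraction of molecules."""
    n = len(predicted_effects)
    if n == 0:
        return {}
    counts = Counter(effect for effects in predicted_effects for effect in effects)
    return {effect: count / n
            for effect, count in counts.items()
            if count / n > min_fraction}


# Toy predictions for four molecules (hypothetical, not from Supplementary file S2).
toy = [
    {"antiedemic", "anti-inflammatory"},
    {"antiedemic"},
    {"antiedemic", "antibacterial"},
    {"anti-inflammatory"},
]

prevalence = effect_prevalence(toy)
print(prevalence)
```

In this toy case the antibacterial effect is predicted for only one molecule in four and therefore falls below the display threshold.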
Case studies on anticancer natural product Nobiletin
Nobiletin, a plant metabolite or natural product found in Citrus tankan, is a methoxyflavone used as an antineoplastic agent. When applying Nobiletin to the 101 binary models, we obtain 22 other health effects besides anticariogenic (see Supplementary file S2 for details), e.g., anti-inflammatory, antioxidant, anti-alzheimeran, anti-hepatotoxic, anti-HIV, anti-histaminic and choleretic. To validate these novel effects, we manually retrieve publications using the keyword Nobiletin together with each specific effect. The retrieved evidence and the corresponding PubMed identifiers are provided in Table 1. As shown in Table 1, two articles (PMID: 25191498, PMID: 31295812) support that Nobiletin has anti-dementia activity that improves the symptoms of neurodegenerative diseases such as Alzheimer’s disease and Parkinson’s disease. Furthermore, one article (PMID: 26874072) demonstrates that Nobiletin has antioxidant, anti-inflammatory and anti-alzheimeran health effects, all of which are predicted by the proposed framework. We also find supporting evidence that Nobiletin has many other effects, such as antiulcer, anticataract, hepatoprotective, vasodilator, VEGF-inhibitor and antihypertensive effects. This evidence supports that the proposed framework trained on food compounds generalizes well to natural products.
Table 1
Novel health-effect implications of anticancer natural product Nobiletin (NPACT00817) from NPACT database.
Predicted health effects | Supporting evidences | References |
Antialzheimeran | Nobiletin is shown to improve cognitive deficits and pathological features of neurodegenerative diseases such as Alzheimer’s disease and Parkinson’s disease. | 1. Anti-dementia Activity of Nobiletin, a Citrus Flavonoid: A Review of Animal Studies (2014) [PMID: 25191498] 2. Potential Benefits of Nobiletin, A Citrus Flavonoid, against Alzheimer's Disease and Parkinson's Disease (2022) [PMID: 31295812] |
Antioxidant, Antiinflammatory | Nobiletin is reported to have anti-oxidant and anti-inflammatory effects in treating asthma, colitis and Alzheimer's disease. | 1. Nobiletin promotes antioxidant and anti-inflammatory responses and elicits protection against ischemic stroke in vivo (2016) [PMID: 26874072] |
Vasodilator | Nobiletin is shown to have cardiovascular protection effects (e.g., prevention of cardiac hypertrophy and platelet aggregation) and relaxation of endothelium-denuded rat aortic smooth muscles. | 1. Nobiletin, a citrus flavonoid, activates Vasodilator-stimulated phosphoprotein in human platelets through non-cyclic nucleotide-related mechanisms (2017) [PMID: 27959381] 2. Endothelium-independent vasodilator effects of nobiletin in rat aorta (2019) [PMID: 31088764] |
Antiulcer | Nobiletin is shown to have anti-ulcerogenic activity that significantly attenuates the ethanol-induced gastric ulcer in mice. | 1. The gastroprotective effect of nobiletin against ethanol-induced acute gastric lesions in mice: impact on oxidative stress and inflammation (2017) [PMID: 28948855] |
Anticataract | Nobiletin is verified to prevent formation of cataract via inhibiting the proliferation of human lens epithelial cells. | 1. Polymethoxyflavones as agents that prevent formation of cataract: nobiletin congeners show potent growth inhibitory effects in human lens epithelial cells (2013) [PMID: 23199882] |
Hepatoprotective | It is shown that hepatic proinflammatory TNF-α mRNA expression and liver damage indicators are significantly lower in Nobiletin-supplemented mice; and amorphous solid dispersion of Nobiletin improves Nobiletin’s hepatoprotective properties. | 1. Long-term dietary supplementation with low-dose nobiletin ameliorates hepatic steatosis, insulin resistance, and inflammation without altering fat mass in diet-induced obesity (2017) [PMID: 28116779] 2. Physicochemical and biopharmaceutical characterization of amorphous solid dispersion of nobiletin, a citrus polymethoxylated flavone, with improved hepatoprotective effects (2013) [PMID: 23707470] |
VEGF-Inhibitor | Nobiletin inhibits the secretion of key angiogenesis mediators and vascular endothelial growth factor (VEGF). | 1. The flavonoid nobiletin inhibits tumor growth and angiogenesis of ovarian cancers via the Akt pathway (2015) [PMID: 25845666] |
Antihypertensive | Nobiletin reduces high levels of blood pressure, circulating angiotensin II and angiotensin‑converting enzyme activity. | 1. Nobiletin resolves left ventricular and renal changes in 2K-1C hypertensive rats (2022) [PMID: 35662276] 2. Nobiletin alleviates vascular alterations through modulation of Nrf-2/HO-1 and MMP pathways in l-NAME induced hypertensive rats (2019) [PMID: 30864566] |
Case studies on antibacterial implications of anticancer natural products
We further conduct case studies to show that the anticancer natural products could be repurposed for antibacterial therapies, which offers an alternative way to combat bacterial drug resistance. In Table 2, supporting evidence with PubMed identifiers is provided for 8 natural product compounds. As shown in Table 2, sapintoxin A shows strong activity against Mycobacterium tuberculosis H37Ra (PMID: 24877849). Xanthochymol and Stigmasterol exhibit both antibacterial and antifungal effects. Rapanone is effective against Staphylococcus aureus strains that have developed methicillin resistance. These results show that natural chemicals from plants and microbes are natural reservoirs for antimicrobial drug development and that the proposed framework could be used as a tool for the discovery of antibacterial lead compounds.
Table 2
Antibacterial implications of anticancer natural products in NPACT database.
Natural products | Supporting evidences | References |
Menthol | Menthol oils strongly inhibit plant pathogenic microorganisms, whereas human pathogens are only moderately inhibited. | 1. Antimicrobial screening of Mentha piperita essential oils (2003) [PMID: 12083863] |
Glycyrrhizic acid ammonium salt | Studies demonstrate that glycyrrhizic acid ammonium salt delivered as nanoparticles suppresses phytoplasma infection. | 1. Nanoinhibitory Impacts of Salicylic Acid, Glycyrrhizic Acid Ammonium Salt, and Boric Acid Nanoparticles against Phytoplasma Associated with Faba Bean (2022) [PMID: 35268567] |
Sapintoxin A | Sapintoxins A and C exhibit strong activities against Mycobacterium tuberculosis H37Ra. | 1. A review of the medicinal uses, phytochemistry and pharmacology of the genus Sapium (2014) [PMID: 24877849] |
β-Sitosterol | β-sitosterol identified in Parthenium hysterophorus is verified via in-vitro bioassay and analytical chemistry method to have antibacterial activities against aquatic bacterial pathogens. | 1. β-Sitosterol: An Antibacterial Agent in Aquaculture Management of Vibrio Infections (2020) [https://doi.org/10.22207/JPAM.14.4.48] |
Xanthochymol | Xanthochymol isoforms isolated from the genus Garcinia have antibacterial and anti-fungal activities. | 1. Garcinia L.: a gold mine of future therapeutics (2021) [https://link.springer.com/article/10.1007/s10722-020-01057-5]. |
Stigmasterol | Stigmasterol is a potent and broad-spectrum antibacterial and antifungal agent as a lead compound of antimicrobial drugs. | 1. Antimicrobial activity of stigmasterol from the stem bark of Neocarya macrophylla (2018) [https://doi.org/10.4102/jomped.v2i1.38] |
Rapanone | Rapanone is shown to be active against methicillin resistant Staphylococcus aureus (MRSA) strains of bacteria. | 1. Antibacterial activities and structure-activity relationships of a panel of 48 compounds from Kenyan plants against multidrug resistant phenotypes (2016). [PMID: 27386347] |
Geniposide | Geniposide exhibits antibacterial and anti-inflammatory therapeutic efficacy. | 1. Geniposide reduces Staphylococcus aureus internalization into bovine mammary epithelial cells by inhibiting NF-κB activation (2018) [PMID: 30321590] |
Case studies on anticancer and antibacterial repurposing implications of antiviral drugs
Compared with natural products, drugs resemble food compounds less in terms of origin and scaffold diversity, but the independent test shows that the proposed framework trained on food compounds also generalizes well to drugs. In fact, many synthetic drug molecules are modified from natural products, which reduces the structural gap between drugs and food compounds. The antiviral drugs from ChEMBL [49] are chosen as case studies to validate the reliability of novel predictions by the proposed framework. The entire predictions are provided in Supplementary file S3. Twelve antiviral drugs are found to have anticancer and/or antibacterial activities via manual literature retrieval (see Table 3). As shown in Table 3, most of these antiviral drugs exhibit anticancer effects. For instance, oseltamivir shows potential for the treatment of liver cancer (PMID: 34859259), adefovir dipivoxil inhibits tumorigenesis of colon cancer cells (PMID: 30872078) and maraviroc shows therapeutic potential for colorectal cancer liver metastasis (PMID: 32902795). Didanosine phosphoramidates exhibit both anticancer and antibacterial effects. The antiviral drug efavirenz shows activity against both methicillin-sensitive and methicillin-resistant S. aureus (PMID: 34668851). This evidence suggests that the proposed framework could help identify antiviral drugs that can potentially be repurposed for anticancer and antibacterial therapies.
Table 3
Anticancer and antibacterial repurposing implications of antiviral drugs in ChEMBL database.
Predicted effects of drugs | Supporting evidences | References |
Efavirenz (antibacterial) | Efavirenz significantly inhibits biofilm formation of both methicillin-sensitive S. aureus and methicillin-resistant S. aureus; Efavirenz inhibits the growth of B. subtilis. | 1. The antiviral drug efavirenz reduces biofilm formation and hemolysis by Staphylococcus aureus (2021) [PMID: 34668851] 2. Antibacterial effects of antiretrovirals, potential implications for microbiome studies in HIV (2018) [PMID: 28497768] |
Oseltamivir (anticancer) | It has been reported that Oseltamivir induces death of liver cancer cells. | 1. Potential of antiviral drug oseltamivir for the treatment of liver cancer (2021) [PMID: 34859259] |
Didanosine (antibacterial, anticancer) | Didanosine phosphoramidates exhibit marked activities against Gram negative and positive bacteria; Didanosine represents a widely prescribed class of antiviral and anticancer drugs. | 1. Didanosine phosphoramidates: synthesis, docking to viral NA, antibacterial and antiviral activity (2014) [10.1007/s00044-014-1073-2] 2. Bioretrosynthetic construction of a didanosine biosynthetic pathway (2014) [PMID: 24657930] |
Adefovir dipivoxil (anticancer) | Adefovir dipivoxil inhibits tumorigenesis of colon cancer cells. | 1. Adefovir dipivoxil sensitizes colon cancer cells to vemurafenib by disrupting the KCTD12-CDK1 interaction (2019) [PMID: 30872078] |
Ritonavir (anticancer) | Ritonavir has been shown to have anticancer or synergistic anticancer activities in several cancers. | 1. Overcoming cancer therapeutic bottleneck by drug repurposing (2020) [PMID: 32616710] 2. HIV-1 protease inhibitor, ritonavir: a potent inhibitor of CYP3A4, enhanced the anticancer effects of docetaxel in androgen-independent prostate cancer cells in vitro and in vivo (2004) [PMID: 15492266] |
Darunavir (anticancer) | Darunavir, as a synthetic nonpeptidic protease inhibitor, has been verified to have anticancer activities. | 1. Molecular designing and in silico evaluation of darunavir derivatives as anticancer agents (2014) [PMID: 24966524] |
Ribavirin (anticancer) | Ribavirin is an effective anticancer agent, but its efficacy seems to vary with cancers or tissues. | 1. Ribavirin as an anti-cancer therapy: acute myeloid leukemia and beyond? (2010) [PMID: 20629523] |
Saquinavir (anticancer) | Saquinavir has exhibited effects in several types of cancer, e.g., Kaposi carcinoma, neuroblastoma. | 1. Saquinavir: From HIV to COVID-19 and Cancer Treatment (2022) [PMID: 35883499] |
Nelfinavir (anticancer) | Nelfinavir takes anti-cancer effects via modulating different cellular conditions, e.g., cell cycle, tumor microenvironment. | 1. The Anti-Cancer Properties of the HIV Protease Inhibitor Nelfinavir (2020) [PMID: 33228205] |
Maraviroc (anticancer) | Maraviroc significantly inhibits CRC liver metastasis in animal model. | 1. Antineoplastic effects of targeting CCR5 and its therapeutic potential for colorectal cancer liver metastasis (2021) [PMID: 32902795] |
Abacavir (anticancer) | Abacavir induces senescence and cell death in prostate cancer cells. | 1. The reverse transcription inhibitor abacavir shows anticancer activity in prostate cancer cell lines (2010) [PMID: 21151977] |
Cidofovir (anticancer) | Cidofovir takes antineoplastic activities against HCMV-infected glioblastoma cells, or even induces death of glioblastoma cells free of HCMV infection. | 1. Cidofovir: a novel antitumor agent for glioblastoma (2013) [PMID: 24170543] |
Unravelling beneficial and risky health effects of food flavors
Food flavors are widely used to enhance food acceptance and popularity, and the safety of flavors is therefore a primary concern. From FooDB, we obtain 1476 flavor compounds. We use the proposed framework to unravel the beneficial and risky effects of these compounds. The entire predictions are provided in Supplementary file S4. From the predicted effects, we choose antibacterial and antiviral effects as beneficial and cytotoxic and mutagenic effects as risky. Note that the proposed framework may contradictorily classify a compound as both mutagenic and antimutagenic, e.g., phenethyl phenylacetate (FDB013613). In this case, the compound is deemed to be mutagenesis associated, without determining whether the association is beneficial or adverse. The contradictory predictions result from the independence of the binary models, none of which is deliberately designed to discriminate a mutagenic effect from an antimutagenic effect. The same interpretation applies to other pairs of opposite effects.
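The handling of contradictory predictions can be sketched as a simple post-processing step. This is an illustrative sketch, not the authors' code: the mutagenic/antimutagenic pair and the label "mutagenesis associated" come from the text, whereas the function resolve_opposites, the pair table and the toy prediction sets are hypothetical.

```python
from typing import Dict, Set, Tuple

# Pairs of opposite effects and the neutral label used when both are predicted.
# Only the mutagenic pair is named in the text; further pairs would be added here.
OPPOSITE_PAIRS: Dict[Tuple[str, str], str] = {
    ("mutagenic", "antimutagenic"): "mutagenesis associated",
}


def resolve_opposites(predicted: Set[str],
                      pairs: Dict[Tuple[str, str], str] = OPPOSITE_PAIRS) -> Set[str]:
    """Replace each pair of simultaneously predicted opposite effects by a neutral label."""
    resolved = set(predicted)
    for (effect, opposite), neutral_label in pairs.items():
        if effect in resolved and opposite in resolved:
            resolved -= {effect, opposite}
            resolved.add(neutral_label)
    return resolved


# Toy predictions (hypothetical effect sets, not from Supplementary file S4).
contradictory = resolve_opposites({"mutagenic", "antimutagenic", "antibacterial"})
consistent = resolve_opposites({"mutagenic", "cytotoxic"})
print(contradictory, consistent)
```

A compound predicted positive by only one model of a pair keeps its original label; only simultaneous predictions are collapsed into the neutral category.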
Case study on antibacterial and antiviral curcumenol
Computational results show that many food flavor molecules exhibit both antibacterial and antiviral activities. Ten compounds are illustrated as examples in Fig. 5A, out of which curcumenol is chosen for a case study. To estimate the confidence level of the two predicted health effects, we calculate MACCSKeys fingerprint similarities between curcumenol and the compounds in FooDB that have been annotated with antibacterial and antiviral health effects. The eight compounds with the highest similarities (> 0.7) are illustrated in Fig. 5B, in which (2)-(5) are antiviral compounds and (6)-(9) are antibacterial compounds. Common functional subgroups could explain, to a certain degree, why these molecules together with curcumenol exhibit antibacterial and antiviral activities. More convincing evidence still requires laboratory experiments. Assays via the serial dilution method demonstrate that curcumenol exhibits significant antibacterial activities against Pseudomonas aeruginosa and Streptococcus pyogenes [53]. Meanwhile, curcumenol, together with other ingredients of C. kwangsiensis such as curcumin, curcumol and curdione, exhibits antiviral activities [54].
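The neighbour lookup used to gauge confidence in a prediction can be sketched as a thresholded ranking. The sketch assumes fingerprints stored as sets of on-bit indices and the Tanimoto coefficient as the similarity measure (an assumption, since the measure is not named in the text); the function top_neighbours, the compound names and the bit sets are hypothetical, while the 0.7 threshold and the limit of eight neighbours follow the case study.

```python
from typing import FrozenSet, List, Tuple

Annotated = Tuple[str, str, FrozenSet[int]]  # (compound name, annotated effect, on-bits)


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Tanimoto similarity between two sets of on-bit indices."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def top_neighbours(query: FrozenSet[int], annotated: List[Annotated],
                   threshold: float = 0.7, k: int = 8) -> List[Tuple[float, str, str]]:
    """Annotated compounds with similarity above the threshold, most similar first."""
    scored = [(tanimoto(query, bits), name, effect) for name, effect, bits in annotated]
    kept = [item for item in scored if item[0] > threshold]
    return sorted(kept, key=lambda item: item[0], reverse=True)[:k]


# Toy library of annotated compounds (hypothetical names and bit sets).
library = [
    ("compound_B", "antibacterial", frozenset({1, 2, 3})),
    ("compound_A", "antiviral", frozenset({1, 2, 3, 4, 5})),
    ("compound_C", "antiviral", frozenset({1, 9})),
]
query = frozenset({1, 2, 3, 4})

neighbours = top_neighbours(query, library)
print(neighbours)
```

The annotated effects of the retained neighbours give a rough, similarity-based plausibility check for a predicted effect; they do not replace experimental validation.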
Case study on cytotoxic and mutagenic 3-Methyl-1-phenyl-3-pentanol
Food flavorings can promote health but can also potentially harm humans and animals. In Fig. 6A, ten compounds potentially inducing cytotoxicity and genotoxicity are illustrated. Taking 3-Methyl-1-phenyl-3-pentanol as an example, six structurally similar compounds are illustrated in Fig. 6B, of which (2)-(4) are annotated with cytotoxic effects and (5)-(7) are annotated with mutagenic effects. The MACCSKeys fingerprint similarities are all above 0.5, and these molecules resemble each other in terms of scaffolds and functional subgroups. As surveyed in [55], acute toxicities of 3-Methyl-1-phenyl-3-pentanol are observed in animals, but human subjects show no irritation reaction. So far, no publications report that 3-Methyl-1-phenyl-3-pentanol induces mutagenicity in human cells.