For each activity scoring function developed in-house, we computed evaluation metrics [70], including per-class F1 scores, overall accuracy, and a normalized confusion matrix (contingency table). We then examined the chemical heredity and similarity of the compounds relevant to each endpoint through a taxonomy diagram. These diagrams are not intended to be exhaustive; they represent the major classes of compounds and their chemical lineage.
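As an illustration of how these metrics relate to one another, the following minimal Python sketch derives a row-normalized confusion matrix, overall accuracy, and per-class F1 scores from lists of true and predicted labels. It is not the code used in this study: the function names and the toy labels are hypothetical, and normalizing by true-class totals is an assumption, because the normalization scheme is not specified here.

```python
from collections import Counter

def confusion_matrix(y_true, y_pred, labels):
    """Count matrix: rows are true classes, columns are predicted classes."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]

def normalize_rows(matrix):
    """Divide each row by its total so that every non-empty row sums to 1."""
    normalized = []
    for row in matrix:
        total = sum(row)
        normalized.append([c / total if total else 0.0 for c in row])
    return normalized

def accuracy(matrix):
    """Fraction of all samples that lie on the diagonal."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total

def f1_per_class(matrix):
    """One-vs-rest F1 for each class: 2*TP / (2*TP + FP + FN)."""
    scores = []
    for k in range(len(matrix)):
        tp = matrix[k][k]
        fn = sum(matrix[k]) - tp                  # true class k, predicted otherwise
        fp = sum(row[k] for row in matrix) - tp   # predicted k, true class differs
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return scores

# Toy labels invented for illustration only (not data from this study).
labels = ["high", "moderate", "nontoxic"]
y_true = ["high", "high", "moderate", "moderate",
          "nontoxic", "nontoxic", "nontoxic", "high"]
y_pred = ["high", "moderate", "moderate", "moderate",
          "nontoxic", "nontoxic", "high", "high"]

cm = confusion_matrix(y_true, y_pred, labels)
print(normalize_rows(cm))
print(accuracy(cm))       # 6 of 8 toy samples are correct: 0.75
print(f1_per_class(cm))
```

Reporting F1 per class alongside overall accuracy is useful when classes are imbalanced, because a single accuracy value can hide poor performance on a small class.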
For all studies, the evaluation metrics computed on the testing data set were similar to those obtained from multi-fold cross-validation, which suggests that the metric values are robust and that there is no bias across samples.
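The comparison between cross-validated and held-out metrics can be sketched as follows. This is a toy example under stated assumptions: the one-dimensional nearest-centroid classifier is a stand-in for the actual scoring functions, the number of folds (five) is arbitrary, and all names and data are hypothetical.

```python
from statistics import mean

def k_fold_indices(n, k):
    """Yield (train_idx, val_idx) pairs for k contiguous folds over range(n)."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val = list(range(start, start + size))
        train = [i for i in range(n) if i < start or i >= start + size]
        yield train, val
        start += size

def fit_centroids(xs, ys):
    """Toy stand-in model: the mean feature value of each class."""
    return {c: mean([x for x, y in zip(xs, ys) if y == c]) for c in set(ys)}

def predict(centroids, xs):
    """Assign each sample to the class with the nearest centroid."""
    return [min(centroids, key=lambda c: abs(x - centroids[c])) for x in xs]

def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def cross_val_accuracy(xs, ys, k):
    """Mean validation accuracy over k folds of the training data."""
    scores = []
    for train, val in k_fold_indices(len(xs), k):
        model = fit_centroids([xs[i] for i in train], [ys[i] for i in train])
        preds = predict(model, [xs[i] for i in val])
        scores.append(accuracy([ys[i] for i in val], preds))
    return mean(scores)

# Toy data invented for illustration only (not data from this study).
train_x = [0.1, 0.9, 0.2, 0.8, 0.3, 0.7, 0.62, 0.55, 0.15, 0.85]
train_y = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]
test_x = [0.05, 0.95, 0.4, 0.6, 0.5]
test_y = [0, 1, 0, 1, 1]

cv_acc = cross_val_accuracy(train_x, train_y, k=5)
final_model = fit_centroids(train_x, train_y)
test_acc = accuracy(test_y, predict(final_model, test_x))
print(f"cross-validation accuracy: {cv_acc:.2f}, test accuracy: {test_acc:.2f}")
```

A small gap between the two values is what one would hope to see; a test metric far below the cross-validated one would point to overfitting or to a testing subset that is not representative.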
Finally, although we generated numerous novel structures for each endpoint, we display only a small set of randomly selected compounds for each case as illustrative examples. The SMILES representations for additional molecules are given in the Supplementary Material (Tables S1, S3-S5).
3.1 Toxicity to honeybees
Activity scoring function (honeybee pesticide classifier)
To evaluate the performance of this classification model, we assessed its ability to classify chemicals from the testing subset into the categories noted earlier. The model achieved F1 scores of 0.9, 0.78, and 0.95 for highly toxic, moderately toxic, and nontoxic compounds, respectively. The overall accuracy of the model was 88%, which is comparable to the reported accuracy of 91% for BeeToxAI [71]. Further calculations gave sensitivities of 98%, 99%, and 65% and specificities of 92.5%, 84%, and 99% across the three classes; high sensitivity indicates a low probability of false negatives, and high specificity indicates a low probability of false positives. Details of the classification performance are given as a confusion matrix in Figure S1 of the Supplementary Material.
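For a three-class model, per-class sensitivity and specificity are commonly obtained by treating each class in turn as "positive" and the other two as "negative" (one-vs-rest). The sketch below shows that calculation from a count confusion matrix. It assumes the one-vs-rest convention, which is not stated explicitly here, and the matrix values are invented for illustration; they are not the counts behind Figure S1.

```python
def sensitivity_specificity(matrix):
    """One-vs-rest (sensitivity, specificity) for each class.

    `matrix` is a count confusion matrix with true classes as rows
    and predicted classes as columns.
    """
    total = sum(sum(row) for row in matrix)
    results = []
    for k in range(len(matrix)):
        tp = matrix[k][k]
        fn = sum(matrix[k]) - tp                  # class k missed by the model
        fp = sum(row[k] for row in matrix) - tp   # other classes labeled as k
        tn = total - tp - fn - fp
        sens = tp / (tp + fn) if (tp + fn) else 0.0
        spec = tn / (tn + fp) if (tn + fp) else 0.0
        results.append((sens, spec))
    return results

# Hypothetical counts for three classes (rows: true, columns: predicted).
toy_matrix = [
    [18, 2, 0],
    [3, 12, 5],
    [1, 1, 58],
]

for name, (sens, spec) in zip(["class A", "class B", "class C"],
                              sensitivity_specificity(toy_matrix)):
    print(f"{name}: sensitivity={sens:.3f}, specificity={spec:.3f}")
```

In this toy matrix the middle class has the lowest sensitivity because many of its members are assigned to neighboring classes, a pattern that overall accuracy alone would not reveal.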
Taxonomy
Although molecules with documented toxicity to honeybees are diverse, a significant fraction of the highly toxic molecules belong to the chemical classes pyrethroids, benzene and substituted derivatives, pyridines, and organonitrogen compounds. Specifics of the chemical classes and taxonomy for members of the honeybee toxicant data set are shown in Figure S2.
Novel compounds
For this case study, 30 pyrethroids and benzenoids were used to train the structure generator. Examples of output from the framework in this case are shown in Fig. 2, where labels indicate the taxonomic classes.
3.2 Immunotoxicity
Activity scoring function (immunotoxicity classifier)
Using the testing subset data for immunotoxicants, we quantified the classification performance of the activity scoring function using the metrics noted earlier. Among the values for the performance metrics were a sensitivity of 68.7%, specificity of 83.8%, and an overall accuracy score of 76%, which compare well to those of the immunotoxicity prediction model developed and implemented by ProTox-II [72], for which the authors reported a 69.5% sensitivity, 79.5% specificity, and an overall accuracy of 75%. The computed F1 scores for our model were 0.70 and 0.73 for toxic and nontoxic classes, respectively. Details of the classification performance, as represented by a confusion matrix, are given in Figure S3.
To further assess the performance of the classifier, we tested its ability to correctly identify chemicals with well-documented immunotoxicity. The model showed a good ability (82.6% accuracy) to recognize the immunotoxic potential of members of chemical classes of concern (e.g., pyrethroids and per- and polyfluoroalkyl substances). See Table S2 of the Supplementary Material for more information.
Taxonomy
Chemical compounds with potential immunotoxicity are structurally diverse. A taxonomy analysis of the full data set of immunotoxic molecules revealed that many of the compounds with documented or suspected immunotoxicity belonged to the chemical classes diphenylmethanes, alkyl fluorides, azoles, stilbenes, and carbonyl compounds. The full taxonomy is depicted in Figure S4.
Novel compounds
In this analysis, 20 immunotoxicants spanning several chemical classes were used to train the structure generator. A sample set of generated molecules is displayed in Fig. 3.
3.3 Endocrine disruption
Activity scoring function (ER- antagonists classifier)
We evaluated this model by examining its ability to classify chemicals from the testing subset for endocrine disruptors into the categories noted previously. The classifier produced F1 scores of 0.94, 0.76, and 0.88 for the highly potent, moderately potent, and non-potent classes, respectively. The model had an overall accuracy of 89%, which is comparable to the value of 91% for the two-category estrogen receptor prediction model developed and implemented in ProTox-II. Further, the model showed high sensitivity (77.8%, 90.6%, and 89.4%) and high specificity (98.1%, 94.5%, and 86.1%) for the highly potent, moderately potent, and non-potent classes, respectively. Figure S5 depicts the details underlying these metrics.
Taxonomy
Ligands with ER- antagonistic properties belong to a broad range of chemical classes, including stilbenes, indoles, naphthalenes, and hydroxyflavonoids (see Figure S6).
Novel compounds
In the case study of potential endocrine disruptors, we used 20 randomly selected structures from the training subset as the input to IProCH. Fig. 4 shows samples of the generated structures and their chemical classes.
3.4 Mutagenicity
Activity scoring function (mutagenicity classifier)
As noted earlier, we used a previously published model as the basis for this test case. Because information was already available regarding the model’s accuracy and performance [73], we did not perform our own assessment.
Taxonomy
An analysis of over 800 chemicals revealed that molecules with documented or potential mutagenicity belong to a broad range of chemical classes and span both the organic and inorganic chemical kingdoms. This diversity is illustrated in Figure S7.
Novel compounds
We used a training subset of 25 structures spanning various classes of the taxonomy as input to the structure generator. A sample of the generated novel and synthesizable chemicals with mutagenic potential is shown in Fig. 5.