How to Predict Future Upcoming Harmful Agents: Array Type Meta-Predictor Based Classification

doi:10.21203/rs.3.rs-1592493/v1

Molecular insight into chemical safety is important for sustainable development and risk assessment. This study considers how to manage future harmful agents, especially potentially cholinergic chemical warfare agents (CWAs). For this purpose, the structures of known cholinergic agents were encoded by molecular descriptors. Each drug target interaction (DTI) was then learned from the encoded structures and their cholinergic activities to build DTI classification models for five cholinergic targets with reliable statistical validation (ensemble-AUC: up to 0.790, MCC: up to 0.991, accuracy: up to 0.995). The collected classifiers were transformed into 2D or 3D array type meta-predictors for multi-tasking: (1) cholinergic prediction and (2) CWA detection. The detection ability of the array classifiers was verified on an imbalanced dataset of CWAs and non-CWAs (area under the precision-recall curve: up to 0.997, MCC: up to 0.638, F1-score of non-CWAs: up to 0.991, F1-score of CWAs: up to 0.585).

chemical safety

risk management

risk assessment

chemical warfare agents

new psychoactive substances

molecular descriptor

machine learning

Chemical warfare agents (CWAs) and hazardous chemicals threaten chemical safety.[1-2] Prior to the Chemical Weapons Convention, CWAs were intentionally invented and synthesized for military operations. Nowadays there are concerns about unintentional CWA inventions, along with unexpected accidents, through (1) synthetic chemistry related to known CWAs (e.g. organophosphorus derivatives) [2-3] or (2) chemistries for therapeutic drugs (e.g. BZ, a code assigned by NATO) and illegal drugs.[4] A series of attacks, such as Sarin in Japan in 1994, VX in Malaysia in 2017, and Novichok (non-declared agent) in Syria in 2018, has turned concerns about chemical weapons into realistic fears.[5] Moreover, some harmful chemicals (as shown in Figure 1) are not registered in the CWA list of the Organisation for the Prohibition of Chemical Weapons (OPCW) but have resulted in devastating casualties, and the tragedies are ongoing: (1) ethoxyethyl guanidinium (PGH)/polyhexamethylene guanidine (PHMG), ingredients of Reckitt Benckiser sterilizers, which resulted in disinfectant deaths of babies and pregnant women in South Korea,[6-7] and (2) TCDD, a trace impurity of Agent Orange (herbicide and defoliant chemical) used during the Vietnam War, which has promoted epigenetic transgenerational inheritance of diseases.[8-9]

For chemical safety, humans have built regulations and systems to control the risk posed by harmful chemicals.[10-12] Alongside such systems, technologies for detecting hazardous agents and for detoxifying them have been continuously developed.[13-16] Despite this history, harmful agents emerge faster than regulations or detection technologies can be established. For example, more than 450 new psychoactive substances (NPSs) or designer drugs, which were designed to mimic the pharmacological effects of known illegal drugs and could evade drug regulations and/or detection in standard drug tests, were monitored from 2014 to 2017.[17-19] During this period, no safety system could control the NPSs in a suitable and timely manner, whether in their identification and detection, the evaluation of their toxicity, or the establishment of a regulation.[20] Naturally, chemical hazards or toxic substances that are undefined in a system cannot be prevented, recognized, or controlled.[21] Thus, harmful and hazardous ‘not existing yet but upcoming chemicals (NE chemicals)’ should be defined in advance for risk assessment. However, predicting what does not yet exist is vague and indefinite. Fortunately, when a machine learns the structures and properties of known harmful chemicals and analyzes their relationship, the learned relationship can theoretically suggest a pattern of NE chemicals.[22] In other words, a part of the hazard and toxic space can be defined by using molecular features (variables) of known chemicals (Figure 2). Just as ‘chemical space’ encompasses all possible small molecules [23], we use the term ‘hazard and toxic space’ for the space that encompasses all possible hazardous and toxic chemicals. More desirably, if the definition is ideally achieved, it can be used for preventive regulation. With this consideration, we have tried to define a part of the hazard and toxic space using cholinergic meta-predictors.
In this study, the space of pan-cholinergic agents is defined a priori by their molecular structures, and then the cholinergic pattern of nerve agents as CWAs in that space is learned by a convolutional neural network (CNN). The former is the generation of cholinergic meta-predictors and the latter is the CWA detection based on the meta-predictors.

Design of Meta-Predictor for Cholinergic Pattern. For a predictive model, predictor variables and dependent variables are generally chosen (or selected after manipulation) from the variables of raw data. However, CWAs and known cholinergic agents had no molecular property information in common, and toxicity indices were rarely available.[1, 2, 16, 24] The available data on cholinergic agents were their structures and cholinergic activities (Figure 3 and Table S1). Meanwhile, the only information common to CWAs and harmful agents was molecular structure. As expected, linking the CWA and cholinergic data did not produce any common variable. Thus, a practical problem was how to create a unified descriptor (predictor variable) of the chemicals from the limited data. To define a unified descriptor, an important property of hazardous and toxic agents is their toxicity profile together with the molecular mechanism that points to rescue from toxicity. Notably, the in-depth mechanism of toxicity is unclear for most agents and differs greatly from one agent to another. Among CWAs, nerve agents show higher structural congenericity and relatively more distinct mechanisms, based on acetylcholinesterase (AChE), than other CWAs such as blister agents, asphyxiants, choking (pulmonary damaging) agents, incapacitating agents, lachrymating agents, and vomiting agents.[1-2, 25-26] It is well known that nerve agents and organophosphorus compounds inhibit AChE at cholinergic synapses, thereby inhibiting the degradation of acetylcholine (Figure 3A). Accumulation of the released acetylcholine causes end-organ overstimulation, which is recognized as cholinergic crisis.[1]

Thus, the limited knowledge motivated us to investigate the hazard and toxic space in terms of cholinergic effects on the nervous system (Figure 3). Notably, the aim of this study was not only cholinergic DTI prediction for individual chemicals but also the detection of CWAs among NE chemicals using the cholinergic patterns of known chemicals. For this purpose, we designed a meta-predictor to describe the patterns using the structure-activity relationship (SAR) of cholinergic agents (Figure 4). First, the biochemical activities of cholinergic agents were embedded together with the molecular descriptors so that a machine could learn the SAR. Second, the experimental activity data of ChEMBL (a public database) trained the machines to judge the relationship between the five cholinergic targets and chemicals, which is called drug target interaction (DTI). The trained DTI models of Figure 4 (200 classifiers from four types of machine, ten different data divisions, and five targets) were internally and externally validated to elucidate the binary cholinergic pattern (active/inactive) of a chemical. Third, the cholinergic patterns of known CWAs and NPSs as harmful agents were predicted by the 200 binary classifiers, and the predicted values were transformed into array type data as shown in Figure 4. Finally, the predicted array data were used as meta-predictors to build the CWA detection model. Even though the real cholinergic patterns of these harmful chemicals are unknown, a chemo-centric approach allowed us to infer them. The chemo-centric approach assumes that two similar molecules are likely to possess similar properties, so they can share biological targets or show similar pharmacological profiles.[27-32] Notably, this study used only two types of real data: chemical structures of all chemicals (ChEMBL, CWAs, and NPSs) and cholinergic activities of ChEMBL chemicals (Figure 3B).

Robust DTI Classification Models for Meta Prediction. To realize the designed meta-predictor, two types of 2D molecular fingerprints (FCFP, ECFP) captured the molecular structures of all cholinergic agents.[33] These extended-connectivity and functional-class fingerprints are well-known molecular representations that precisely describe the molecular structure and functional groups (groups of atoms having their own characteristic properties) in a molecule, and they perform competently in drug design and large-scale prediction.[33] Thus, ECFP and FCFP were used to describe the cholinergic SAR under the machine learning (ML) algorithms of random forest (RF), support vector machine (SVM), decision tree (DT), and k-nearest neighbor (KNN).[34-36] The DTI was trained for each of the cholinergic targets: acetylcholinesterase (AChE), butyrylcholinesterase (BuChE), nicotinic acetylcholine receptor (nAChR), muscarinic acetylcholine receptor (mAChR), and vesicular acetylcholine transporter (VAChT).[37] First, the statistical performance of the nAChR classifiers was evaluated (Table 1 and Table S2). As expected, the receiver operating characteristic (ROC) plots of the nAChR classifiers demonstrated robust predictability irrespective of how the data were divided into training and test sets (Table S2 and Figure S2). When the area under the ROC curve (AUC) of the test data was compared, the RF, SVM, and KNN models (AUC: 0.961 – 0.998) produced higher AUC values than DT (AUC: 0.739 – 0.889). Furthermore, we applied other statistical metrics including accuracy, F1 score, and Matthews correlation coefficient (MCC), which is a more informative and truthful score for evaluating binary classifications than accuracy and F1 score. Notably, the MCC values of every model were reliable (Test: MCC ~ 0.438 – 0.978, Train: 0.474 – 0.956), and the MCC values of the test sets were on par with those of the training sets.
Second, the learning of the mAChR dataset followed a pattern similar to that of the nAChR models, with AUC of 0.807-0.998 and MCC of 0.608-0.974 (Table 1 and Table S3). The mAChR models produced slightly higher predictive performance than the nAChR models. Overall, the DT models performed worse than the RF, SVM, and KNN models. Third, the BuChE models also showed reliable prediction performance, with AUC of 0.771-1.000 and MCC of 0.420 – 0.986, although slightly lower than that of the nAChR and mAChR classification models (Table 1 and Table S5). Fourth, we further analyzed the classification metrics of the AChE models. Despite the large data size (n = 3,098), the classification performance was on par with that of the other targets, with AUC of 0.774-0.999 (Table 1 and Table S4). Finally, the VAChT models, built from the smallest dataset, outperformed those of nAChR, mAChR, AChE, and BuChE (Table 1 and Table S6). To visualize the predictive power of the cholinergic DTI models, the best performing models were described by ensemble-AUC values (Figure 5 and Table S7).

Table 1: Classification performance of the best models, selected by ensemble-AUC, for the training and test sets.

Target	ML	AUC	MCC	ACC	F1-Score
*nAChR*	RF	0.994 (0.987)	0.918 (0.975)	0.959 (0.987)	0.959 (0.987)
	DT	0.845 (0.871)	0.678 (0.764)	0.836 (0.871)	0.824 (0.854)
	SVM	0.994 (0.989)	0.936 (0.978)	0.968 (0.989)	0.968 (0.989)
	KNN	0.741 (0.737)	0.551 (0.558)	0.741 (0.737)	0.791 (0.792)
*mAChR*	RF	0.997 (0.977)	0.952 (0.954)	0.976 (0.977)	0.976 (0.977)
	DT	0.841 (0.820)	0.673 (0.642)	0.837 (0.820)	0.834 (0.813)
	SVM	0.996 (0.981)	0.959 (0.962)	0.979 (0.981)	0.979 (0.981)
	KNN	0.992 (0.958)	0.911 (0.917)	0.956 (0.958)	0.955 (0.958)
*AChE*	RF	0.997 (0.981)	0.942 (0.962)	0.971 (0.981)	0.971 (0.981)
	DT	0.832 (0.789)	0.627 (0.597)	0.808 (0.789)	0.824 (0.813)
	SVM	0.996 (0.986)	0.943 (0.972)	0.971 (0.986)	0.972 (0.986)
	KNN	0.982 (0.818)	0.704 (0.683)	0.832 (0.818)	0.856 (0.846)
*BuChE*	RF	0.999 (0.973)	0.949 (0.948)	0.974 (0.973)	0.974 (0.973)
	DT	0.796 (0.773)	0.523 (0.566)	0.761 (0.773)	0.760 (0.799)
	SVM	0.995 (0.973)	0.961 (0.947)	0.980 (0.973)	0.980 (0.973)
	KNN	0.909 (0.667)	0.408 (0.447)	0.643 (0.667)	0.737 (0.750)
*VAChT*	RF	1.000 (0.911)	0.702 (0.915)	0.830 (0.956)	0.887 (0.957)
	DT	0.975 (0.944)	0.953 (0.934)	0.976 (0.967)	0.976 (0.966)
	SVM	0.998 (1.000)	0.991 (1.000)	0.995 (1.000)	0.991 (1.000)
	KNN	0.998 (0.956)	0.953 (0.934)	0.976 (0.967)	0.977 (0.967)

Abbreviations: ACC: Accuracy; MCC: Matthews Correlation Coefficient; RF: Random Forest; DT: Decision Tree; SVM: Support Vector Machine; KNN: K-Nearest Neighbor; nAChR: Nicotinic Acetylcholine Receptor; mAChR: Muscarinic Acetylcholine Receptor; AChE: Acetylcholinesterase; BuChE: Butyrylcholinesterase; VAChT: Vesicular Acetylcholine Transporter. Values in parentheses belong to the test set. The best model was selected based on the ensemble-AUC (Table S7).

Multi-Tasking of Array Classifiers and Performance. The first task of the built array model is predicting the cholinergic activities of ‘out-of-set (neither training nor test set)’ molecules on nAChR, mAChR, VAChT, AChE, and BuChE (Figure 4). For this purpose, every cholinergic DTI classifier was validated in the prior section. The CWAs and the non-CWAs, which consist of NPSs and designer drugs [19], are outside the ChEMBL data [38] and therefore belong to neither the training nor the test data. The cholinergic patterns of the CWAs and non-CWAs were predicted to serve as meta-predictors for the second task. The second task of the array model is judging the chemical warfare likeness of ‘out-of-set’ molecules. For this purpose, the discrimination between CWAs and non-CWAs was learned by a convolutional neural network (CNN) algorithm. Despite the difference in data size, our meta-predictors share the property of a binary pixel array with the MNIST hand-written digit data (28 x 28 pixels).[39] This common property led us to use the image-based learning of MNIST data, in particular the CNN, as a benchmark. First, our meta-predictors were converted to a 2D array of 50 × 4 shape for CNN learning. After investigation, the architecture of Figure 6A (see also Figure S9) was chosen as the best learner. As expected, the 2D array reliably detected CWAs among the large NPS data. As the number of epochs increased during learning, the accuracy and loss values reached their optimal values and remained there (Figure 6B). Encouraged by these results, we tried to adjust the data imbalance between CWAs and non-CWAs through over-sampling and under-sampling (the removal of data showing duplicated array values).
As shown in Figure 7, when the imbalanced native data (Model 01) were compared with the balanced over-sampled data (Model 03), the statistical metrics decreased slightly, but the area under the precision-recall curve (AUPR) values of Figure 7A remained comparable between the native (imbalanced) and over-sampled (balanced) data, indicating that these statistical values did not simply result from data imbalance. Matthews correlation coefficient (MCC), F1-score, and accuracy (Figure 7B) also supported that the ability to find CWAs was retained under SMOTE (over-sampling).[40] Furthermore, the two types of sampling allowed us to evaluate 2D or 3D array classifiers of different shapes. When we reshaped the 2D array from [50 × 4] to [40 x 5], the detection ability decreased steeply, suggesting that the data are sequential. Meanwhile, when we converted the 2D array into 3D arrays, surprisingly, image-based learning of the [10 x 5 x 4] shape improved the AUPR, MCC, and F1-score of the worst ‘Model 04’ and decreased the performance gap between the different data (Figure 7). When the 3D array was reshaped into [5 x 10 x 4], the improvement in these statistical values was retained.
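The under-sampling step described above (removal of data showing duplicated array values) can be illustrated with a minimal sketch. It assumes that duplicates are removed within each class and that the first occurrence is kept; the source does not state either detail, and the function name `drop_duplicate_arrays` and the 4-bit toy arrays are hypothetical.

```python
def drop_duplicate_arrays(samples, labels):
    """Keep the first occurrence of each distinct bit pattern within a class.

    Assumption: duplicates are judged per class (CWA = 1, non-CWA = 0).
    """
    seen = set()
    kept_samples, kept_labels = [], []
    for bits, label in zip(samples, labels):
        key = (label, tuple(bits))
        if key in seen:
            continue  # same class and same array values: drop the repeat
        seen.add(key)
        kept_samples.append(list(bits))
        kept_labels.append(label)
    return kept_samples, kept_labels


# Toy 4-bit arrays standing in for the 200-bit meta-predictors.
samples = [[0, 1, 1, 0], [0, 1, 1, 0], [1, 1, 0, 0], [0, 1, 1, 0], [1, 0, 0, 1]]
labels = [0, 0, 0, 1, 1]
kept_samples, kept_labels = drop_duplicate_arrays(samples, labels)
print(len(samples), "->", len(kept_samples))
```

Keeping the class label in the key means a CWA and a non-CWA that share the same predicted pattern are both retained, which preserves genuinely ambiguous cases for the classifier.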

Based on the statistical validation of Figure 7 and Table S8, the array classifiers are ready for CWA detection from NE chemicals. Admittedly, this predictive model for chemical threats under the chemo-centric assumption is open to debate because of the limited available data and the impossibility of experimental validation. However, this is not the only attempt of its kind. For example, the OECD also developed the QSAR Toolbox and has provided it for risk assessment.[10] Because current information on the mechanism of CWAs is concentrated on AChE and the cholinergic effect, this study described only cholinergic patterns to detect chemical threats. In the future, if the data are updated, this methodology is applicable to other pharmacological effects of known harmful chemicals, such as inhibition of brain monoacylglycerol (MAG) lipase and of the endocannabinoid-degrading enzyme fatty acid amide hydrolase (FAAH), which are recently reported toxicity mechanisms of organophosphorus pesticides.[2,16] Although MAG lipase and FAAH inhibition by the insecticides has been reported, such a trial will become more feasible once data on MAG lipase or FAAH agents are as abundant as those on cholinergic agents.

Despite extremely imbalanced data, the cholinergic pattern of CWAs was learned through array type meta-predictors to achieve acceptable predictive performance. Furthermore, the learning allows multi-tasking for a chemical: DTI prediction for five cholinergic targets under four ML algorithms and CWA detection under the CNN algorithm. While the former task was verified through the internal and external validation of the respective DTI classifiers, the latter task was validated using CWAs and non-CWAs. This preliminary study highlights the need for research on predicting chemical threats from NE chemicals in the near future.

Dataset collection and manipulation: Any machine learning inextricably relies on structure and reported activity data. In recent years, the ChEMBL database has become a primary source of chemical data for machine learning applications. Herein, ChEMBL database version 24 [37] was selected to retrieve the structural and property data of cholinergic agents (nAChR, mAChR, VAChT, AChE, and BuChE) with a MySQL query returning molecular structures (canonical SMILES), activity ID, standard values of inhibitory activities with standard relation and standard unit (nanomolar), assay ID, and target ID. In addition, the molecular structures of CWAs and NPSs were collected from the literature [1, 2, 19] and the NPS Data Hub [41]. Every manipulation of data (sorting, merging, removal of duplicated data, and binarization) was conducted with the KNIME Analytics Platform [42]. The supplementary section describes the composition of chemicals for each target. In brief, totals of 1818, 6944, 3098, 1382, 302, 95, and 3126 chemicals belonging to nAChR, mAChR, AChE, BuChE, VAChT, CWAs, and NPSs, respectively, were selected.

MySQL query in ChEMBL DB:

SELECT x.molregno, x.canonical_smiles, y.activity_id, y.assay_id, y.standard_value, y.standard_relation, y.standard_units, i.tid, k.target_type, k.pref_name, k.organism
FROM compound_structures x, activities y, assays i, target_dictionary k
WHERE x.molregno = y.molregno AND y.assay_id = i.assay_id AND i.tid = k.tid AND k.tid = 10532
INTO OUTFILE "chembl_target_BuChE.csv" FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n';
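The duplicate removal and binarization that the authors performed in KNIME could look roughly like the following Python sketch. The activity cut-off (`ACTIVE_THRESHOLD_NM`), the rule of keeping the most potent value per structure, and the decision to ignore `standard_relation` are all illustrative assumptions; the source does not report how these choices were made.

```python
import csv
import io

# Hypothetical activity cut-off in nM; the source does not state the threshold used.
ACTIVE_THRESHOLD_NM = 1000.0


def binarize(rows, threshold_nm=ACTIVE_THRESHOLD_NM):
    """Collapse duplicate structures and label each one active (1) or inactive (0)."""
    best = {}
    for row in rows:
        if row["standard_units"] != "nM" or not row["standard_value"]:
            continue  # skip records without a usable nanomolar value
        value = float(row["standard_value"])
        smiles = row["canonical_smiles"]
        # Assumed de-duplication rule: keep the most potent (lowest) value per structure.
        if smiles not in best or value < best[smiles]:
            best[smiles] = value
    return {smiles: int(value <= threshold_nm) for smiles, value in best.items()}


# Toy rows using column names from the query above.
example_csv = """canonical_smiles,standard_value,standard_units
CCO,50,nM
CCO,5000,nM
CCN,20000,nM
CCC,,nM
"""
labels = binarize(csv.DictReader(io.StringIO(example_csv)))
print(labels)
```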

Molecular descriptor generation: Eight 2D molecular fingerprints were generated for every chemical from (1) two fingerprint types, extended-connectivity fingerprint (ECFP) and functional-class fingerprint (FCFP), and (2) four different diameters (0, 2, 4, 6) under a fixed 1024-bit vector size. Notably, ECFP captures precise atom properties (e.g. atomic number, charge, hydrogen count), whereas FCFP captures functional (pharmacophoric) features (e.g. hydrogen donor/acceptor, polarity, aromaticity) of the atoms in a molecule. The CDK toolkit [43] was used for both fingerprint calculations. The generated fingerprints were split and combined with the respective binary activity values into an embedded data matrix for learning.
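To make the roles of the diameter and the 1024-bit folding concrete, the sketch below implements a deliberately simplified circular fingerprint in plain Python. It is not the CDK implementation used in the study: the initial atom identifier uses the atomic number only, bond orders are ignored, and duplicate substructures are not removed, so it should be read as an illustration of the idea behind ECFP rather than a replacement for it. The function name and the molecule encoding (atomic numbers plus a bond list) are hypothetical.

```python
import zlib


def circular_fingerprint(atomic_numbers, bonds, diameter, n_bits=1024):
    """Simplified ECFP-like fingerprint: hash atom environments and fold into n_bits."""
    neighbors = {i: [] for i in range(len(atomic_numbers))}
    for a, b in bonds:
        neighbors[a].append(b)
        neighbors[b].append(a)
    # Iteration 0: each atom identifier is derived from its atomic number only.
    ids = [zlib.crc32(str(z).encode()) for z in atomic_numbers]
    features = set(ids)
    for _ in range(diameter // 2):  # diameter 2 -> one iteration, 4 -> two, 6 -> three
        new_ids = []
        for i, own in enumerate(ids):
            env = (own, tuple(sorted(ids[j] for j in neighbors[i])))
            new_ids.append(zlib.crc32(repr(env).encode()))
        ids = new_ids
        features.update(ids)
    bits = [0] * n_bits
    for feature in features:
        bits[feature % n_bits] = 1  # fold the identifier into the fixed-length vector
    return bits


# Heavy atoms of ethanol (C-C-O) as a toy molecule.
ethanol = ([6, 6, 8], [(0, 1), (1, 2)])
fps = {d: circular_fingerprint(*ethanol, diameter=d) for d in (0, 2, 4, 6)}
for d, bits in fps.items():
    print(d, sum(bits))
```

Because features accumulate over iterations, every bit set at a smaller diameter is also set at a larger one, which is why the larger diameters describe progressively wider atom environments.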

Building classification models and validation: Four machine learning algorithms (random forest, decision tree, support vector machine, and k-nearest neighbour) were applied to the data matrix with 10 different random seed numbers to build classification models with the classification and regression training (caret) package in the R environment. Every model was internally and externally validated using a 70:30 division ratio between training and test sets and k-fold (k=10) cross-validation. In brief, in k-fold cross-validation, the input data are randomly partitioned into k equal-sized subsamples. One of the k subsamples is kept as validation data for testing the model, while the remaining k-1 subsamples are used as training data. This procedure is then repeated k times (the folds), with each of the k subsamples used exactly once as the validation data.
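The splitting scheme described above can be sketched in plain Python as follows. The study used the caret package in R; this sketch only mirrors the logic (a seeded 70:30 split repeated for ten seeds, then 10-fold cross-validation), and it assumes that cross-validation is run on the training portion, which the source does not state explicitly. The function names are hypothetical.

```python
import random


def split_70_30(n_samples, seed):
    """Shuffle sample indices with a seed and return (train, test) index lists."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    cut = int(round(0.7 * n_samples))
    return indices[:cut], indices[cut:]


def k_fold(indices, k=10, seed=0):
    """Yield (training, validation) index lists; each sample is validated exactly once."""
    shuffled = list(indices)
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]  # fold sizes differ by at most one
    for i in range(k):
        validation = folds[i]
        training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield training, validation


# Ten differently divided data sets, here for the 302 VAChT compounds.
splits = {seed: split_70_30(302, seed) for seed in range(10)}
train, test = splits[0]
folds = list(k_fold(train, k=10, seed=0))
print(len(train), len(test), len(folds))
```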

Array classifier-CNN architecture: The built models generated meta-predictors (meta-data) of 200 binary bits (5 cholinergic targets x 4 machine learning methods x 10 seed numbers). The meta-data were embedded in arrays of several shapes ([50 x 4], [5 x 10 x 4], [10 x 5 x 4]). The CNN model, which is composed of convolutional, pooling, flatten, and dense layers, was built with the hyperparameters of a maximum of 100 epochs, a batch size of 32, and a learning rate of 0.01 with the Adam optimizer [44]. An EarlyStopping criterion was introduced to prevent the CNN models from over-fitting and to terminate the learning early. The ‘Softmax’ activation function was used to define the probability distribution of the chemical warfare likeness [45]. The learning performance and robustness were measured by accuracy and loss values as the epoch number increased.
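The embedding of one 200-bit meta-predictor into the three array shapes can be illustrated with a small row-major reshape written in plain Python. The order in which the 200 classifier outputs are laid out (for example target, then seed, then ML method) is an assumption here, as are the placeholder bit values; the sketch only shows that the same 200 values fill each shape.

```python
def reshape(flat, shape):
    """Reshape a flat list into nested lists in row-major (C) order."""
    total = 1
    for dim in shape:
        total *= dim
    if total != len(flat):
        raise ValueError(f"cannot reshape {len(flat)} values into {shape}")
    if len(shape) == 1:
        return list(flat)
    step = len(flat) // shape[0]
    return [reshape(flat[i * step:(i + 1) * step], shape[1:]) for i in range(shape[0])]


# Placeholder 0/1 predictions from the 200 DTI classifiers (hypothetical values and order).
meta_predictor = [i % 2 for i in range(200)]

array_2d = reshape(meta_predictor, (50, 4))
array_3d_a = reshape(meta_predictor, (10, 5, 4))
array_3d_b = reshape(meta_predictor, (5, 10, 4))
print(len(array_2d), len(array_3d_a), len(array_3d_b))
```

With a row-major layout, the last axis of length 4 stays intact in all three shapes, so each innermost row always holds the same four consecutive bits.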

Evaluation of predictive model: The performance of each model was evaluated using three classification metrics, i.e. Matthews correlation coefficient (MCC), accuracy, and the area under the receiver operating characteristic curve (AUC), based on true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN). These metrics evaluate the statistical performance and robustness of the built models.
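For reference, the confusion-matrix metrics named above (together with the F1-score reported in the results) follow their standard definitions, sketched below. The function name and the example counts are hypothetical and are not taken from the study; returning 0.0 when a denominator is zero is a common convention, not something the source specifies.

```python
import math


def classification_metrics(tp, tn, fp, fn):
    """Return accuracy, F1-score, and MCC from confusion-matrix counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denominator if denominator else 0.0
    return {"accuracy": accuracy, "f1": f1, "mcc": mcc}


# Illustrative counts only.
metrics = classification_metrics(tp=40, tn=45, fp=5, fn=10)
print(metrics)
```

Unlike accuracy, MCC uses all four cells of the confusion matrix in both numerator and denominator, which is why it remains informative on imbalanced data such as the CWA versus non-CWA set.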

Ethics approval and consent to participate:

Every author accepted ethical standards of a genuine research study.

Consent for publication:

Every author agreed with the submission to the Journal and authorship.

Supplementary:

Supplementary file is available.

Availability of data and materials:

Python code and refined data will be available on GitHub.

https://github.com/college-of-pharmacy-gachon-university/Array_Classifier

Competing interests:

The authors confirm that this article content has no conflicts of interest.

Funding:

This study was supported by the Basic Science Research Program of the National Research Foundation of Korea (NRF), which is funded by the Ministry of Education, Science and Technology (No.: 2017R1E1A1A01076642, 2020R1I1A1A01074750).

Authors’ contributions:

M.K. conceived and designed the study. Under M.K.’s plan, C.K. and S.K. carried out all modeling and data work. M.K., C.K., and S.K. analyzed the data. S.A. assisted in building the CNN architecture. H.K. advised on the evaluation of the CNN model and revised the architecture. M.K. and S.K. wrote and revised the manuscript. M.K. provided the molecular modeling lab and synthetic research work facility. All authors read and approved the final manuscript.

Acknowledgments:

The authors would like to thank Prof. Young Mi Yoon for useful advice.

Chauhan, S., Chauhan, S., D’Cruz, R., Faruqi, S., Singh, K. K., Varma, S., et al. (2008). Chemical warfare agents. Environmental Toxicology and Pharmacology 26, 113–122. doi:10.1016/j.etap.2008.03.003.
Kim, K., Tsay, O. G., Atwood, D. A., and Churchill, D. G. (2011). Destruction and Detection of Chemical Warfare Agents. Chem. Rev. 111, 5345–5403. doi:10.1021/cr100193y.
Lin, T. J., Walter, F. G., Hung, D. Z., Tsai, J. L., Hu, S. C., Chang, J. S., et al. (2008). Epidemiology of organophosphate pesticide poisoning in Taiwan. Clinical Toxicology 46, 794–801. doi:10.1080/15563650801986695.
Ganesan, K., Raza, S., and Vijayaraghavan, R. (2010). Chemical warfare agents. J Pharm Bioall Sci 2, 166. doi:10.4103/0975-7406.68498.
Munro, N. (1994). Toxicity of the Organophosphate Chemical Warfare Agents GA, GB, and VX: Implications for Public Protection. Environmental Health Perspectives 102, 18–37. doi:10.1289/ehp.9410218.
14,000 estimated to have died from humidifier sanitizer scandal: study. Yonhap News Agency. Available at: https://en.yna.co.kr/view/AEN20200727006300315 [Accessed March 28, 2022].
Paek, D., Koh, Y., Park, D.-U., Cheong, H.-K., Do, K.-H., Lim, C.-M., et al. (2015). Nationwide Study of Humidifier Disinfectant Lung Injury in South Korea, 1994–2011. Incidence and Dose–Response Relationships. Annals ATS 12, 1813–1821. doi:10.1513/AnnalsATS.201504-221OC.
Herbicides, I. of M. (US) C. to R. the H. E. in V. V. of E. to (1994). History of the Controversy Over the Use of Herbicides. National Academies Press (US) Available at: https://www.ncbi.nlm.nih.gov/books/NBK236351/ [Accessed July 14, 2021].
Manikkam, M., Tracey, R., Guerrero-Bosagna, C., and Skinner, M. K. (2012). Dioxin (TCDD) Induces Epigenetic Transgenerational Inheritance of Adult Onset Disease and Sperm Epimutations. PLOS ONE 7, e46249. doi:10.1371/journal.pone.0046249.
Assessment of chemicals - OECD Available at: https://www.oecd.org/chemicalsafety/risk-assessment/ [Accessed March 28, 2022].
Risk management of chemicals - OECD Available at: https://www.oecd.org/chemicalsafety/risk-management/ [Accessed March 28, 2022].
Schmidt, C. W. (2016). TSCA 2.0: A New Era in Chemical Risk Management. Environmental Health Perspectives 124, A182–A186. doi:10.1289/ehp.124-A182.
Gharami, S., Aich, K., Das, S., Patra, L., and Mondal, T. K. (2019). Facile detection of organophosphorus nerve agent mimic (DCP) through a new quinoline-based ratiometric switch. New J. Chem. 43, 8627–8633. doi:10.1039/C9NJ02218J.
Agrawal, M., Sava Gallis, D. F., Greathouse, J. A., and Sholl, D. S. (2018). How Useful Are Common Simulants of Chemical Warfare Agents at Predicting Adsorption Behavior? J. Phys. Chem. C 122, 26061–26069. doi:10.1021/acs.jpcc.8b08856.
Mondloch, J. E., Katz, M. J., Isley III, W. C., Ghosh, P., Liao, P., Bury, W., et al. (2015). Destruction of chemical warfare agents using metal–organic frameworks. Nature Mater 14, 512–516. doi:10.1038/nmat4238.
Eddleston, M. (2019). Novel Clinical Toxicology and Pharmacology of Organophosphorus Insecticide Self-Poisoning. Annu. Rev. Pharmacol. Toxicol. 59, 341–360. doi:10.1146/annurev-pharmtox-010818-021842.
European Monitoring Centre for Drugs and Drug Addiction. (2015). New psychoactive substances in Europe: an update from the EU Early Warning System, March 2015. LU: Publications Office Available at: https://data.europa.eu/doi/10.2810/372415 [Accessed March 27, 2022].
European Monitoring Centre for Drugs and Drug Addiction. and European Police Office. (2016). 2016 EU drug markets report: in depth analysis. LU: Publications Office Available at: https://data.europa.eu/doi/10.2810/219411 [Accessed March 27, 2022].
Urbas, A., Schoenberger, T., Corbett, C., Lippa, K., Rudolphi, F., and Robien, W. (2018). NPS Data Hub: A web-based community driven analytical data repository for new psychoactive substances. Forensic Chemistry 9, 76–81. doi:10.1016/j.forc.2018.05.003.
Shafi, A., Berry, A. J., Sumnall, H., Wood, D. M., and Tracy, D. K. (2020). New psychoactive substances: a review and updates. Ther Adv Psychopharmacol 10, 2045125320967197. doi:10.1177/2045125320967197.
Chemical Network Algorithms for the Risk Assessment and Management of Chemical Threats - Fuller – 2012 - Angewandte Chemie International Edition - Wiley Online Library Available at: https://onlinelibrary.wiley.com/doi/10.1002/anie.201202210 [Accessed March 28, 2022].
Carbó-Dorca, R. Determination of unknown molecular properties in molecular spaces. J Math Chem 60, 353–359 (2022).
Dobson, C. M. (2004). Chemical space and biology. Nature 432, 824–828. doi:10.1038/nature03192.
https://www.opcw.org/chemical-weapons-convention
Casida, J. E. (2017). Organophosphorus Xenobiotic Toxicology. Annu. Rev. Pharmacol. Toxicol. 57, 309–327. doi:10.1146/annurev-pharmtox-010716-104926.
Picard, B., Chataigner, I., Maddaluno, J., and Legros, J. (2019). Introduction to chemical warfare agents, relevant simulants and modern neutralisation methods. 10.
Keiser, M. J., Roth, B. L., Armbruster, B. N., Ernsberger, P., Irwin, J. J., and Shoichet, B. K. (2007). Relating protein pharmacology by ligand chemistry. Nat Biotechnol 25, 197–206. doi:10.1038/nbt1284.
The OECD QSAR Toolbox used the chemocentric assumption - OECD Available at: https://www.oecd.org/chemicalsafety/risk-assessment/oecd-qsar-toolbox.htm [Accessed March 28, 2022].
Venkanna, A., Kwon, O. W., Afzal, S., Jang, C., Cho, K. H., Yadav, D. K., et al. (2017). Pharmacological use of a novel scaffold, anomeric N,N-diarylamino tetrahydropyran: molecular similarity search, chemocentric target profiling, and experimental evidence. Sci Rep 7, 12535. doi:10.1038/s41598-017-12082-3.
Kumar, S., Jang, C., Subedi, L., Kim, S. Y., and Kim, M. (2020). Repurposing of FDA approved ring systems through bi-directional target-ring system dual screening. Sci Rep 10, 21133. doi:10.1038/s41598-020-78077-9.
Lee, S.-H., Ahn, S., and Kim, M. (2020). Comparing a Query Compound with Drug Target Classes Using 3D-Chemical Similarity. International Journal of Molecular Sciences 21, 4208. doi:10.3390/ijms21124208.
Dhorma, L. P., Teli, M. K., Nangunuri, B. G., Venkanna, A., Ragam, R., Maturi, A., et al. (2022). Positioning of an unprecedented 1,5-oxaza spiroquinone scaffold into SMYD2 inhibitors in epigenetic space. European Journal of Medicinal Chemistry 227, 113880. doi:10.1016/j.ejmech.2021.113880.
Rogers, D., and Hahn, M. (2010). Extended-Connectivity Fingerprints. J. Chem. Inf. Model. 50, 742–754. doi:10.1021/ci100050t.
Kumar, S., and Kim, M. (2021). SMPLIP-Score: predicting ligand binding affinity from simple and interpretable on-the-fly interaction fingerprint pattern descriptors. Journal of Cheminformatics 13, 28. doi:10.1186/s13321-021-00507-1.
Lee, J., Kumar, S., Lee, S.-Y., Park, S. J., and Kim, M. (2019). Development of Predictive Models for Identifying Potential S100A9 Inhibitors Based on Machine Learning Methods. Frontiers in Chemistry 7. Available at: https://www.frontiersin.org/article/10.3389/fchem.2019.00779 [Accessed March 28, 2022].
Sadik, O., Land, W. H., Wanekaya, A. K., Uematsu, M., Embrechts, M. J., Wong, L., et al. (2004). Detection and Classification of Organophosphate Nerve Agent Simulants Using Support Vector Machines with Multiarray Sensors. J. Chem. Inf. Comput. Sci. 44, 499–507. doi:10.1021/ci034220i.
ChEMBL Database Available at: https://www.ebi.ac.uk/chembl/ [Accessed March 28, 2022].
Gaulton, A., Hersey, A., Nowotka, M., Bento, A.P., Chambers, J., Mendez, D., Mutowo, P., Atkinson, F., Bellis, L.J., Cibrián-Uhalte, E., Davies, M., Dedman, N., Karlsson, A., Magariños, M.P., Overington, J.P., Papadatos, G., Smit, I., Leach, A.R. (2017) The ChEMBL database in 2017. Nucleic Acids Res. 45(D1) D945-D954
Deng, L. (2012). The MNIST database of handwritten digit images for machine learning research. IEEE Signal Processing Magazine 29(6), 141–142.
Chawla, N. V., Bowyer, K. W. Hall, L. O. Kegelmeyer, W. P. (2002). SMOTE: Synthetic Minority over-Sampling Technique. J Artif Int Res 16(1), 321–357
NPS Data Hub. Available at: https://nps-datahub.com/
Berthold, Michael R., et al. KNIME-the Konstanz information miner: version 2.0 and beyond. ACM SIGKDD Explorations Newsletter 11.1 (2009): 26–31.
Steinbeck et al. (2003). The Chemistry Development Kit (CDK): An Open-Source Java Library for Chemo- and Bioinformatics. J. Chem. Inf. Comput. Sci. 43(2): 493–500, doi:10.1021/ci025584y
Kingma, D. P., Ba, J. (2014). Adam: A Method for Stochastic Optimization. arXiv. doi.org/10.48550/arxiv.1412.6980.
Goodfellow, I., Bengio, Y. Courville, A. (2016). 6.2.2.3 Softmax Units for Multinoulli Output Distributions. Deep Learning. MIT Press. 180–184.
