Study Population and Eligibility Criteria
The data for this study were obtained from the National Poison Data System (NPDS), the only real-time data repository for poisoning exposures in the United States. The American Association of Poison Control Centers (AAPCC), which maintains NPDS, represents the 55 poison control centers (PCCs) in the United States. NPDS contains exposures involving more than 400,000 substances, reported continuously by PCCs [14]; in 2019 alone, more than two million human exposures were reported to NPDS [14]. NPDS does not capture every substance exposure in the country, and an exposure reported to NPDS does not necessarily indicate poisoning or toxicity. All metformin exposures reported to NPDS between January 1, 2012, and December 31, 2017, were eligible for inclusion; cases with missing data and duplicate records were excluded. Institutional review board approval was not required for this study according to the Colorado Multiple Institutional Review Board on Human Subjects Protection standards. All methods were carried out in accordance with relevant guidelines and regulations.
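The eligibility criteria above amount to a filtering step over the exported case reports. The sketch below shows one way to apply them, assuming each report is a Python dictionary. The field names (`case_id`, `substance`, `exposure_date`, and so on) and the rule that duplicate reports share a `case_id` are hypothetical illustrations, not NPDS variable names or the study's actual deduplication logic.

```python
from datetime import date

# Hypothetical field names; NPDS uses its own variable names and coding.
REQUIRED_FIELDS = ("case_id", "substance", "exposure_date", "age", "sex",
                   "reason", "outcome")
STUDY_START = date(2012, 1, 1)
STUDY_END = date(2017, 12, 31)


def select_cohort(records):
    """Return metformin cases in the study window with complete data, without duplicates."""
    seen_ids = set()
    cohort = []
    for rec in records:
        # Exclude cases with any missing required field.
        if any(rec.get(field) is None for field in REQUIRED_FIELDS):
            continue
        # Keep only metformin exposures.
        if rec["substance"] != "metformin":
            continue
        # Keep exposures inside the study period (both bounds inclusive).
        if not STUDY_START <= rec["exposure_date"] <= STUDY_END:
            continue
        # Assumption: duplicate reports share the same case_id; keep the first one.
        if rec["case_id"] in seen_ids:
            continue
        seen_ids.add(rec["case_id"])
        cohort.append(rec)
    return cohort
```

Applying the missing-data check first means an incomplete record never reaches the duplicate check, so a complete later copy of the same case would still be kept.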
Definition of Terms
To develop the classification model, we defined the following features according to the NPDS guidelines:
Hypertension: Diastolic blood pressure greater than 90 mm Hg or systolic blood pressure greater than 140 mm Hg
Elevated anion gap: Anion gap, calculated as Na⁺ − (Cl⁻ + HCO₃⁻), greater than 12 mEq/L
Elevated creatinine: Creatinine level greater than 1.5 mg/dL (133 µmol/L)
Tachycardia: Heart rate greater than 100 beats per minute
Renal failure: Acute or chronic renal failure resulting in a clinically significant loss of renal function with azotemia
Electrolyte abnormality: Abnormal level of sodium, potassium, bicarbonate, chloride, calcium, magnesium, or phosphate
Hypoglycemia: Glucose level less than 70 mg/dL (3.9 mmol/L)
Acidosis: Bicarbonate level less than 20 mEq/L, pH less than 7.35, or an elevated lactic acid level
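The threshold-based definitions above translate directly into binary indicator features. A minimal sketch follows, assuming raw vital signs and laboratory values are available for a case; the function and argument names are hypothetical. Renal failure and electrolyte abnormality are left out because their definitions rest on clinical judgment rather than a single cutoff. The two conversion helpers only check that the SI values paired with each conventional-unit threshold agree.

```python
# Hypothetical helpers; thresholds follow the definitions listed above.

def anion_gap(na, cl, hco3):
    """Anion gap in mEq/L: Na+ - (Cl- + HCO3-)."""
    return na - (cl + hco3)


def creatinine_to_umol_l(mg_dl):
    """Convert creatinine from mg/dL to µmol/L (factor 88.4)."""
    return mg_dl * 88.4


def glucose_to_mmol_l(mg_dl):
    """Convert glucose from mg/dL to mmol/L (molar mass about 180.16 g/mol)."""
    return mg_dl / 18.016


def clinical_flags(sbp, dbp, na, cl, hco3, creatinine_mg_dl, heart_rate,
                   glucose_mg_dl, ph, lactate_elevated=False):
    """Return True/False indicators for the threshold-based definitions."""
    return {
        "hypertension": dbp > 90 or sbp > 140,
        "elevated_anion_gap": anion_gap(na, cl, hco3) > 12,
        "elevated_creatinine": creatinine_mg_dl > 1.5,
        "tachycardia": heart_rate > 100,
        "hypoglycemia": glucose_mg_dl < 70,
        "acidosis": hco3 < 20 or ph < 7.35 or lactate_elevated,
    }


# Example: anion gap of 140 - (100 + 24) = 16 mEq/L, above the 12 mEq/L cutoff.
flags = clinical_flags(sbp=150, dbp=85, na=140, cl=100, hco3=24,
                       creatinine_mg_dl=1.2, heart_rate=110,
                       glucose_mg_dl=65, ph=7.40)
```

In this example case, hypertension (systolic 150 mm Hg), elevated anion gap, tachycardia, and hypoglycemia are flagged, while elevated creatinine and acidosis are not.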
Development of Classification Model
First, the dataset was randomly divided into training (70%) and test (30%) sets. The prediction model was built on the training set using variables including demographics (age and sex), reason for exposure (e.g., suicidal or unintentional), chronicity, and clinical features. The held-out test set was then used to evaluate how well the model generalizes to cases not seen during training.
A decision tree consists of nodes connected by branches: a single root node, internal decision nodes, and leaf nodes. The root node holds the first split, on the feature that best separates the outcome classes across the training set, whereas each leaf node represents a final decision reached by applying the IF-THEN rules along the path from the root. When moving down a path in the tree, the left branch at each node corresponds to the split condition being true and the right branch to it being false.
Model performance was evaluated using accuracy, precision, recall, specificity, F1 score, and the confusion matrix. All analyses were performed in Python using the scikit-learn library.
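A minimal scikit-learn sketch of this workflow follows, assuming a binary outcome and numeric (or already encoded) predictors. The data come from `make_classification`, a synthetic generator, not from NPDS; the random seeds, stratified splitting, and `max_depth=3` are illustrative choices, not the study's reported settings.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

# Synthetic stand-in data: NOT the NPDS dataset, only a binary outcome
# with numeric features so the workflow can run end to end.
X, y = make_classification(n_samples=1000, n_features=6, n_informative=4,
                           random_state=0)
feature_names = [f"feature_{i}" for i in range(X.shape[1])]

# 70/30 random split; stratify keeps the class ratio similar in both sets
# (an assumption: the study does not state whether it stratified).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y)

# max_depth is an illustrative choice that keeps the rule set readable.
tree = DecisionTreeClassifier(max_depth=3, random_state=42)
tree.fit(X_train, y_train)

# IF-THEN rules: at each node the "<=" line (condition true) is the left
# child and the ">" line (condition false) is the right child.
print(export_text(tree, feature_names=feature_names))

# Evaluate on the held-out test set only.
y_pred = tree.predict(X_test)
cm = confusion_matrix(y_test, y_pred, labels=[0, 1])
tn, fp, fn, tp = cm.ravel()
metrics = {
    "accuracy": accuracy_score(y_test, y_pred),
    "precision": precision_score(y_test, y_pred),  # TP / (TP + FP)
    "recall": recall_score(y_test, y_pred),        # sensitivity, TP / (TP + FN)
    "specificity": tn / (tn + fp),                 # TN / (TN + FP)
    "f1": f1_score(y_test, y_pred),
}
print(cm)
print(metrics)
```

scikit-learn has no dedicated specificity function, so the sketch computes it from the confusion matrix; for a binary outcome it equals `recall_score` with `pos_label=0`.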