Combined metabolomics and machine learning algorithms to explore metabolic biomarkers for diagnosis of acute myocardial ischemia

Acute myocardial ischemia (AMI) remains the leading cause of death worldwide, and the post-mortem diagnosis of AMI represents a current challenge for both clinical and forensic pathologists. In the present study, untargeted metabolomics based on ultra-performance liquid chromatography combined with high-resolution mass spectrometry was applied to analyze serum metabolic signatures of AMI in a rat model (n = 10 per group). A total of 28 endogenous metabolites in serum were significantly altered in the AMI group relative to the control and sham groups. A set of machine learning algorithms, namely gradient tree boosting (GTB), support vector machine (SVM), random forest (RF), logistic regression (LR), and multilayer perceptron (MLP) models, was used to screen the more valuable metabolites among the 28 and to optimize the biomarker panel. The classification accuracy and performance of the MLP model were better than those of the other algorithms when the panel consisted of L-threonic acid, N-acetyl-L-cysteine, CMPF, glycocholic acid, L-tyrosine, cholic acid, and glycoursodeoxycholic acid. Finally, 17 blood samples from autopsy cases were used to validate the classification model in human samples. The MLP model constructed from the rat dataset achieved an accuracy of 88.23% and an area under the ROC curve (AUC) of 0.89 for predicting AMI type II in autopsy cases of sudden cardiac death. The results demonstrated that the MLP model based on 7 molecular biomarkers had good diagnostic performance for both AMI rats and autopsy-based blood samples. Thus, the combination of metabolomics and machine learning algorithms provides a novel strategy for AMI diagnosis.


Introduction
Acute myocardial ischemia (AMI) is the primary cause of sudden cardiac death (SCD), which remains a leading cause of morbidity and mortality worldwide [1,2]. Almost 85% of all sudden deaths are due to cardiac causes, and many of the individuals at risk of sudden death are asymptomatic [3]. AMI can lead to death very quickly, before necrosis appears at the histological level. Therefore, in the majority of such cases, it is hard to find specific post-mortem structural anomalies of the heart (macroscopic or microscopic) at autopsy with ordinary histological methods, which leaves the cause of death uncertain in forensic pathology practice [4,5].
Nowadays, there is a lack of knowledge about the changes that occur shortly after the onset of ischemia in human SCD cases. Specific markers of early myocardial ischemia, such as fibronectin, C5b-9, troponins, myoglobin, and S100A1, have been used effectively in clinical practice but lose specificity in human post-mortem samples [6-8]. In recent years, molecular autopsy has come to be considered part of the comprehensive medicolegal investigation in SCD cases without structural heart alterations [9,10]. However, only 40% of SCD cases can be explained by molecular autopsy, and these explanations mostly concern the genetic correlates of SCD [11,12]. Until now, there has been no highly specific and sensitive "gold standard" for the diagnosis of AMI, and the post-mortem diagnosis of AMI represents a challenge for both clinical and forensic pathologists.
Metabolomics is a promising tool to improve current, single biomarker-based approaches by identifying metabolic biosignatures that embody global biochemical changes in disease [13,14]. In terms of AMI, alterations in metabolism are the most immediate molecular changes caused by myocardial ischemia, and they can then lead to different deleterious consequences, such as arrhythmia, myocardial infarction, and heart failure [15]. In recent years, metabolomics has been increasingly used to study and evaluate cardiovascular diseases, to discover metabolic biomarkers, and to examine their causal relationship to cardiovascular disease pathogenesis [16-18]. Khan A et al. found that upregulated L-homocysteine sulfinic acid, cysteic acid, and carnitine can help in detecting subjects who are at risk of developing AMI [19]. Lactate, glutamine, and glutamate have been studied in infarcted myocardium and in serum after myocardial ischemia as an adjunct to cardiac troponin for the diagnosis of myocardial infarction [20]. Biomarkers for diagnosis should enter the serum from the tissue, and, compared with myocardial tissue, metabolites in serum are less affected by regional differences within the heart. Therefore, we infer that molecular biomarkers in serum could be used for the autopsy diagnosis of SCD induced by AMI.
In the present study, we performed ultra-performance liquid chromatography combined with high-resolution mass spectrometry (UPLC-HRMS) to explore the metabolic characteristics of AMI in the context of sudden death. The multivariate data analysis and a variety of machine learning algorithms, such as gradient tree boosting (GTB), support vector machines (SVM), random forest (RF), logistic regression (LR), and multilayer perceptron (MLP), were used to comprehensively extract the importance of disturbed metabolites and assess the performance of machine learning methods on classifying AMI metabolomics data.

Chemicals
HPLC-grade acetonitrile (ACN), methanol, and formic acid were purchased from Sigma-Aldrich (St. Louis, MO, USA). Deionized water was purified through a Milli-Q® purification system from Merck (Millipore, Bedford, MA, USA). All other chemicals, reagents, and solvents were of analytical grade.

Rat experimental protocol
All animal experiments were performed in accordance with the applicable Chinese legislation and approved by the Ethics Committee of Shanxi Medical University, PR China. Sprague-Dawley (SD) rats, weighing 180-220 g, 10-12 weeks old, were supplied by Animal Center of Shanxi Medical University. The rats were housed in cages with rat chow and water under a 12-h light-dark cycle at room temperature (22 to 24 °C) and were fasted overnight before the experiment.
All animals were randomly divided into 3 groups (n = 10 per group): control, sham, and AMI. The rat model of AMI was established according to conventional coronary ligation [21]. Briefly, rats were anesthetized by intraperitoneal administration of 3% pentobarbital sodium (30 mg/kg), and the lead II electrocardiogram (ECG) was monitored using a BL-420 biological functional experimental system (Chengdu Technology & Market Co. Ltd, China). Then, the rats were endotracheally intubated and ventilated with a small animal ventilator (HX-100E, Chengdu Technology & Market Co., Ltd, China). A left thoracotomy was performed and the heart was exposed. The left coronary artery was ligated approximately 5 mm from the lower margin of the left auricle. The left ventricular apex became pale and the ST segment of the ECG was elevated, which indicated that occlusion of the coronary artery had successfully induced myocardial ischemia. After 1 h of myocardial ischemia, the animals were euthanized with a lethal dose of sodium pentobarbital. Sham-operated rats underwent a similar process without ligation of the left coronary artery. The control group received no treatment.
Blood samples were withdrawn from the rat's abdominal aorta and centrifuged at 12,000 rpm for 15 min at 4 °C. The supernatant serum samples were aliquoted and immediately stored at − 80 °C until analysis.

Human blood samples collection by forensic autopsy
This study was approved by the Ethics Committee of Shanxi Medical University, PR China, and all samples were analyzed anonymously. A total of 17 blood samples were collected from the right ventricle in forensic autopsy cases, of which 9 were confirmed at autopsy as cardiac deaths with AMI type II and 8 as noncardiac sudden deaths (Table 1). The heart blood samples were collected and analyzed by UPLC-HRMS according to the same metabolomics protocol as the rat samples. The causes of death of these forensic cases were determined by professional forensic pathologists through systematic forensic autopsies (including macromorphological, histological, toxicological, and biochemical examinations) in combination with the death circumstances and medical history of the victims. The forensic autopsies were performed by the Department of Forensic Pathology, Shanxi Medical University. Written informed consent was obtained from family members of the deceased individuals.

Sample preparation
Serum samples were thawed before extraction. Then, 800 μL of cold acetonitrile was added to 200 μL of serum to remove protein. After vortex mixing for 1 min and centrifugation (12,000 rpm, 20 min, at 4 °C), 600 µL of the supernatant was withdrawn and freeze-dried in a freeze concentration centrifugal dryer (NingBo XinZhi. Ltd, China). Finally, the residues were dissolved in 200 µL of acetonitrile/water (4:1) solution and filtered through a 0.22 µm membrane for UPLC-HRMS analysis. A quality control (QC) sample was prepared by pooling and mixing equal-volume sub-aliquots of all samples to monitor the stability of the analytical method and system.
The critical parameters of mass spectrometry detection were as follows: the capillary temperature was 350 °C, and the spray voltages were 3.5 kV and 3.0 kV for positive ion mode and negative ion mode, respectively. The mass scan range was from 80 to 1200 Da. The scanning mode was Full Scan/dd-MS2, and the mass resolution was set to 70,000. The resolution was 35,000 full width at half maximum (FWHM) for MS full scan and 17,500 FWHM for MS/MS, and the normalized collisional energy (NCE) values were 12.5, 25, and 37.5 eV.

Data preprocessing
The acquired raw data files (.raw) were imported into Compound Discoverer 3.0 (Thermo Fisher, CA, USA) for initial data processing, including peak integration, nonlinear retention time alignment, filtering, and matching (mass tolerance 5 ppm). Simultaneously, the compounds in the serum were annotated. Metabolites were identified by combining open online databases (mzCloud, ChemSpider, PubChem, and Human Metabolome Database), using exact mass and MS/MS spectra to improve the accuracy of identification. The final output data included compound name, retention time, exact mass-to-charge ratio, peak area, etc. All data were imported into Excel, where each compound's peak area was normalized to the total peak area, within that sample, of all compounds that were also present in every other sample in the experiment.
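The total-sum normalization described above can be illustrated with a minimal Python sketch. The function name, the dictionary layout, and the toy peak areas are assumptions made for illustration; the study itself performed this step in Excel.

```python
def total_sum_normalize(peak_areas):
    """Scale each sample's peak areas so that they sum to 1.

    peak_areas maps sample name -> {compound name: raw peak area}.
    Only compounds detected in every sample are kept, following the
    description in the text (this reading is an assumption).
    """
    # Compounds shared by all samples
    shared = set.intersection(*(set(areas) for areas in peak_areas.values()))
    normalized = {}
    for sample, areas in peak_areas.items():
        total = sum(areas[c] for c in shared)
        normalized[sample] = {c: areas[c] / total for c in sorted(shared)}
    return normalized


# Hypothetical toy data, not values from the study
raw = {
    "rat_01": {"cholic acid": 200.0, "L-tyrosine": 600.0, "CMPF": 200.0},
    "rat_02": {"cholic acid": 50.0, "L-tyrosine": 150.0,
               "CMPF": 300.0, "unknown_1": 999.0},
}
norm = total_sum_normalize(raw)
```

In this sketch the feature `unknown_1` is dropped because it is missing from `rat_01`, and the remaining areas of each sample are expressed as fractions of that sample's total.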

Statistical analysis of UPLC-HRMS data
All normalized metabolomic data matrices were imported into SIMCA-P 14.0 software (Umetrics, Malmö, Sweden) for multivariate data analysis. As in our previous study [14], principal component analysis (PCA) was used to observe general clusters and outliers. Subsequently, the data were subjected to partial least squares-discriminant analysis (PLS-DA) and orthogonal partial least squares-discriminant analysis (OPLS-DA), and the resulting models were used to identify the differential metabolites responsible for the separation between groups. A response permutation test (200 permutations) and cross-validated residuals analysis of variance (CV-ANOVA, P < 0.05) were performed to evaluate the quality of the PLS-DA model. Furthermore, the Mann-Whitney U-test (P < 0.05) was used to evaluate the differences in metabolites using SPSS 24.0 software (IBM Corp., Armonk, NY, USA). The potential metabolites were selected according to their variable importance in the projection (VIP) values in the OPLS-DA models and the P-values of the Mann-Whitney U-test.
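For readers who wish to reproduce the univariate step outside SPSS, the following is a minimal sketch of a two-sided Mann-Whitney U-test in plain Python. It uses the normal approximation with tie correction and no continuity correction, so P-values may differ slightly from those reported by SPSS (which may use an exact method for small groups); the function name and the toy intensities are assumptions.

```python
import math


def mann_whitney_u(x, y):
    """Return (U statistic for x, approximate two-sided P-value).

    Tied values receive midranks and the variance includes the usual
    tie correction. The P-value comes from the normal approximation.
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    # Pool both groups, remembering which group each value came from
    pooled = sorted((v, g) for g, vals in enumerate((x, y)) for v in vals)
    rank_sum_x = 0.0
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        t = j - i                      # size of this tie group
        midrank = (i + 1 + j) / 2.0    # mean of ranks i+1 .. j
        rank_sum_x += midrank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        tie_term += t ** 3 - t
        i = j
    u_x = rank_sum_x - n1 * (n1 + 1) / 2.0
    mean_u = n1 * n2 / 2.0
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u_x, 1.0
    z = (u_x - mean_u) / math.sqrt(var_u)
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return u_x, p


# Hypothetical normalized intensities for one metabolite (n = 10 per group)
control = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
ami = [11.0, 12.0, 13.0, 14.0, 15.0, 16.0, 17.0, 18.0, 19.0, 20.0]
u_stat, p_value = mann_whitney_u(control, ami)
```

A feature would then be retained when its P-value is below 0.05 and its VIP value from the OPLS-DA model exceeds 1.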

Machine learning algorithms and biomarker candidates selection
To screen more important biomarkers and establish the best mathematical classification model for AMI diagnosis, we adopted a representative set of 5 machine learning algorithms that are widely applied in metabolomics: GTB, SVM, RF, LR, and MLP. Before analysis, Z-score data standardization was used to reduce sample variation:
$$Z = \frac{X - \mu}{\sigma}$$
where X is the peak area of each metabolite, μ is the average peak area from each group, X − μ is the mean deviation, and σ is the standard deviation.
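The Z-score standardization can be sketched in a few lines of Python. The text does not state whether σ is the population or the sample standard deviation; the sketch assumes the population form, and the function name and the toy values are hypothetical.

```python
import statistics


def z_score(values):
    """Return (x - mean) / sd for each value, using the population SD."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)  # assumption: population SD
    if sigma == 0:
        # A constant feature carries no information; map it to zeros
        return [0.0 for _ in values]
    return [(x - mu) / sigma for x in values]


# Hypothetical peak areas of one metabolite across eight samples
areas = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
z = z_score(areas)
```

After standardization each metabolite has zero mean and unit variance, so that metabolites with large absolute peak areas do not dominate distance-based or gradient-based models.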
Python software (Intel Corporation, Santa Clara, CA, USA) was employed to develop mathematical models and tune the parameters based on the 5 machine learning algorithms. The essential metabolites were selected and ranked based on their contribution to each model. Borda count algorithm was applied to summarize all 5 ranks in order to obtain the final importance rank of metabolites [22]. Tenfold cross-validation method and average values of area under the curve (AUC) in multivariate receiver operating characteristic (ROC) curve were used to screen the highest performance classification model, and the metabolites in this model were known as the biomarker candidates. The boxplots of biomarkers were prepared using GraphPad Prism7 (GraphPad Software, La Jolla, CA, USA).
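The tenfold cross-validation scheme can be illustrated as follows. This is a minimal sketch under stated assumptions: the folds are shuffled but not stratified, and `score_fn` is a hypothetical callback that would train a model on the training indices and return its AUC on the held-out indices. The text does not describe how the folds were actually drawn.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs for shuffled k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k folds of near-equal size
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


def cross_validated_mean(score_fn, n_samples, k=10, seed=0):
    """Average score_fn(train_idx, test_idx) over the k folds."""
    scores = [score_fn(tr, te) for tr, te in k_fold_indices(n_samples, k, seed)]
    return sum(scores) / len(scores)


# 30 samples split into 10 folds of 3 held-out samples each
folds = list(k_fold_indices(30, k=10))
```

With only a few samples per fold, a fold may contain a single class, in which case an AUC is undefined for that fold; a stratified variant would be one way to reduce that risk.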

Predictive model construction and performance assessment
The best-performing model was used as the predictive model for AMI. We randomly split all samples into a 70% training set and a 30% testing set to assess the overall performance of the model. A 70/30 split is a common choice for samples of moderate size in machine learning applications. Predictive power was assessed by confusion matrices and ROC curves with their associated AUC values [23]. Additionally, the metabolomics data of the autopsy cases were used as an external validation set to evaluate the performance of the predictive model for AMI.
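A 70/30 split of this kind can be sketched in plain Python. The sketch assumes a stratified split (the same test fraction drawn from each group), which the text does not specify; the function name, the seed, and the label vector are hypothetical.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.3, seed=0):
    """Return (train_idx, test_idx), drawing test_fraction from each class."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for label in sorted(by_class):
        members = by_class[label]
        rng.shuffle(members)
        n_test = round(len(members) * test_fraction)
        test_idx.extend(members[:n_test])
        train_idx.extend(members[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Hypothetical label vector mirroring the three groups of 10 rats
labels = ["control"] * 10 + ["sham"] * 10 + ["AMI"] * 10
train_idx, test_idx = stratified_split(labels)
```

Stratifying keeps the group proportions identical in the training and testing sets, which matters when each group contains only 10 animals.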

Animal model of AMI
We established the AMI animal models by ligation of left coronary artery at 5 mm below the left atrial appendage in rats. During the experiment, we observed a marked elevation in the ST segment of the electrocardiogram after ligation (Supplementary Fig. S1) and myocardium under the ligature went pale, verifying the success of ligation and the occurrence of myocardial ischemia.

Cardiac gross and histopathology changes of SCD
For the 9 cases of SCD (Table 1), obvious thickening of the coronary artery wall and stenosis of the lumen were found at both autopsy and histopathological examination (Figs. 1a and 1b). Gray-white scar tissue could be seen in the myocardium of the region supplied by the narrowed coronary artery (Fig. 1c), where the myocardium was replaced by abundant fibrous tissue under the microscope (Fig. 1d). Pathological manifestations of early myocardial infarction, such as wave-like changes of cardiac muscle fibers, contraction band necrosis, hemorrhage, and inflammatory cell infiltration, were not found in these cases.

Metabolomics statistical analysis of UPLC-HRMS data
As shown in Fig. 2a, QC samples clustered together in the PCA score plot, indicating satisfactory stability and reproducibility of the analysis platform. The PLS-DA score plot (Fig. 2b) showed complete separation between the 3 groups, with cumulative R²Y and Q² of 0.987 and 0.814, respectively. The results of the permutation test (200 times, Fig. 2c) and CV-ANOVA (P < 0.05) showed that the model had adequate fitting and predictive capacity. Furthermore, the score plots of the OPLS-DA models (control vs. sham, control vs. AMI, and sham vs. AMI) revealed effective separation between groups (Figs. 2d, 2e, and 2f). However, the unsupervised PCA showed poor separation of the 3 groups (Fig. 2a), especially for control vs. sham rats (Fig. 2g), whereas Figs. 2h and 2i showed good separation for control vs. AMI rats and sham vs. AMI rats. Based on these results, subsequent comparative analysis was done between ischemic samples and controls.
After eliminating the effects of surgery, 478 differential features with VIP > 1 and P-value < 0.05 were screened in the serum of AMI rats (red circle in Fig. 3a), of which 28 were attributed to endogenous metabolites. The remaining 450 features did not match any known compounds (Supplementary Table S1). According to the heatmap (Fig. 3b), 22 metabolites were up-regulated and 6 were down-regulated in serum from AMI rats compared with the control group (Table 2).

Biomarker candidates and machine learning algorithms optimization for diagnosis of AMI
To extract more important metabolites, the data of 28 metabolites in rat serum were brought into models of GTB, SVM, RF, LR, and MLP algorithms. A metabolite is considered vital if it contributes to the model performance. The metabolites were ranked according to their functional contributions to each model's outputs, respectively (Figs. 4a-e). In each model, every metabolite was assigned a numerical score, which indicated its contribution: smaller value represents a more extensive contribution of metabolite to a given model. Lastly, Borda count algorithm was applied to summarize all ranks derived from models and the final importance rank of metabolites was shown in the right column of Fig. 4f.
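The Borda count aggregation can be sketched as follows. The point scheme (n − 1 points for the top position down to 0 for the last) is the classic form and is an assumption about the exact variant used; the function name and the three toy rankings are hypothetical.

```python
def borda_rank(rankings):
    """Combine several rankings (best first) into one consensus ranking.

    Each ranking awards n-1 points to its first item, n-2 to the second,
    and so on down to 0. Items are sorted by total points, with ties
    broken alphabetically.
    """
    items = sorted(rankings[0])
    n = len(items)
    points = {item: 0 for item in items}
    for ranking in rankings:
        if sorted(ranking) != items:
            raise ValueError("every ranking must contain the same items")
        for position, item in enumerate(ranking):
            points[item] += n - 1 - position
    return sorted(items, key=lambda item: (-points[item], item))


# Hypothetical rankings from three models, most important metabolite first
model_rankings = [
    ["CMPF", "L-tyrosine", "cholic acid"],
    ["L-tyrosine", "CMPF", "cholic acid"],
    ["CMPF", "cholic acid", "L-tyrosine"],
]
consensus = borda_rank(model_rankings)
```

Because only positions are aggregated, the method does not require the importance scores of the different models to share a common scale.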
New datasets containing different groups of metabolites, generated by removing metabolites one by one according to their ranks, were used to build prediction models. According to the tenfold cross-validation of the 5 machine-learning methods, the average AUCs of ROC curve analysis were 0.72 for SVM, 0.82 for GTB, 0.76 for RF, 0.82 for LR, and 0.98 for MLP, and the accuracy of the MLP model reached 96.67% for the diagnosis of AMI when the group of metabolites consisted of L-threonic acid, N-acetyl-L-cysteine, CMPF, glycocholic acid, L-tyrosine, cholic acid, and glycoursodeoxycholic acid (Figs. 4g and 4h). The representative chromatograms of the 7 metabolites in the control, sham, and AMI groups are shown in Supplementary Fig. S2.
Fig. 1 … (Table 1). a Macroscopic view of coronary artery. A transverse cut of the left anterior descending coronary artery shows coronary artery wall thickening and lumen stenosis. b HE staining showing irregular thickening of the coronary artery intima, atheromatous plaque formation, and grade IV lumen stenosis. c Macroscopic view of acute left-ventricular myocardial infarction. Gray scar tissue in the infarction area of the left ventricular anterior wall can be observed (arrow). d Microscopic appearance of myocardial infarction with extensive replacement fibrosis
The above results indicated that the MLP model built on these metabolites performed better than the other models; boxplots of normalized intensities for the 7 potential biomarkers are shown in Fig. 5. The levels of the 3 bile acids (cholic acid, glycocholic acid, and glycoursodeoxycholic acid) were all down-regulated in AMI rat serum. The levels of threonic acid and N-acetyl-L-cysteine were up-regulated, and L-tyrosine was down-regulated. CMPF was up-regulated in AMI serum.
Fig. 2 … c Permutation test (200 times) shows no over-fitting of the PLS-DA model. d-f OPLS-DA score plots of control and sham groups, control and AMI groups, and sham and AMI groups. g-i PCA score plots of control and sham groups, control and AMI groups, and sham and AMI groups
Fig. 3 Venn diagram and heatmap plot from the comparisons of AMI, sham, and control groups. a Venn diagram of differential features. The number in the overlapping area (encircled in red) represents differential features between AMI and the other two groups, excluding the effect of operation. b Heatmap plot of the 28 potential metabolites. Red indicates a higher level and blue indicates a lower level
Table 2 List of the potential metabolites associated with AMI in rat serum. a Variable importance in the projection (VIP) value was obtained from OPLS-DA with a threshold of 1.0. b P-values were derived from the Mann-Whitney U-test: *P < 0.05, **P < 0.01, ***P < 0.001. Marked with ↑ indicates that the level of the metabolite in the AMI group increased compared with the control group, while marked with ↓ …
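The panel search in this section (removing metabolites one by one according to rank and re-scoring each reduced panel) can be sketched as below. The sketch assumes that the lowest-ranked metabolite is removed first; `score_fn` is a hypothetical callback standing in for the cross-validated performance of a model trained on a given panel, and the toy scoring function is not derived from the study data.

```python
def nested_panels(ranked_metabolites):
    """Yield progressively smaller panels by dropping the lowest-ranked item."""
    for size in range(len(ranked_metabolites), 0, -1):
        yield ranked_metabolites[:size]


def best_panel(ranked_metabolites, score_fn):
    """Return (panel, score) with the highest score_fn(panel).

    When two panels score equally, the smaller panel is preferred.
    """
    scored = [(panel, score_fn(panel)) for panel in nested_panels(ranked_metabolites)]
    return max(scored, key=lambda ps: (ps[1], -len(ps[0])))


# Hypothetical ranked list (most important first) and toy scoring function
ranked = ["m1", "m2", "m3", "m4"]


def toy_score(panel):
    # Placeholder score that peaks at a panel of two metabolites
    return 1.0 - 0.1 * abs(len(panel) - 2)


panel, score = best_panel(ranked, toy_score)
```

In practice `score_fn` would wrap the tenfold cross-validation of one classifier, and the search would be repeated for each of the 5 algorithms.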

Validation of the classification model for AMI diagnosis
To better estimate the generalization error, train the model parameters, and avoid overfitting, validation was performed in 2 different ways. First, all rat serum samples were randomly split into a 70% training set and a 30% testing set. On the 30% testing set, the accuracy of the model was 83.33% and the AUC value of the ROC curve was 0.88 (Fig. 6a). According to the confusion matrix, only one sample was incorrectly assigned to the control group instead of the AMI group (Fig. 6b).
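For reference, the AUC of a ROC curve equals the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one. The following minimal sketch computes it that way; the function name, the labels, and the scores are hypothetical and do not reproduce the values in Fig. 6.

```python
def roc_auc(y_true, scores):
    """AUC as the fraction of positive-negative pairs ranked correctly.

    Tied scores count as half a correct pair.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = AMI) and model output scores
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.6, 0.4, 0.7, 0.3, 0.2]
auc = roc_auc(y_true, scores)
```

With a test set of only a few samples, the AUC can take only a small number of distinct values, which is worth bearing in mind when comparing models.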
Then, validation was performed using human serum samples obtained from SCD autopsy cases with AMI type II to assess whether the combination of the selected 7 metabolites and the MLP model would also discriminate human AMI-SCD cases from controls. The normalized peak areas of the 7 metabolites (L-threonic acid, N-acetyl-L-cysteine, CMPF, glycocholic acid, L-tyrosine, cholic acid, and glycoursodeoxycholic acid) were imported into the constructed MLP model. As a result, the MLP model constructed from the rat datasets achieved an accuracy of 88.23% in autopsy-based blood samples, and the AUC value of the ROC curve was 0.89 (Fig. 6c). According to the confusion matrix shown in Fig. 6d, only two samples were misclassified as AMI-SCD. The results demonstrated that the MLP model based on the rat metabolomics data also performed well in human samples, and it is a more suitable classifier than the other machine learning algorithms for AMI diagnosis.
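The reported accuracy can be checked against the counts given in the text. The sketch below rebuilds label vectors from those counts alone (9 AMI-SCD cases, 8 noncardiac cases, and 2 noncardiac cases misclassified as AMI-SCD); the per-sample vectors are therefore a reconstruction, not the original predictions, and the function name is hypothetical.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary classification."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


# Reconstructed from the reported counts: 1 = AMI-SCD, 0 = noncardiac death
y_true = [1] * 9 + [0] * 8
y_pred = [1] * 9 + [1] * 2 + [0] * 6
tp, fp, fn, tn = confusion_counts(y_true, y_pred)
accuracy = (tp + tn) / len(y_true)   # 15 correct out of 17
sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
```

Under this reading, 15 of 17 samples are classified correctly (about 88.2%, in line with the reported 88.23%), with all AMI-SCD cases detected and 6 of the 8 noncardiac cases correctly excluded.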

Discussion
The present study focuses on ischemic heart disease, which represents the most frequent cause of sudden cardiac death in most countries. More remarkably, AMI is difficult to diagnose in patients who died within 6 h after the onset of myocardial ischemia, because current histological examination at autopsy reveals no specific structural anomalies of the heart. Therefore, the post-mortem diagnosis of AMI represents a current challenge for both clinical and forensic pathologists.
The heart has a high metabolic rate to fulfill the demand for adenosine triphosphate (ATP) production that sustains continual contractile activity. In many cardiovascular diseases, the heart undergoes a "metabolic shift", which means that the categories or concentrations of metabolites such as fatty acids, glucose, ketone bodies, lactate, and amino acids change [24,25]. In this study, we applied a metabolomics protocol to investigate the effects of AMI on metabolites and identified early biomarkers of AMI that can distinguish between AMI and control rats. Compared with other published AMI metabolomics studies, which mainly addressed clinical diagnosis or treatment [24-27], the present study screened a panel of serum metabolites in a model of AMI to diagnose human post-mortem cases with AMI type II; these subjects could nonetheless have experienced a sudden acute ante-mortem ischemia just prior to death in addition to having occlusive coronary atherosclerotic plaques.
In the present study, a total of 28 endogenous metabolites were identified using untargeted metabolomics in rat serum after the AMI event. These metabolites were mainly related to amino acids, bile acids, glucose, fatty acids, etc. However, although the types of the 28 differential metabolites are roughly the same as in prior AMI metabolomics studies [19-21,24-27], few individual metabolites were shared with most of these studies. The variation in metabolites was mainly due to limitations of current analytical approaches in metabolomics, such as the bias of different instrument platforms and the lack of a systematic method for metabolite extraction [28,29], which indicates that a standard method for metabolomic analysis that can be applied in different laboratories needs to be developed. In addition, the variation may also be caused by the different experimental animal models and species differences, including rats, dogs, pigs, and humans with coronary disease [20,24-27].
As described previously, diagnosing the cause of death within the first 6 h after the onset of myocardial ischemia is a great challenge. Along with developments in instrumentation, state-of-the-art data analysis tools are needed to handle the large amount of metabolite data generated in untargeted metabolomics. Machine learning algorithms, which are potent tools for metabolomics analysis, have been increasingly applied to the classification and data mining of complex UPLC-HRMS data [30]. More recently, the deep learning method or deep belief network (MDBN) in metabolomics has shown advantages in classifying different diseases, such as determining breast cancer status and diagnosing breast hyperplasia [31,32]. Some reports have also shown that machine learning algorithms such as support vector machine, genetic algorithm, and random forest can be used to classify metabolomic data and provide a high accuracy rate in diagnosing disease [31-33].
In this experiment, we investigated the accuracy and efficacy of different metabolite combinations in inferring AMI using 5 machine learning algorithms. The results showed that the classification accuracy ranged from 53.33% (SVM, 25 metabolites) to 96.67% (MLP, 7 metabolites), illustrating that AMI diagnosis could be improved by comparing different combinations of small-molecule metabolites and employing appropriate machine learning algorithms (Fig. 4). To validate the model, the MLP model was used to classify the 17 forensic autopsy-based blood samples, resulting in an accuracy of 88.23% and an AUC of 0.89. These results demonstrated that the 7 metabolites might serve as biomarkers for clinical diagnosis or post-mortem identification. To the best of our knowledge, this is the first study to assess the capability of a classification model based on untargeted metabolomics from a well-controlled animal model to predict the cause of death in autopsy cases.
Fig. 4 List of the most important metabolites by machine learning algorithms. a-e Histogram of ranking metabolites in support vector machine (SVM), gradient tree boosting (GTB), random forest (RF), logistic regression (LR), and multilayer perceptron (MLP). f Heatmap of metabolites' order obtained by the Borda count algorithm. Accuracy heatmap of machine learning algorithms. Dark blue color indicates a high contribution to the model and light yellow indicates a low contribution. g The average AUCs of different machine learning algorithms. h The classification accuracy of the different metabolites' groups and machine learning algorithms for the diagnosis of AMI
Fig. 5 … e L-tyrosine, f cholic acid, and g glycoursodeoxycholic acid. * represents P < 0.05 compared with AMI group, ** represents P < 0.01 compared with AMI group, *** represents P < 0.001 compared with AMI group
To further understand the relationship between the 7 metabolites and acute myocardial ischemia, we performed a literature review focused on the "metabolic shift". Previous studies have reported a relationship between serum bile acid and cardiovascular conditions [34]. Zhang BC et al. found that a higher serum total bile acid level was an independent predictor of high-risk coronary plaques in asymptomatic individuals [35], which indicated that the serum bile acid level might forecast AMI and the occurrence of SCD. Among the 7 essential metabolites in our study, 3 bile acids (cholic acid, glycocholic acid, and glycoursodeoxycholic acid) changed in the serum of AMI rats. These results showed that the occurrence of myocardial ischemia leads to abnormal lipid metabolism, especially bile acid biosynthesis and metabolism. Sun L et al. found that the level of threonic acid was markedly down-regulated in isoproterenol (ISO)-induced AMI rats and considered it one of the cecal metabolites most associated with AMI [36]. In contrast, the serum level of threonic acid in this study was up-regulated, which may be caused by increased intestinal reabsorption into the blood circulation coupled with an oxidative stress response. Thus, the results further confirmed that threonic acid is an important biomarker associated with AMI.
In the present study, the combination of metabolomics and machine learning algorithms showed great potential for diagnosing AMI. The "metabolic shift" could be a molecular autopsy indicator for medicolegal investigation in SCD cases. Our research provides a basis for translational research from bench to daily practice for the diagnosis of acute myocardial ischemia. In the future, standardized metabolomics methods for cadaveric samples should be studied, and diagnostic models based on machine learning algorithms should be validated for SCD under different causes of death by increasing the number of human samples.

Conclusions
In summary, 28 differential metabolites were identified by metabolomics analysis of serum from AMI rats. A panel of 7 valuable metabolites among the 28 was screened by multiple machine learning algorithms and validated in autopsy cases. The constructed MLP classification model based on metabolomics data has a high diagnostic performance for AMI rats and the forensic autopsy-based blood samples. Therefore, combining the metabolomics approach and machine learning algorithms to establish diagnostic models has been demonstrated as a potentially helpful strategy for diagnosing AMI and SCD.