Artificial Neural Network-Based Prediction of the Risk of Ventricular Tachyarrhythmia Complicating Acute Myocardial Infarction During Hospitalization

An artificial neural network (ANN) model was developed to predict the risk of ventricular tachyarrhythmia (VTA) as a complication in patients with acute myocardial infarction (AMI). We collected data on 503 patients with 13 risk factors from the Affiliated Hospital of Guangdong Medical University from January 2017 to December 2019. The risk factors were reduced in dimension and simplified into new variables by principal component analysis (PCA). The cohort was randomly divided into a training set and a testing set at a ratio of 70%:30%. The training set was used to develop a model for predicting VTA, and the testing set was used to evaluate the performance of the model. Three new comprehensive variables derived by PCA were able to reflect all information of the original data. Using a cyclic search to optimize the parameters, we determined a prediction model comprising an input layer of three comprehensive variables, a single hidden layer of two neurons, and an output layer. In the training set, the area under the curve (AUC) was 0.812, and the confusion matrix gave an accuracy of 94.60%, sensitivity of 63.04%, specificity of 99.35%, positive predictive value of 93.55%, and negative predictive value of 94.70%. In the independent testing cohort, the model displayed decreased but moderate discrimination, with an AUC of 0.688; the confusion matrix gave an accuracy of 87.42%, sensitivity of 39.26%, specificity of 98.37%, positive predictive value of 84.62%, and negative predictive value of 87.68%. These results suggest that an ANN model could be used to predict the risk of ventricular tachyarrhythmia after acute myocardial infarction, although it should be further improved.


Introduction
Ventricular tachyarrhythmia (VTA) is a common but potentially fatal complication of acute myocardial infarction (AMI), especially in elderly patients 1. It is also the main cause of sudden cardiac death (SCD) and accounts for 4-5 million deaths per year globally 2. According to reports of the American Heart Association in 2015, survival to hospital discharge after SCD was estimated at 23.8%-35.9% 3. Considering the extremely poor prognosis, it is of great significance to identify high-risk patients at an early stage.
In the booming era of big data and artificial intelligence, machine learning has become a useful and popular scientific research method for data analysis and prediction 4. An artificial neural network (ANN) is a machine learning method that mimics the biological nervous system and consists of an input layer, a single or multiple hidden layers, and an output layer. The algorithm can analyze nonlinear relationships in data and does not require distributional assumptions (such as normality). These advantages have attracted considerable interest in medical research, and ANNs have been used to establish disease prediction models in cardiology, sleep medicine and oncology [5][6][7][8]. The purpose of our study is to construct an ANN prediction model based on cardiovascular risk factors to identify high-risk patients who may develop VTA after AMI.

Study design and data collection
The study was approved by the ethics committee of the Affiliated Hospital of Guangdong Medical University, and patient informed consent was waived for this retrospective analysis.
We collected information from the electronic medical record database on patients who were clearly diagnosed with AMI between January 2017 and December 2019, fulfilling the Fourth Universal Definition of Myocardial Infarction (2018) 9. Arrhythmic events were identified by reviewing electrocardiograms or Holter monitor recordings. VTA was defined as sustained ventricular tachycardia, ventricular fibrillation resulting in defibrillator shocks, and non-sustained ventricular tachycardia. The features and subclassification of the risk factors are listed in Table-1:

Statistical analysis and model establishment
Statistical analysis and model construction were performed with R 3.6.1. The R packages "psych", "ggplot2", and "pheatmap" were used to perform and visualize the principal component analysis (PCA).
The packages "neuralnet", "NeuralNetTools", "dplyr" and "pROC" were used to develop and validate the artificial neural network model. The analysis proceeded in the following steps:
2.2.1. Features were extracted from the original variables by principal component analysis (PCA), a long-established and widely used multivariate statistical method. The principal components can reflect most or all of the information in the original data while each component is independent of the others, which avoids multicollinearity and facilitates model development 10.

2.2.2. The cohort was randomly divided into a training set and a testing set at a ratio of 70%:30%. A standard feed-forward, back-propagation neural network, the simplest form of ANN, consisting of an input layer, a hidden layer, and an output layer, was applied in the study because of its relative simplicity and stability 11. The model operates as follows: the new comprehensive principal components are passed from the input layer to the hidden layer, which consists of several neurons acting as information receivers. The connections to each neuron carry their own weights and a bias parameter. The weight represents the importance of the corresponding input relative to the other inputs; the bias is used to correct the weighted sum of the inputs. The information is transformed nonlinearly by the sigmoid activation function and passed to the output layer, which calculates whether VTA occurs as a complication. It should be noted that the optimal number of neurons in the hidden layer was determined through trial and error, since no accepted theory currently exists for predetermining the optimal number. We used a cyclic search to determine the optimal number of neurons for the model in this study. The mathematical operations in the model can be generalized as follows 12:
Note: y = output result, i = number of input variables, N = number of neurons, w = weights, x = input variables, b = bias parameter.
2.2.3. The optimized model was verified in the training dataset and the testing dataset respectively, using the following assessment measures: area under the receiver operating characteristic curve (AUC) 13 and the confusion matrix with accuracy, sensitivity, specificity, positive predictive value, and negative predictive value.
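The equation announced in step 2.2.2 does not appear in this text. As a hedged reconstruction only, a single-hidden-layer network with sigmoid activation is commonly written as below, using the symbols from the note (y, i, N, w, x, b). The indices j and k, the activation symbol σ, and the separate output-layer weights v_j and bias b_0 are notational assumptions introduced here. Applying σ at the output layer is also an assumption that the source does not confirm.

```latex
h_j = \sigma\!\left(\sum_{k=1}^{i} w_{jk}\, x_k + b_j\right), \quad j = 1, \dots, N,
\qquad
y = \sigma\!\left(\sum_{j=1}^{N} v_j\, h_j + b_0\right),
\qquad
\sigma(z) = \frac{1}{1 + e^{-z}}
```

With i = 3 principal components and N = 2 hidden neurons, this form matches the architecture reported in the Results.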

Characteristics of clinical information
All data were expressed as counts (%). A total of 503 patients with 13 risk factors were divided into a VTA group and a non-VTA group according to the presence or absence of VTA as a complication. The detailed information is shown in Table 2.
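The random 70%:30% division of these 503 records, described in the methods, can be sketched as follows. The study itself used R; this Python sketch is an illustration only. The function name `split_cohort`, the seed, and the placeholder patient records are hypothetical and are not taken from the source.

```python
import random


def split_cohort(records, train_fraction=0.7, seed=2019):
    """Randomly split records into (train, test) without replacement."""
    shuffled = list(records)  # copy so the caller's list is left untouched
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]


# Hypothetical placeholder cohort: 503 patient indices, 74 of them flagged VTA.
cohort = [(patient_id, patient_id < 74) for patient_id in range(503)]
train, test = split_cohort(cohort)
print(len(train), len(test))  # 352 151
```

A purely random split does not guarantee that the proportion of VTA cases is the same in both sets. Whether the original analysis stratified by outcome is not stated.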

Principal component analysis
The risk factors were simplified by PCA dimensionality reduction into three new principal components that include all information of the original data, as reflected by the cumulative variance and visualized in the scree plot (Figure-1a). The scores of the risk factors reflected by the new variables are shown in the clustering heat map (Figure-1b).
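The cumulative variance referred to here is conventionally the running sum of the largest eigenvalues divided by the sum of all eigenvalues of the original variables, not only those of the retained components. The sketch below shows that calculation. The eigenvalues are hypothetical placeholders for 13 standardized variables and are not the values obtained in this study.

```python
def cumulative_variance(eigenvalues):
    """Return the cumulative proportion of variance, largest eigenvalue first."""
    total = sum(eigenvalues)
    running = 0.0
    proportions = []
    for value in sorted(eigenvalues, reverse=True):
        running += value
        proportions.append(running / total)
    return proportions


# Hypothetical eigenvalues for 13 standardized variables (they sum to 13).
eigenvalues = [3.9, 2.6, 1.3, 1.0, 0.9, 0.8, 0.7, 0.6, 0.5, 0.3, 0.2, 0.1, 0.1]
cumulative = cumulative_variance(eigenvalues)
print(round(cumulative[2], 3))  # share explained by the first three components
```

In this made-up example the first three components explain 60% of the total variance. A scree plot displays the same eigenvalues in descending order.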

Model construction and parameter adjustment
We tuned the parameters by comparing the prediction accuracy of models with different numbers of neurons and concluded that the model containing two neurons was the most suitable, with the best accuracy reaching 87.42% (Figure-2a). Finally, we confirmed the optimal artificial neural network model, which includes an input layer with three variables, a single hidden layer of two neurons, and an output layer that displays the result (Figure-2b).
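To make the 3-2-1 architecture concrete, the following is a minimal sketch of one forward pass through such a network. All weights and biases are hypothetical illustrative numbers, not the fitted parameters of the study's model. The sigmoid on the output neuron and the 0.5 decision threshold are also assumptions.

```python
import math


def sigmoid(z):
    """Logistic activation, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, hidden_weights, hidden_biases, output_weights, output_bias):
    """One forward pass: inputs -> hidden layer -> single output probability."""
    hidden = [
        sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
        for weights, bias in zip(hidden_weights, hidden_biases)
    ]
    z_out = sum(w * h for w, h in zip(output_weights, hidden)) + output_bias
    return sigmoid(z_out)


# Hypothetical parameters for 3 inputs (PC1..PC3) and 2 hidden neurons.
hidden_weights = [[0.8, -0.4, 0.3], [-0.5, 0.9, 0.2]]
hidden_biases = [0.1, -0.2]
output_weights = [1.5, -1.1]
output_bias = -0.3

patient_scores = [0.6, -1.2, 0.4]  # hypothetical principal component scores
probability = forward(patient_scores, hidden_weights, hidden_biases,
                      output_weights, output_bias)
predicted_vta = probability >= 0.5  # assumed decision threshold
```

A search over the number of hidden neurons would repeat training for each candidate size and keep the size with the best accuracy. Training itself is omitted from this sketch.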

Evaluation of model performance
The receiver operating characteristic (ROC) curve showed that the model had promising discrimination in the training dataset, with an AUC of 0.812 (Figure-3A). The confusion matrix is shown in Table-3, with accuracy 94.60%, sensitivity 63.04%, specificity 99.35%, positive predictive value 93.55%, and negative predictive value 94.70%. In the independent testing cohort, the model displayed decreased but moderate discrimination, with an AUC of 0.688 (Figure-3B). The confusion matrix is shown as

Discussion
VTA is a common but potentially fatal complication of AMI. Some electrocardiographic and imaging abnormalities, such as EF < 35%, a long Q-T interval and the R-on-T phenomenon, may be closely related to VTA 14,15. Unfortunately, there is no generally accepted prediction model for identifying patients at high risk of ventricular tachyarrhythmia after acute myocardial infarction. Our team attempted to construct such a prediction model and to contribute to this field.
The artificial neural network is a representative method of artificial intelligence and is well suited to disease prediction because it can analyze nonlinear relationships in data and predict complex relationships between variables. The earliest application of ANN in cardiology dates back to at least 1995. With the development of computer technology and the wide application of deep learning (a new research direction in artificial intelligence), ANN has again attracted increasing attention 16,17.
Suitable feature variables form the foundation of the model. PCA is a multivariate statistical method that uses an orthogonal transformation to convert multiple variables into a smaller number of new comprehensive variables, thereby reducing dimensionality and simplifying the data structure. The ANN model started from three principal components that were simplified from 13 risk factors by PCA. The cumulative variance of the three principal components adds up to 100%, reflecting the information of the original data perfectly. We also tried to screen feature variables by least absolute shrinkage and selection operator (LASSO) regression 18. The variables selected by LASSO were infection, eGFR, lesion vessel, hsTnT, EF, and PCI timing, which are similar to PC1, but the remaining information was lost. We preferred to screen the feature variables by PCA because it yields fewer variables but more comprehensive information than LASSO regression.
The prediction indexes in the training set and testing set are satisfactory except for the sensitivity. Sensitivity is defined mathematically as $\text{sensitivity} = \frac{TP}{TP + FN}$, where TP is the number of true positives and FN the number of false negatives. The unsatisfactory sensitivity is mainly due to the small number of true positive predictions caused by the imbalanced data: the VTA group, with 74 patients, accounts for 14.71% of the cohort, while the non-VTA group, with 429 patients, accounts for 85.29%. During prediction, the model tends to assign patients to the non-VTA group because doing so gives a higher probability of being correct. Although we optimized the model parameters and took measures to deal with the imbalanced data, such as the Synthetic Minority Oversampling Technique (SMOTE) 19, the effect was slight. Although the model's highly conservative approach to positive predictions results in low sensitivity, this does not mean that the model is a failure. Once a patient was judged positive (VTA) by the model, the reliability (positive predictive value) was credible, reaching 93.55% in the training set and 84.62% in the test set. This suggests that doctors need to pay more attention to these truly high-risk patients.
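The relationship between these indexes and the class imbalance can be illustrated with a short calculation. The cell counts below are an assumption: they were back-calculated so that the resulting percentages match those reported for the training set, and they are not given in this text. The helper names `ratio` and `confusion_metrics` are hypothetical.

```python
def ratio(numerator, denominator):
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def confusion_metrics(tp, fn, fp, tn):
    """Compute the five indexes used in the paper from confusion matrix cells."""
    return {
        "accuracy": ratio(tp + tn, tp + fn + fp + tn),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


# Assumed training-set cells (back-calculated, not reported in the source).
train = confusion_metrics(tp=29, fn=17, fp=2, tn=304)
for name, value in train.items():
    print(f"{name}: {value:.2%}")
# accuracy: 94.60%, sensitivity: 63.04%, specificity: 99.35%,
# ppv: 93.55%, npv: 94.70%

# Baseline on the whole cohort: label every patient non-VTA (74 VTA, 429 non-VTA).
all_negative = confusion_metrics(tp=0, fn=74, fp=0, tn=429)
print(f"{all_negative['accuracy']:.2%}")  # 85.29%
```

The baseline shows why accuracy alone is misleading here: predicting non-VTA for everyone already reaches 85.29% accuracy with zero sensitivity.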

Limitation
There are several limitations in this study. First, the risk factors mainly concern traditional clinical diagnosis and treatment, because we lacked electrophysiological information. Second, the training and testing samples originated from the same cohort, and the predictive performance of the model has not been verified in other populations. Finally, the ANN model is not as convenient to apply as a nomogram model, because it depends on a specific computer program.

Conclusion
Despite these limitations, our research introduced a machine learning classification method into cardiovascular medicine and developed an ANN model to predict the risk of VTA complicating AMI. Against the background of rapid development in artificial intelligence, it is necessary to apply an interdisciplinary approach to medical research.