High-performance Cardiovascular Medicine: Artificial Intelligence for Coronary Artery Disease


Cardiovascular disease (CVD) is the leading cause of death globally, according to the World Health Organization (WHO). Approximately 17.9 million people die of cardiovascular disease each year, an estimated 31% of all deaths worldwide. CVDs affect the heart and blood vessels. Because healthcare is essential to a country and its economy, researchers are looking for ways to predict disease before it becomes serious. This research introduces an artificial intelligence-based algorithm to predict coronary artery disease. The algorithm was tested with 72 random subjects described by 11 attributes: age, gender, height, weight, systolic and diastolic blood pressure, cholesterol, glucose, smoking, alcohol intake, and physical activity. According to the results, the prediction accuracy of the system was 81.62% at a precision of 0.879.


Introduction
Cardiovascular disease (CVD), or cardiac disease, is a significant cause of death in both developing and developed countries (Amani and Sharifi 2012). CVD is a class of diseases of the heart or the blood vessels, which includes coronary artery diseases (CADs) such as angina and myocardial infarction (Murray 2014). Many CVDs are related to atherosclerosis, a condition that develops when fat and cholesterol plaque builds up in the walls of the arteries. CVDs also include heart attack, cardiac disease, rheumatic heart disease, cardiomyopathy, irregular heart rhythms, congenital heart defects, dysfunction of the heart valves, carditis, aortic aneurysms, peripheral artery disease, thromboembolic disease, and venous thrombosis (Lanzkowsky et al. 2016).
According to the World Health Organization (WHO), CVDs are the number one cause of death globally. Heart diseases accounted for 31% of total global deaths by the end of 2020 (www.who.int 2020). The Institute for Health Metrics and Evaluation (IHME) in the United States of America (USA) published the Global Burden of Disease Study in 2017. According to the IHME study, deaths caused by cardiovascular diseases increased by 5.85 million between 1990 and 2017. Figure 1 illustrates the global deaths due to CVDs since 1990. Also, 80% of these deaths are due to stroke and heart attack (Hazra et al. 2017). Table 1 presents the latest WHO data, published in 2018, on deaths in Sri Lanka due to coronary heart disease. According to the data, 22.64% of deaths were attributed to CADs, which ranks Sri Lanka 94th among countries. The most common cause of CVD morbidity and mortality is ischaemic heart disease (IHD). In 2005, the prevalence of IHD in Sri Lanka by age and sex was found to be 9.3% (Jayawardene et al. 2017). Cardiovascular diseases or heart diseases mainly refer to coronary heart disease (CHD), rheumatic heart disease (RHD) and cardiomyopathy (Lopez et al. 2020). Awan et al. (2018) reported 94.7% accuracy in predicting heart disease using an artificial neural network method.
Rastogi, Chaturvedi, Satya and Arora developed a heart disease prediction method based on the physical and mental parameters of patients (Rastogi et al. 2020). They used features such as sex, cholesterol level, blood pressure, Tension-Type Headache (TTH) and stress level to predict heart diseases. They also applied several data mining methods to analyse the heart disease dataset; support vector machine (SVM), Naive Bayes and decision trees were the main techniques whose results they compared.
Higuchi, Sato, Makuuchi, Furuse, Takamoto and Takeda tested and verified three-layer artificial neural networks (ANNs) for predicting the condition of a patient's heart. They used ANNs to analyse phonocardiogram data for diagnosis. According to Higuchi and his team, the accuracy of the diagnoses can be improved by accumulating more data in the future (Higuchi et al. 2006).
Mazurowski and colleagues from the University of Louisville, USA, investigated the class imbalance in the training dataset when developing neural networks for computerized medical diagnosis systems. According to their results, backpropagation performed better than particle swarm optimization (PSO) on imbalanced training data. They also mentioned that this imbalance arose from the small data sample and the large number of features associated with the dataset (Mazurowski et al. 2008).
The next section of this article describes the design methodology of the proposed algorithm.

Methodology
The ultimate goal of this research is to predict coronary artery disease using an artificial neural network-based approach. The cardiovascular disease dataset has 70,000 observations from medical examinations. The methodology of this study comprised five main steps, which included training an artificial neural network using the backpropagation Levenberg-Marquardt (LM) algorithm and, as the fifth step, performing the test run and analysing the results using the data analysis method.

Description of Dataset
The "Kaggle cardiovascular dataset" is widely used by researchers for predicting cardiac disease (Ulianova 2019). The dataset consists of 70,000 records of information given by patients at the time of medical examination. It contains 12 features: four objective features, four examination features, three subjective features, and one target variable for the presence or absence of cardiovascular disease. Table 2 lists the features extracted from the dataset, and Table 3 shows the output features of the dataset.
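The feature grouping described above can be sketched as a small Python lookup table. The column names and the target encoding below are assumptions based on the publicly distributed Kaggle file, not identifiers given in this article, and `input_columns` and `split_record` are hypothetical helpers introduced for illustration.

```python
# Assumed column names: the article lists the attributes but not the
# identifiers used in the data file.
FEATURE_GROUPS = {
    "objective": ["age", "gender", "height", "weight"],
    "examination": ["ap_hi", "ap_lo", "cholesterol", "gluc"],
    "subjective": ["smoke", "alco", "active"],
}
TARGET = "cardio"  # assumed encoding: 1 = disease present, 0 = absent


def input_columns(groups=FEATURE_GROUPS):
    """Return the 11 input attributes in a fixed order."""
    return [name for names in groups.values() for name in names]


def split_record(record):
    """Split one record (a dict) into an input vector and a target label."""
    inputs = [float(record[name]) for name in input_columns()]
    return inputs, int(record[TARGET])
```

Keeping the grouping in one place makes it easy to check that the 11 inputs plus the target add up to the 12 features described.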

Development of an Artificial Neural Network Algorithm
Because this study uses a multilayer perceptron (MLP), the training procedure for the MLP was performed as follows:
• The structure of the neural network was defined by initializing the activation function, network parameters, weights, and biases. The transfer function converts the input of a neuron to its output signal. This study used the hyperbolic tangent sigmoid transfer function (tansig) as the activation function of the network. It is related to the sigmoid function and gives an output between -1 and 1.
• The parameters associated with the training algorithm were defined. The algorithm was developed with random data division and the Levenberg-Marquardt training algorithm. The performance was measured using the mean square error (MSE). Training ran for 1000 iterations and took approximately eight minutes. The performance of the training algorithm was recorded as 0.178. The gradient of the learning algorithm gives the steepest direction, that is, the direction of the maximum rate of change, and the goal of training is to reach an optimal solution, also known as the bottom of the bowl. In this study, the gradient was recorded as 0.000322, within the range between 0.643 and 1.00×10⁻⁷.
• After the neural network was developed, 72 test subjects were evaluated and the predictions were verified against the real values.
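The building blocks named in the procedure above can be illustrated with a minimal Python sketch. It is not the implementation used in the study: the layer sizes, weights, and the helper name `layer_forward` are hypothetical, and only the tansig transfer function and the MSE measure come from the text.

```python
import math


def tansig(x):
    """Hyperbolic tangent sigmoid transfer function, output in (-1, 1).

    Mathematically equal to 2 / (1 + exp(-2x)) - 1, which is tanh(x).
    """
    return math.tanh(x)


def layer_forward(inputs, weights, biases):
    """Output of one fully connected layer with tansig activation.

    weights holds one row of input weights per neuron (illustrative values
    only; the article does not report the network size).
    """
    return [
        tansig(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def mean_squared_error(targets, outputs):
    """Average of the squared differences between targets and outputs."""
    if len(targets) != len(outputs) or not targets:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - o) ** 2 for t, o in zip(targets, outputs)) / len(targets)
```

Because tansig is bounded, inputs with very different scales (for example age and cholesterol level) are usually normalized before training so that the neurons do not saturate.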
where $e_k$ is the error in the kth example and $\mathbf{e}$ is a vector with elements $e_k$.
The error vector is expanded as a Taylor series as per Eq. (3), and the error function is expressed in Eq. (4). Here λ is the parameter governing the step size. Minimizing the modified error with respect to $\mathbf{w}(j+1)$ is described in Eq. (8). For very large values of λ, the update reduces to the standard gradient descent method.
For very small values of λ, it reduces to the Newton method.
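For reference, the standard textbook form of the Levenberg-Marquardt update is sketched below. It is an assumption that the equations of this article follow this form. The symbols $\mathbf{Z}$ (the matrix of derivatives of the errors with respect to the weights), $\mathbf{I}$ (the identity matrix) and $\tilde{E}$ (the modified error) are introduced here for illustration.

```latex
\begin{align}
E &= \tfrac{1}{2}\sum_{k} e_k^{2} = \tfrac{1}{2}\lVert \mathbf{e} \rVert^{2} \\
\mathbf{e}(j+1) &\approx \mathbf{e}(j) + \mathbf{Z}\,\bigl(\mathbf{w}(j+1) - \mathbf{w}(j)\bigr),
  \qquad Z_{ki} = \frac{\partial e_k}{\partial w_i} \\
\tilde{E} &= \tfrac{1}{2}\bigl\lVert \mathbf{e}(j) + \mathbf{Z}\,\bigl(\mathbf{w}(j+1) - \mathbf{w}(j)\bigr) \bigr\rVert^{2}
  + \tfrac{\lambda}{2}\,\bigl\lVert \mathbf{w}(j+1) - \mathbf{w}(j) \bigr\rVert^{2} \\
\mathbf{w}(j+1) &= \mathbf{w}(j) - \bigl(\mathbf{Z}^{\mathsf{T}}\mathbf{Z} + \lambda\,\mathbf{I}\bigr)^{-1}\mathbf{Z}^{\mathsf{T}}\,\mathbf{e}(j)
\end{align}
```

In this form, a large λ makes the step approximately $-\tfrac{1}{\lambda}\mathbf{Z}^{\mathsf{T}}\mathbf{e}(j)$, a short gradient descent step, while λ close to zero gives a Newton-type (Gauss-Newton) step.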

Results and Analysis
A total of 72 test subjects with 11 attributes were obtained from the Kaggle cardiovascular disease dataset. The test set was divided into four classes, each composed of 18 test subjects, which were used to validate the results.
Progress is updated continuously during training. The performance, the magnitude of the gradient, and the number of validation checks are of most interest during the training process. The magnitude of the gradient and the number of validation checks are used to terminate the training. As the training approaches a minimum of the performance function, the gradient becomes very small. Figure 4 illustrates the regression graph of the training. After the network has been trained and tested, the network object may be used to compute the response to any input data. Figure 5 illustrates the mean square error versus training performance of the neural network.
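The termination logic described above can be sketched as a small Python check. The limits of 1000 iterations and a gradient of 1.00×10⁻⁷ are taken from the values reported in this article, although treating the latter as the stopping threshold is an assumption. The limit of six validation failures and the function name `should_stop` are hypothetical.

```python
def should_stop(epoch, gradient_norm, validation_failures,
                max_epochs=1000, min_gradient=1e-7, max_failures=6):
    """Return the reason training should stop, or None to continue.

    max_failures=6 is an assumed value; the article does not report it.
    """
    if epoch >= max_epochs:
        return "maximum epochs reached"
    if gradient_norm <= min_gradient:
        return "minimum gradient reached"
    if validation_failures >= max_failures:
        return "validation checks exceeded"
    return None
```

With the values reported here (gradient 0.000322 after 1000 iterations), such a check would stop on the iteration limit rather than on the gradient.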
To understand the behaviour of the classifier, we calculated the confusion matrix. A confusion matrix, also known as an error matrix, is widely used in machine learning, especially for statistical classification problems. It is a table that describes the performance of a classification model, or classifier, on a set of test data for which the true values are known.
The confusion matrix validates the performance of an algorithm. The entries in the confusion matrix were calculated from the coincidence matrix using the following definitions:
• True Negative (TN) is the number of correct predictions that an instance is negative.
• True Positive (TP) is the number of correct predictions that an instance is positive.
• False Positive (FP) is the number of incorrect predictions that an instance is positive.
• False Negative (FN) is the number of incorrect predictions that an instance is negative.
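The four definitions above can be turned into a short counting routine. The sketch below assumes binary labels with 1 as the positive class; the function name `confusion_counts` is hypothetical.

```python
def confusion_counts(actual, predicted, positive=1):
    """Count TP, TN, FP and FN for two equal-length lists of binary labels."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have equal length")
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for a, p in zip(actual, predicted):
        if p == positive:
            # Predicted positive: correct if the actual label is positive.
            counts["TP" if a == positive else "FP"] += 1
        else:
            # Predicted negative: correct if the actual label is negative.
            counts["TN" if a != positive else "FN"] += 1
    return counts
```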
The accuracy was calculated using Eq. (9),

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{9}$$

The classification matrix shows the frequency of correct and faulty predictions (Kim et al. 2015). It compares the actual values in the test dataset with the values predicted by the trained model.
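Accuracy, and the precision quoted in the abstract, can both be computed from the four confusion matrix counts. The sketch below uses the standard definitions; the function names are hypothetical, and the article does not report the counts behind its figures.

```python
def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("at least one prediction is required")
    return (tp + tn) / total


def precision(tp, fp):
    """Fraction of positive predictions that are actually positive."""
    if tp + fp == 0:
        raise ValueError("no positive predictions were made")
    return tp / (tp + fp)
```

Reporting both is useful because accuracy alone can look high when one class dominates the test set.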

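Table: confusion matrix of actual versus predicted class (positive, negative).
The mean absolute error (MAE) is defined as $\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n} \lvert y_i - \hat{y}_i \rvert$, where $y_i$ is the real data (verification), $\hat{y}_i$ is the prediction data, and $n$ is the number of subjects.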
The smaller the MAE, the closer the prediction data is to the real data (verification); the larger the MAE, the greater the difference between the predicted data and the real data (verification). The experimental findings revealed that the neural network performed well in forecasting cases of cardiovascular disease. The experiment was designed to determine the neural network output and explore the impact of attribute selection on the model. The test dataset contains a total of 72 subjects in four classes. The neural network achieved an accuracy of 81.62%, which suggests it is a suitable classifier for classification and prediction in the medical field. Table 6 presents the accuracy and mean absolute error of each class. The decomposed data can reflect fluctuation information on different subjects while retaining the characteristics of the original data. Figure 11 shows the degree of CVD risk: 48.0% of subjects were at low CVD risk for both males and females, while the average forecast shares at moderate and high risk for both genders were 20.5% and 31.5%, respectively.
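The mean absolute error for the whole test set and for each class can be sketched as follows. The way subjects were assigned to the four classes is not described in the article, so splitting the 72 subjects into consecutive groups of 18 is an assumption, and both function names are hypothetical.

```python
def mean_absolute_error(real, predicted):
    """Average absolute difference between real and predicted values."""
    if len(real) != len(predicted) or not real:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(r - p) for r, p in zip(real, predicted)) / len(real)


def per_class_mae(real, predicted, class_size=18):
    """MAE of each consecutive group of class_size subjects (assumed split)."""
    return [
        mean_absolute_error(real[i:i + class_size], predicted[i:i + class_size])
        for i in range(0, len(real), class_size)
    ]
```

For binary labels, the MAE equals the fraction of misclassified subjects, so a per-class MAE is directly comparable with a per-class accuracy.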

Conclusions
In this article, we proposed a possible solution for predicting cardiovascular diseases using artificial intelligence technology. A multilayer perceptron neural network, the backpropagation (Levenberg-Marquardt) training algorithm, and the tansig activation function were used to develop the prediction system. The experimental results suggest that the system predicts the presence of cardiovascular disease with 81.62% accuracy. The accuracy may improve with the inclusion of more attributes, such as chest pain type, heredity, stress level, and treatment for hypertension.