Comparison of Machine Learning Models and Framingham Risk Score for the prediction of the presence and severity of Coronary Artery Diseases by using Gensini Score

Background: A risk prediction model for cardiovascular conditions based on routine information has not been established. Machine Learning (ML) models offer an opportunity to build an accurate prediction system for the presence and severity of Coronary Artery Disease (CAD). Methods: To compare the performance of ML models with the Framingham Risk Score (FRS), records of 2608 inpatients (1669 men, 939 women; mean age 63.16 ± 10.72 years) admitted to our hospital from January 2015 to July 2017 were extracted from the electronic medical system with 29 attributes. Four ML algorithms (Logistic Regression (LR), Random Forest (RF), k-Nearest Neighbors (KNN), and Artificial Neural Networks (ANN)) were used to build models, based first on the eight core risk factors and then on all factors. The Area Under Curve (AUC) of the receiver operating characteristic curve was the main measure of the predictive power of each model. Results: According to the AUC, all ML algorithms predicted the presence of CAD better than FRS; specifically, FRS<LR<RF<KNN<ANN (FRS variables) and FRS<LR=RF=KNN<ANN (all variables). ANN was the best model for predicting the presence of CAD (AUC 0.82, Accuracy 0.74). For severity, only ANN (AUC 0.70, Accuracy 0.65) among the ML models predicted better than FRS (AUC 0.59, Accuracy 0.62); the other three models did not achieve a higher AUC than FRS. Conclusions: Compared with the established FRS prediction algorithm, all the ML models predicted the presence of CAD better than FRS, and ANN also predicted the severity of CAD better than FRS.

people died in 2013, higher than the figure in 1990 (1). Although the economy and technology are developing rapidly, coronary angiography, and even more so percutaneous coronary intervention (PCI), remains too expensive for middle- or low-income families in developing countries. In particular, the atherosclerotic burden is increasing at a dramatic pace in India and China, which have scarce medical resources and large populations and are expected to have more CVD deaths than all developed countries combined by 2030 (2). On the other hand, in a large cohort study in the United States, about 12% of PCI procedures for non-acute indications were eventually classified as inappropriate (3). Given this situation, a risk model to determine whether a given individual should undergo coronary angiography is urgently needed in clinical medicine.
In the last few decades, many risk calculators, such as the Framingham Risk Score (FRS) (4) and the American College of Cardiology (ACC)/American Heart Association (AHA) 2013 model (5), have been put forward based on population demographics, medical conditions, and some routine laboratory results. Specifically, FRS is the first and the most widely accepted risk model; it takes eight core risk factors into account to predict ten-year cardiovascular events. Moreover, almost all standard cardiovascular risk models, including FRS, rest on the fundamental assumption that every factor has a linear association with CVD outcomes. Such risk models may oversimplify the complicated relationships and interactions among large numbers of risk factors.
Machine Learning (ML) offers a promising way to overcome the limitations of the traditional risk models. ML can be described as a general-purpose system with reasoning and thinking skills that mimic those of the human brain (6). ML derives from the study of pattern recognition and computational learning ('artificial intelligence'). This approach relies on a computer to exploit all complex and non-linear interactions across all the attributes to build the best model for predicting the observed outcomes (7). Furthermore, ML algorithms typically make fewer strict assumptions about the underlying data (8), and they may identify latent variables, which are inferred indirectly from other variables.
To date, no large-scale study has applied ML approaches to predict the presence and severity of CAD in the general population using demographic factors, medical conditions, and a few routine laboratory results. The aim of this study was to explore whether ML approaches could improve the accuracy of predicting the presence and severity of CAD, and to determine whether they outperform FRS in this prediction.

Data source
The dataset in this study included adult inpatients who were admitted to our hospital (Sir Run Run Shaw Hospital, Hangzhou, Zhejiang, China) from January 2015 to July 2017. Since 2014, an electronic medical records system has been in place at our hospital to document outpatient and inpatient information, including demographic details, medical history, laboratory results, imaging impressions, primary diagnosis, drug prescriptions, records of interventions and surgeries, referrals to specialists, and follow-up biological results. The Institutional Ethics Research Committee approved the study, and all patients provided written informed consent.

Data extraction and inclusion
The enrollment criteria were as follows: 1) inpatients who underwent coronary angiography during the index admission; 2) no coronary angiography before this admission; 3) patients with severe valve disease, severe heart failure, acute coronary syndrome, previous myocardial infarction or any other revascularization procedure, or stroke were excluded. A total of 2608 inpatients at our hospital from January 2015 to July 2017 were extracted from the electronic medical system with 29 attributes (Figure 1). Attributes related to medical conditions were collected by experienced physicians, and laboratory results were recorded by trained technicians using standard automated machines. The eight core risk attributes (age, gender, total cholesterol (TC), high-density lipoprotein cholesterol (HDL-C), treated or untreated systolic blood pressure (SYB), antihypertensive medication, smoking status, and diabetes mellitus (DM)) were used to calculate the risk of ten-year cardiovascular events with the Framingham risk equation published in 2008 (4). Individuals at low risk have a 10-year CHD risk of 10% or less, those at intermediate risk 10-20%, and those at high risk 20% or more (9). We therefore chose 10% as the threshold for determining whether a person had CAD, and 20% as the threshold for determining whether a person had severe CAD.
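The two FRS cut-offs described above can be sketched in a few lines of Python. This is a minimal illustration, not the study's code: the function names are hypothetical, the input is assumed to be a 10-year risk already computed from the 2008 Framingham equation (whose coefficients are not reproduced here), and a risk of exactly 10% or 20% is assumed to fall in the higher category, which the text does not specify.

```python
def frs_risk_category(ten_year_risk: float) -> str:
    """Map a 10-year FRS risk (fraction in [0, 1]) to a risk category."""
    if not 0.0 <= ten_year_risk <= 1.0:
        raise ValueError("risk must be a fraction between 0 and 1")
    if ten_year_risk >= 0.20:
        return "high"
    if ten_year_risk >= 0.10:  # assumption: exactly 10% counts as intermediate
        return "intermediate"
    return "low"


def frs_predictions(ten_year_risk: float) -> dict:
    """Binary FRS calls for comparison with the ML models."""
    return {
        "cad_present": ten_year_risk >= 0.10,  # 10% threshold for presence
        "cad_severe": ten_year_risk >= 0.20,   # 20% threshold for severity
    }


if __name__ == "__main__":
    for risk in (0.05, 0.15, 0.25):
        print(risk, frs_risk_category(risk), frs_predictions(risk))
```

Keeping the category mapping separate from the binary calls makes it easy to change the boundary convention in one place if the inclusive thresholds assumed here are not the ones intended.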
To compare the traditional standard risk model with the ML approaches, we proceeded in two separate steps: 1) the ML approaches used the eight core risk attributes to build models for the comparison; 2) the ML approaches used all attributes to build models for the comparison (an attribute was excluded if more than 10% of its values were missing). Some variables were selected based on their inclusion in published CVD risk models (4, 10-12), and other variables were reviewed by experienced physicians (Wenbin Zhang & Guosheng Fu).

Study group design
Gensini Score (GS) is a coronary angiographic scoring system that quantifies the extent and severity of coronary artery disease. GS accounts for the degree of artery narrowing as well as the locations of narrowing (13). For presence, the population was divided into two groups (GS = 0, Negative Group; GS > 0, Positive Group). For severity, the population in the positive group was divided into four groups (0 < GS ≤ 20, Group1; 20 < GS ≤ 30, Group2; 30 < GS ≤ 52, Group3; GS > 52, Group4). The data in Group1 and Group4 were used to construct the Receiver Operating Characteristic (ROC) curve and calculate the Area Under Curve (AUC).
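The grouping rules above can be written as a short Python sketch. The function names are hypothetical, and the coding of Group1 as 0 and Group4 as 1 for the severity ROC analysis is an assumption made for illustration; the text only states that these two groups were used.

```python
from typing import Optional


def presence_label(gs: float) -> int:
    """Return 0 for the Negative Group (GS = 0) and 1 for the Positive Group (GS > 0)."""
    if gs < 0:
        raise ValueError("Gensini Score cannot be negative")
    return int(gs > 0)


def severity_group(gs: float) -> Optional[int]:
    """Return the severity group (1-4) for GS > 0, or None for GS = 0."""
    if gs < 0:
        raise ValueError("Gensini Score cannot be negative")
    if gs == 0:
        return None
    if gs <= 20:
        return 1
    if gs <= 30:
        return 2
    if gs <= 52:
        return 3
    return 4


def severity_roc_label(gs: float) -> Optional[int]:
    """Binary label for the severity ROC: Group1 -> 0, Group4 -> 1 (assumed coding).

    Patients in Group2, Group3, or the Negative Group return None and are left out.
    """
    group = severity_group(gs)
    if group == 1:
        return 0
    if group == 4:
        return 1
    return None
```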

Machine Learning classification techniques
In practice, the performance of ML approaches depends on the underlying algorithm and also varies from one dataset to another with the characteristics of the attributes and the observed outcomes. For this reason, four different ML algorithms (7) (Logistic Regression (LR), Random Forest (RF), k-Nearest Neighbors (KNN), and Artificial Neural Networks (ANN)) were used in this study to build models whose performance was compared with that of the FRS model. All the algorithms were programmed and run in Python. Specifically, 80% of the data was used as the training cohort for the algorithm to build a model, and 20% of the data was used as the validation cohort.
LR is a statistical ML algorithm that models the probability of a binary outcome by fitting a logistic (sigmoid) function of the input variables and classifies observations according to that probability. RF is a classification method that builds a multitude of decision trees at training time and, at test time, outputs the class that is the mode of the classes predicted by the individual trees. KNN classifies a sample directly from the training data, according to the classes of its k nearest neighbors, without building an explicit model first. ANN attempts to mimic the human brain, modeling complicated tasks with many interconnected nodes that act like neurons.
To evaluate our models, 4-fold cross-validation was used to check model performance.
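The workflow described in this section (an 80/20 split, four classifiers, 4-fold cross-validation, and AUC as the metric) might look roughly as follows in scikit-learn. This is a sketch under stated assumptions, not the study's code: the data are synthetic, the library, hyperparameters, feature scaling, stratification, and the choice to run cross-validation on the training cohort only are all assumptions that the text does not specify.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the cohort: 8 features, binary outcome (assumption).
X, y = make_classification(n_samples=600, n_features=8, n_informative=5, random_state=0)

# 80% training cohort, 20% validation cohort.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, stratify=y, random_state=0
)

# Hyperparameters are illustrative defaults, not the values used in the study.
models = {
    "LR": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "RF": RandomForestClassifier(n_estimators=100, random_state=0),
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
    "ANN": make_pipeline(
        StandardScaler(),
        MLPClassifier(hidden_layer_sizes=(16,), max_iter=1000, random_state=0),
    ),
}

cv = StratifiedKFold(n_splits=4, shuffle=True, random_state=0)  # 4-fold cross-validation
results = {}
for name, model in models.items():
    # Cross-validated AUC on the training cohort.
    cv_auc = cross_val_score(model, X_train, y_train, cv=cv, scoring="roc_auc")
    # Refit on the full training cohort and score the held-out 20%.
    model.fit(X_train, y_train)
    test_auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
    results[name] = {"cv_auc_mean": float(cv_auc.mean()), "test_auc": float(test_auc)}
    print(f"{name}: CV AUC {cv_auc.mean():.3f}, validation AUC {test_auc:.3f}")
```

Wrapping the scaler and the classifier in one pipeline means the scaling parameters are re-estimated inside each cross-validation fold, so no information from the held-out fold leaks into training.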

Statistical analysis
All data were compiled in the Statistical Package for the Social Sciences (SPSS) for Windows, version 22 (SPSS Inc., Chicago, IL, USA). Categorical data are reported as percentages, and continuous data as means ± standard deviations.
Demographic characteristics of the study population were analyzed with SPSS. Missing data were handled by median imputation, a well-known approach for data missing at random (14).
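The missing-data handling can be illustrated with a short pandas sketch that combines the 10% exclusion rule from the data extraction step with median imputation. The function name, column names, and toy values are hypothetical, and all attributes are assumed to be numerically coded.

```python
import numpy as np
import pandas as pd


def drop_and_impute(df: pd.DataFrame, max_missing: float = 0.10) -> pd.DataFrame:
    """Drop attributes with more than `max_missing` missing values, then median-impute."""
    missing_frac = df.isna().mean()                # fraction of missing values per column
    kept = df.loc[:, missing_frac <= max_missing]  # keep columns at or below the limit
    return kept.fillna(kept.median(numeric_only=True))


# Toy data (hypothetical values): TC has 10% missing, CRP has 30% missing.
toy = pd.DataFrame({
    "TC": [4.0, 5.0, np.nan, 6.0, 4.5, 5.5, 5.0, 4.8, 5.2, 5.0],
    "HDL_C": [1.0, 1.2, 1.1, 0.9, 1.3, 1.0, 1.4, 1.1, 1.2, 1.0],
    "CRP": [np.nan, 2.0, np.nan, 3.0, 1.0, np.nan, 2.5, 1.5, 2.0, 1.0],
})
cleaned = drop_and_impute(toy)
print(cleaned)
```

In a train/validation setting, the medians would normally be estimated on the training cohort and then applied to the validation cohort; the sketch imputes the whole table at once only for brevity.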

Study population characteristics
According to the inclusion and exclusion criteria, our study enrolled 2608 patients.

Discussion
Compared with the established FRS prediction algorithm, all the ML models (LR, RF, KNN, ANN) predicted the presence of CAD better than FRS; moreover, ANN predicted the severity of CAD better than FRS.
First, if more variables could be added to the training dataset, the ML models could produce more accurate and individualized predictions. In our study, the 29 attributes fell into four categories: basic personal information, blood cell examinations, blood biochemistry tests, and medical histories.
Besides the variables in our study, other specific variables have been shown to be predictors of CAD. For instance, a large Chinese cohort study (15) showed a correlation between ABO blood groups and the severity of CAD, and, regarding obesity, the waist-hip ratio is considered to be positively related to the presence and severity of CAD (16). HDL sub-fractions (17) as well as micronucleus frequency and nuclear division index (18) have also been shown to be significant indicators of the extent and severity of CAD. Adding these variables to the dataset in future studies would be a promising step for risk prediction.
Second, ML methods applied to predict the presence and severity of CAD could build a more personalized and precise model than the traditional risk systems. Our results support this, and in the last 5 years many other original studies have reached similar conclusions. In a study from the United Kingdom (19), ML models improved the accuracy of 10-year CVD risk prediction and performed better than the ACC/AHA equation. In a Korean study (20), investigators applied Deep Belief Neural Networks, one of the ML algorithms, to the prediction of CVD and reported an accuracy of 83.9% and an AUC of 0.790. Beyond CVD, in a study from the USA on heart failure (21) based on electronic medical records, the ML model achieved an 11% improvement in AUC over the mainstream Seattle Heart Failure Model. In an Arabian investigation (22), four different ML algorithms were used to predict the length of hospital stay. Taken together, these steps are exciting and move toward individualized medicine.
Third, the dataset, rather than the method (including the ML algorithm), is the fundamental requirement for better prediction. In our results, the ML models for presence were promising, but the results for severity did not meet our expectations: FRS performed better than KNN, LR, and RF. In almost all the studies we referred to, ML models outperformed the traditional equations (19-22). In our study population, Group1 and Group4, which together comprised 756 patients, formed the dataset for building the ML model. Thus, the quantity and quality of the dataset may be the limiting factors for the usefulness of ML models.

Limitations
This study has several limitations. First, as mentioned before, the dataset came from a single health organization rather than several different centers. In addition, because of our inclusion and exclusion criteria, these patients already carried a high suspicion of CVD. Second, the "black box" nature of ML models can make them difficult or impossible to interpret. Third, if the missing data for an attribute reached 10%, the attribute was removed from the dataset. This may have introduced bias, because attributes were removed before we knew whether they were important for the prediction.

Conclusions
Compared with the established FRS prediction algorithm, all the ML models predicted the presence of CAD better than FRS; moreover, ANN predicted the severity of CAD better than FRS.

Consent for publication
Not applicable.

Availability of data and material
The data that support the findings of this study are available from Wenbin Zhang, but restrictions apply to the availability of these data, which were used under license for the current study, and so are not publicly available. Data are, however, available from the authors upon reasonable request and with the permission of Wenbin Zhang.

Competing interests
The authors declare that they have no competing interests. Abbreviations are the same as in Table 3.

Figures
Figure 1. Machine learning standardized flowchart and patient cohort extraction