Comparison of Machine Learning Models and Framingham Risk Score for the prediction of the presence and severity of Coronary Artery Diseases by using Gensini Score

doi:10.21203/rs.2.12128/v1

Research article

This work is licensed under a CC BY 4.0 License

Version 1

Abstract

Background: A risk prediction model for cardiovascular conditions based on routine clinical information has not been established. Machine learning (ML) models offer an opportunity to build an accurate prediction system for the presence and severity of coronary artery diseases (CAD).

Methods: To compare the predictive performance of ML models with that of the Framingham Risk Score (FRS), records of 2608 inpatients (1669 men, 939 women; mean age 63.16 ± 10.72 years) admitted to our hospital from January 2015 to July 2017 were extracted from the electronic medical record system with 29 attributes. Four ML algorithms (Logistic Regression (LR), Random Forest (RF), k-Nearest Neighbors (KNN) and Artificial Neural Networks (ANN)) were used to build models, first on the eight core risk factors and then on all attributes. The area under the receiver operating characteristic curve (AUC) was the main measure of predictive performance.

Results: By AUC, all ML algorithms predicted the presence of CAD better than FRS; specifically, FRS<LR<RF<KNN<ANN (FRS variables) and FRS<LR=RF=KNN<ANN (all variables). ANN was the best model for predicting the presence of CAD (AUC 0.82, accuracy 0.74). For severity, only ANN (AUC 0.70, accuracy 0.65) predicted better than FRS (AUC 0.62, accuracy 0.59). The other three models did not achieve a higher AUC than FRS.

Conclusions: Compared with the established FRS prediction algorithm, all the ML models predicted the presence of CAD better than FRS; moreover, ANN predicted the severity of CAD better than FRS.

Cardiac & Cardiovascular Systems

Computational Biology

Machine learning

Framingham Risk Score

Coronary Artery Disease

Gensini Score

Background

Coronary artery diseases (CAD), a form of cardiovascular disease (CVD), are a leading cause of mortality and morbidity worldwide, accounting for about 17.3 million deaths in 2013, more than in 1990(1). Although the economy and technology are developing rapidly, coronary angiography and percutaneous coronary intervention (PCI) remain too expensive for middle- and low-income families in developing countries. The atherosclerotic burden is rising particularly fast in India and China, where medical resources are scarce and where the large populations are expected to account for more CVD deaths than all developed countries combined by 2030(2). On the other hand, in a large cohort study in the United States, about 12% of PCI procedures for non-acute indications were eventually classified as inappropriate(3). A risk model that helps decide whether a given individual should undergo coronary angiography is therefore urgently needed in clinical practice.

Over the last few decades, many risk calculators, such as the Framingham Risk Score (FRS)(4) and the 2013 American College of Cardiology (ACC)/American Heart Association (AHA) model(5), have been proposed based on demographics, medical conditions and routine laboratory results. FRS was the first and remains the most widely accepted risk model; it takes eight core risk factors into account to predict ten-year cardiovascular events. However, almost all standard cardiovascular risk models, FRS included, rest on the assumption that each factor is linearly associated with CVD outcomes. Such models may oversimplify the complex relationships and interactions among large numbers of risk factors.

Machine learning (ML) offers a way to overcome these limitations of traditional risk models. ML has been described as a general-purpose approach with reasoning capabilities that mimic those of the human brain(6). It grew out of the study of pattern recognition and computational learning within artificial intelligence. The approach relies on a computer to exploit complex and non-linear interactions among all attributes in order to build the model that best predicts the observed outcomes(7). Moreover, ML algorithms typically make fewer strict assumptions about the underlying data(8) and may identify latent variables, which are inferred indirectly from other variables.

To date, no large-scale study has applied ML approaches to predict the presence and severity of CAD in the general population using demographic factors, medical conditions and a few routine laboratory results. The aim of this study was to explore whether ML approaches can improve the accuracy of predicting the presence and severity of CAD, and to determine whether they outperform FRS.

Methods

Data source

The dataset included adult inpatients admitted to our hospital (Sir Run Run Shaw Hospital, Hangzhou, Zhejiang, China) from January 2015 to July 2017. Since 2014, the hospital's electronic medical record system has documented outpatient and inpatient information, including demographic details, medical history, laboratory results, imaging impressions, primary diagnoses, drug prescriptions, records of interventions and surgeries, referrals to specialists and follow-up biological results. The Institutional Ethics Research Committee approved the study, and all patients provided written informed consent.

Data extraction and inclusion

The enrollment criteria were as follows: 1) inpatients who underwent coronary angiography during the index admission; 2) no coronary angiography before the index admission; 3) patients with severe valve disease, severe heart failure, acute coronary syndrome, previous myocardial infarction, any previous revascularization procedure or stroke were excluded. A total of 2608 inpatients admitted to our hospital from January 2015 to July 2017 were extracted from the electronic medical record system with 29 attributes (Figure1). Attributes related to medical conditions were collected by experienced physicians, and laboratory results were recorded by trained technicians using standard automated instruments.

Figure1. Standardized machine learning flowchart and patient cohort extraction

Risk factor variables

The eight core risk attributes (age, gender, total cholesterol (TC), high-density lipoprotein cholesterol (HDL-C), systolic blood pressure (treated or untreated), use of antihypertensive medication, smoking status and diabetes mellitus (DM)) were used to calculate the ten-year risk of cardiovascular events with the general Framingham risk equation published in 2008(4). Individuals at low risk have a 10-year CHD risk of 10% or less, those at intermediate risk 10–20%, and those at high risk 20% or more(9). We therefore chose 10% as the threshold for predicting the presence of CAD and 20% as the threshold for predicting severe CAD.
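
The thresholding described above can be sketched as follows. This is a minimal illustration, assuming the ten-year FRS risk has already been computed as a fraction between 0 and 1. The function names `frs_risk_category` and `frs_predictions` are hypothetical, and the handling of values lying exactly on the 10% and 20% boundaries is an assumption, since the text does not specify it.

```python
def frs_risk_category(risk_10y: float) -> str:
    """Map a ten-year FRS risk (fraction in [0, 1]) to a risk category."""
    if not 0.0 <= risk_10y <= 1.0:
        raise ValueError("risk_10y must be a fraction between 0 and 1")
    if risk_10y <= 0.10:
        return "low"
    if risk_10y < 0.20:
        return "intermediate"
    return "high"


def frs_predictions(risk_10y: float) -> dict:
    """Binary FRS calls for comparison with the ML models.

    Assumption: a risk above 10% is read as 'CAD present' and a risk of
    20% or more as 'severe CAD'; the boundary handling is illustrative.
    """
    return {
        "cad_present": risk_10y > 0.10,
        "cad_severe": risk_10y >= 0.20,
    }


if __name__ == "__main__":
    for risk in (0.05, 0.15, 0.25):
        print(risk, frs_risk_category(risk), frs_predictions(risk))
```

Turning the continuous FRS risk into binary calls in this way is what allows accuracy, sensitivity and PPV to be computed for FRS on the same footing as for the ML classifiers.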

To compare the traditional risk model with the ML approaches, we proceeded in two separate steps: 1) ML models were built on the eight core risk attributes; 2) ML models were built on all attributes (an attribute was excluded if more than 10% of its values were missing). Some variables were selected because they are included in published CVD risk models(4,10–12); the other variables were reviewed by experienced physicians (Wenbin Zhang & Guosheng Fu).
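
The exclusion rule for attributes with more than 10% missing values can be sketched as below. This is an illustrative sketch only: the table layout (a dictionary of columns with `None` for missing entries), the function names and the toy values are assumptions, not the study's actual data structures.

```python
from typing import Dict, List, Optional

Column = List[Optional[float]]


def missing_fraction(values: Column) -> float:
    """Fraction of entries recorded as None (missing)."""
    if not values:
        return 0.0
    return sum(v is None for v in values) / len(values)


def drop_sparse_attributes(table: Dict[str, Column],
                           max_missing: float = 0.10) -> Dict[str, Column]:
    """Keep attributes whose missing fraction does not exceed max_missing."""
    return {name: col for name, col in table.items()
            if missing_fraction(col) <= max_missing}


# Toy table: 'TC' has 1 of 10 values missing (kept),
# 'X' has 2 of 10 values missing (dropped).
toy_table = {
    "TC": [4.3, 4.1, None, 5.0, 3.9, 4.4, 4.8, 4.0, 4.2, 4.6],
    "X": [1.0, None, 2.0, None, 1.5, 1.2, 1.8, 1.1, 1.9, 1.4],
}
kept = drop_sparse_attributes(toy_table)
print(sorted(kept))
```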

Study group design

The Gensini Score (GS) is a coronary angiographic scoring system that quantifies the extent and severity of coronary artery lesions. It accounts for both the degree and the location of arterial narrowing(13). For presence, the population was divided into two groups (GS = 0, Negative Group; GS > 0, Positive Group). For severity, the Positive Group was divided into four groups (0 < GS ≤ 20, Group1; 20 < GS ≤ 30, Group2; 30 < GS ≤ 52, Group3; GS > 52, Group4). Data from Group1 and Group4 were used to plot the receiver operating characteristic (ROC) curve and calculate the area under the curve (AUC).
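
The grouping rules above translate directly into code. The sketch below is illustrative; the function names are hypothetical, and encoding Group1 as 0 and Group4 as 1 for the severity ROC analysis is an assumption about how the binary target was formed.

```python
from typing import Optional


def presence_label(gs: float) -> int:
    """1 for the Positive Group (GS > 0), 0 for the Negative Group (GS = 0)."""
    if gs < 0:
        raise ValueError("Gensini Score cannot be negative")
    return int(gs > 0)


def severity_group(gs: float) -> Optional[str]:
    """Severity group for a patient, or None for the Negative Group."""
    if gs < 0:
        raise ValueError("Gensini Score cannot be negative")
    if gs == 0:
        return None
    if gs <= 20:
        return "Group1"
    if gs <= 30:
        return "Group2"
    if gs <= 52:
        return "Group3"
    return "Group4"


def severity_label(gs: float) -> Optional[int]:
    """Binary severity target: Group1 -> 0, Group4 -> 1, others excluded."""
    group = severity_group(gs)
    if group == "Group1":
        return 0
    if group == "Group4":
        return 1
    return None


if __name__ == "__main__":
    for gs in (0, 12, 25, 40, 80):
        print(gs, presence_label(gs), severity_group(gs), severity_label(gs))
```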

Machine Learning classification techniques

The performance of ML approaches depends on the underlying algorithm and varies from one dataset to another with the characteristics of the attributes and the observed outcomes. We therefore used four different ML algorithms(7) (Logistic Regression (LR), Random Forest (RF), k-Nearest Neighbors (KNN) and Artificial Neural Networks (ANN)) to build models and compared their performance with that of the FRS model. All algorithms were implemented and run in Python. Specifically, 80% of the data were used as the training cohort to build each model, and 20% were used as the validation cohort.
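
A minimal sketch of such a workflow is shown below. The text states only that Python was used, so the choice of scikit-learn, the synthetic data from `make_classification`, the feature scaling and every hyperparameter here are assumptions for illustration, not the study's configuration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the patient table (the real data are not public).
X, y = make_classification(n_samples=400, n_features=8, n_informative=5,
                           random_state=0)

# 80% training cohort, 20% validation cohort, stratified by outcome.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.20, stratify=y, random_state=0)

# Illustrative hyperparameters; scaling is applied where distances or
# gradient-based fitting make it useful.
models = {
    "LR": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "RF": RandomForestClassifier(n_estimators=100, random_state=0),
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
    "ANN": make_pipeline(
        StandardScaler(),
        MLPClassifier(hidden_layer_sizes=(16,), max_iter=1000, random_state=0)),
}

validation_auc = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores = model.predict_proba(X_val)[:, 1]  # probability of the positive class
    validation_auc[name] = roc_auc_score(y_val, scores)

for name, auc in validation_auc.items():
    print(f"{name}: validation AUC = {auc:.2f}")
```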

LR is a statistical ML algorithm that models the log-odds of the outcome as a linear combination of the input variables and classifies each case according to the resulting probability. RF is an ensemble classifier that builds a multitude of decision trees during training and outputs the class that is the mode of the classes predicted by the individual trees. KNN classifies a case directly from the labels of its k nearest neighbors in the training data, without building an explicit model first. ANN loosely mimics the human brain, modeling complicated tasks with many interconnected nodes analogous to neurons. To evaluate the models, 4-fold cross-validation was used.
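
The 4-fold cross-validation step can be sketched as follows. As before, scikit-learn, the synthetic data, the use of stratified and shuffled folds, and the choice of logistic regression as the example model are assumptions made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data; eight features echo the eight core risk factors.
X, y = make_classification(n_samples=400, n_features=8, n_informative=5,
                           random_state=0)

model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Four folds: each fold serves once as the held-out set.
cv = StratifiedKFold(n_splits=4, shuffle=True, random_state=0)
fold_auc = cross_val_score(model, X, y, cv=cv, scoring="roc_auc")
mean_auc = fold_auc.mean()

print("AUC per fold:", [round(float(a), 2) for a in fold_auc])
print("Mean AUC:", round(float(mean_auc), 2))
```

Placing the scaler inside the pipeline means it is refitted on the training folds only, which avoids leaking information from the held-out fold.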

Statistical analysis

All data were compiled and analyzed with the Statistical Package for the Social Sciences (SPSS) for Windows, version 22 (SPSS Inc., Chicago, IL, USA). Categorical data are reported as percentages and continuous data as means ± standard deviations. Demographic characteristics of the study population were analyzed in SPSS. Randomly missing data were handled by median imputation, a widely used approach(14).
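
Median imputation replaces each missing value of an attribute with the median of its observed values. A minimal sketch is given below; the function name and the toy values are hypothetical.

```python
from statistics import median
from typing import List, Optional


def impute_median(values: List[Optional[float]]) -> List[float]:
    """Replace None entries with the median of the observed entries."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute a column with no observed values")
    fill = median(observed)
    return [fill if v is None else v for v in values]


# Toy column with one missing value; the observed median is 4.1.
imputed = impute_median([4.1, None, 5.0, 3.9])
print(imputed)
```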

Results

Study population characteristics

According to the inclusion and exclusion criteria, 2608 patients (1669 men, 939 women; mean age 63.16 ± 10.72 years) were enrolled. The median GS in patients with CAD was 30. The attributes of each group (Negative, Group1, Group2, Group3, Group4, Positive) are shown in Table1. Among the eight core risk factors, only TC did not differ significantly (P = 0.64) between the Negative and Positive groups.

Machine learning variable rankings

All attributes listed in Table1 were used as inputs for the ML algorithms, with 2086 patients for training and 522 patients for testing. Variables were ranked by the absolute size of their coefficients in the LR model and by their weights in the ANN model. The top eight variables for presence and for severity are listed in Table2. For presence, Age and Hypertreatment, both included in the FRS model, were among the top eight factors in the LR model, and HDL and Hypertreatment were among the top eight in the ANN model. For severity, Smoking was the only FRS factor on the LR list, while HDL and systolic BP were on the ANN list.
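
Ranking variables by absolute coefficient size can be sketched as below. The coefficient values are hypothetical placeholders, not results from this study, and the sketch assumes the inputs were put on a comparable scale before fitting, since otherwise coefficient magnitudes are not directly comparable.

```python
from typing import Dict, List


def rank_by_abs_coefficient(coefficients: Dict[str, float],
                            top_k: int = 8) -> List[str]:
    """Return the top_k variable names ordered by decreasing |coefficient|."""
    ranked = sorted(coefficients,
                    key=lambda name: abs(coefficients[name]),
                    reverse=True)
    return ranked[:top_k]


# Hypothetical LR coefficients for illustration only.
example_coefficients = {"Age": 0.42, "BUN": -0.80, "TC": 0.05, "MPV": 0.30}
top_three = rank_by_abs_coefficient(example_coefficients, top_k=3)
print(top_three)
```

A negative coefficient such as the placeholder for BUN still ranks first here, because the ranking uses the magnitude of the effect rather than its direction.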

Prediction accuracy

To provide a comprehensive evaluation, we examined diagnostic accuracy using 4-fold cross-validation. The performance of the ML models for the presence of CAD is shown in Table3. AUC was treated as the most important measure of predictive performance. By AUC, all ML algorithms predicted better than FRS; specifically, FRS<LR<RF<KNN<ANN (FRS variables) and FRS<LR = RF = KNN<ANN (all variables). ANN was the best model for predicting the presence of CAD (AUC 0.82, accuracy 0.74).
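
The threshold-based metrics reported alongside AUC (accuracy, sensitivity, PPV and F1-score) follow from the confusion matrix. The sketch below is illustrative: the function names and the toy counts are hypothetical, and only the final check uses values taken from Table3.

```python
def f1_from(ppv: float, sensitivity: float) -> float:
    """F1-score as the harmonic mean of PPV (precision) and sensitivity (recall)."""
    if ppv + sensitivity == 0:
        return 0.0
    return 2 * ppv * sensitivity / (ppv + sensitivity)


def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Accuracy, sensitivity, PPV and F1-score from confusion-matrix counts."""
    total = tp + fp + tn + fn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    ppv = tp / (tp + fp) if (tp + fp) else 0.0
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": sensitivity,
        "ppv": ppv,
        "f1": f1_from(ppv, sensitivity),
    }


# Toy counts for illustration only.
toy = classification_metrics(tp=80, fp=20, tn=70, fn=30)
print(toy)

# Consistency check with the FRS column of Table3 (PPV 0.63, sensitivity 0.89).
print(round(f1_from(0.63, 0.89), 2))
```

The last line reproduces the F1-score of 0.74 listed for FRS in Table3 from its reported PPV and sensitivity.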

The performance of the ML models for the severity of CAD is shown in Table4. By AUC, and unlike for presence, only ANN (AUC 0.70, accuracy 0.65) predicted severity better than FRS (AUC 0.62, accuracy 0.59). Among the models built on all variables, the other three did not achieve a larger AUC than FRS.
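
Because the comparisons above rest on AUC, it may help to recall what the number measures: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The sketch below computes it by direct pairwise comparison; the function name and the toy labels and scores are hypothetical.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of positive/negative pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example: three of the four positive/negative pairs are ordered correctly.
toy_auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(toy_auc)
```

An AUC of 0.5 corresponds to random ranking, which is why values around 0.6 for severity indicate only modest discrimination.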

Discussions

Compared with the established FRS prediction algorithm, all the ML models (LR, RF, KNN, ANN) predicted the presence of CAD better than FRS; moreover, ANN predicted the severity of CAD better than FRS.

First, adding more variables to the training dataset could yield more accurate and individualized predictions. In our study, the 29 attributes fell into four categories: basic personal information, blood cell examinations, blood biochemistry tests and medical history. Beyond these, other variables have been reported as predictors of CAD. For instance, a large Chinese cohort study(15) showed a correlation between ABO blood groups and the severity of CAD, and the waist-hip ratio has been reported to be positively related to the presence and severity of CAD(16). HDL subfractions(17), as well as micronucleus frequency and nuclear division index(18), have also been reported as significant indicators of the extent and severity of CAD. Adding these variables to the dataset is a promising step for future studies of risk prediction.

Second, ML methods could build a more personalized and precise model of the presence and severity of CAD than traditional risk systems. Our results support this, as do several original studies published in the last five years. In a study from the United Kingdom(19), ML models improved the accuracy of 10-year CVD risk prediction and performed better than the ACC/AHA equation. In a Korean study(20), investigators applied a deep belief network, one type of ML algorithm, to CVD prediction and reported an accuracy of 83.9% and an AUC of 0.790. Beyond CVD, in a study from the USA on heart failure(21) based on electronic medical records, the ML model improved AUC by 11% over the mainstream Seattle Heart Failure Model. In an Arabian study(22), four different ML algorithms were used to predict the length of hospital stay. Together, these advances are encouraging steps toward individualized medicine.

Third, the dataset, rather than the method (ML algorithms included), is the foundation of good prediction. In our results, the ML models for presence were promising, but the results for severity fell short of our expectations: FRS performed better than KNN, LR and RF. In almost all the studies we cited, ML models outperformed the traditional equations(19–22). In our study, the severity models were built on Group1 and Group4 only, which together comprised 756 patients. The quantity and quality of the dataset may therefore limit the usefulness of ML models.

Limitations

This study has several limitations. First, the dataset came from a single center rather than from several centers. Moreover, because of our inclusion and exclusion criteria, the patients already had a high suspicion of CVD. Second, the “black box” nature of ML models can make them difficult to interpret. Third, an attribute was removed from the dataset if more than 10% of its values were missing. This may have introduced bias, because attributes were removed before we could know whether they were important for the prediction.

Conclusions

Compared with the established FRS prediction algorithm, all the ML models predicted the presence of CAD better than FRS; moreover, ANN predicted the severity of CAD better than FRS.

List of Abbreviations

ML: Machine Learning; FRS: Framingham Risk Score; CAD: Coronary Artery Diseases; LR: Logistic Regression; RF: Random Forest; KNN: k-Nearest Neighbors; ANN: Artificial Neural Networks; AUC: Area Under Curve; ROC: Receiver Operating Characteristic Curve; CVD: Cardiovascular Diseases; ACC: American College of Cardiology; AHA: American Heart Association; GS: Gensini Score.

Declarations

Ethics approval and consent to participate

The study was designed and implemented in accordance with the ethical principles of the Declaration of Helsinki. All patients gave their written consent to participate. This study was approved by the Sir Run Run Shaw Hospital Institutional Ethics Research Committee.

Consent for publication

Not applicable.

Availability of data and material

The data that support the findings of this study are available from Wenbin Zhang, but restrictions apply to the availability of these data, which were used under license for the current study, and so are not publicly available. Data are, however, available from the authors upon reasonable request and with the permission of Wenbin Zhang.

Competing interests

The authors declare that they have no competing interests.

Funding

This work was supported by the National Natural Science Foundation of China (81500212 and 81800212) and Zhejiang Natural Science Foundation (LY18H020007 and LQ16H020001).

Authors’ contributions

YW, KZ, GF and WZ: conception and design of the study; YL, LZ and QL: extraction of the data from electronic medical records; YW and KZ: analysis and interpretation of the data; YW, YL and WZ: writing and proofreading for the article; YW, KZ, YL, LZ, QL, GF and WZ read and approved the final manuscript.

Acknowledgements

Not applicable.

References

1. Krokstad S, Ding D, Grunseit AC, Sund ER, Holmen TL, Rangul V, et al. Global, regional, and national age–sex specific all-cause and cause-specific mortality for 240 causes of death, 1990–2013: a systematic analysis for the Global Burden of Disease Study 2013. Lancet [Internet]. 2015;385(9963):117–71. Available from: http://linkinghub.elsevier.com/retrieve/pii/S0140673614616822
2. Chatterji S, Chisholm D, Mathers C, Patel V, Ebrahim S, Gopalakrishna G, et al. Chronic diseases and injuries in India. Lancet [Internet]. 2011;377(9763):413–28. Available from: http://dx.doi.org/10.1016/S0140-6736(10)61188-9
3. Chan PS, Klein LW, Krone RJ, Dehmer GJ, Kennedy K, Nallamothu BK, et al. Appropriateness of Percutaneous Coronary Intervention. 2011;306(1):53–61.
4. Vasan RS, Cobain M, Kannel WB, Wolf PA, D’Agostino RB, Pencina MJ, et al. General Cardiovascular Risk Profile for Use in Primary Care. Circulation. 2008;117(6):743–53.
5. Guidelines P. Reply: 2013 ACC/AHA guideline on the assessment of cardiovascular risk. J Am Coll Cardiol. 2014;63(25):2886.
6. Seetharam K, Shrestha S, Sengupta PP. Artificial Intelligence in Cardiovascular Medicine. Curr Treat Options Cardiovasc Med. 2019;21(6):1–14.
7. Dreiseitl S, Ohno-Machado L. Logistic regression and artificial neural network classification models: A methodology review. J Biomed Inform. 2002;35(5–6):352–9.
8. Johnson KW, Torres Soto J, Glicksberg BS, Shameer K, Miotto R, Ali M, et al. Artificial Intelligence in Cardiology. J Am Coll Cardiol. 2018;71(23):2668–79.
9. Collins DRJ, Tompson AC, Onakpoya IJ, Roberts N, Ward AM, Heneghan CJ. Global cardiovascular risk assessment in the primary prevention of cardiovascular disease in adults: Systematic review of systematic reviews. BMJ Open. 2017;7(3).
10. Goff DC, Lloyd-Jones DM, Bennett G, Coady S, D’Agostino RB, Gibbons R, et al. Reply: 2013 ACC/AHA guideline on the assessment of cardiovascular risk [Internet]. Vol. 63, Journal of the American College of Cardiology. Lippincott Williams & Wilkins Hagerstown, MD; 2014 [cited 2019 Mar 7]. p. 2886. Available from: http://circ.ahajournals.org/lookup/doi/10.1161/01.cir.0000437741.48606.98
11. Graham I, Atar D, Borch-Johnsen K, Boysen G, Burell G, Cifkova R, et al. European guidelines on cardiovascular disease prevention in clinical practice: Executive summary - Fourth Joint Task Force of the European Society of Cardiology and Other Societies on Cardiovascular Disease Prevention in Clinical Practice (Constituted by representatives of nine societies and by invited experts). Eur Heart J. 2007;28(19):2375–414.
12. Deanfield J, Sattar N, Simpson I, Wood D, Bradbury K, Fox K, et al. Joint British Societies’ consensus recommendations for the prevention of cardiovascular disease (JBS3). Heart. 2014;100(SUPPL. 2).
13. Gensini GG. A more meaningful scoring system for determining the severity of coronary heart disease. Am J Cardiol [Internet]. 1983 Feb [cited 2019 Mar 7];51(3):606. Available from: http://www.ncbi.nlm.nih.gov/pubmed/6823874
14. Batista GEAPA, Monard MC. An analysis of four missing data treatment methods for supervised learning. 2010; Available from: https://www.tandfonline.com/action/journalInformation?journalCode=uaai20
15. Gong P, Luo SH, Li XL, Guo YL, Zhu CG, Xu RX, et al. Relation of ABO blood groups to the severity of coronary atherosclerosis: An Gensini score assessment. Atherosclerosis [Internet]. 2014;237(2):748–53. Available from: http://dx.doi.org/10.1016/j.atherosclerosis.2014.10.107
16. Rashiti P, Behluli I, Bytyqi AR. Assessment of the Correlation between Severity of Coronary Artery Disease and Waist–Hip Ratio. Open Access Maced J Med Sci. 2017;5(7):929–33.
17. Xu RX, Li S, Li XL, Zhang Y, Guo YL, Zhu CG, et al. High-density lipoprotein subfractions in relation with the severity of coronary artery disease: A Gensini score assessment. J Clin Lipidol [Internet]. 2015;9(1):26–34. Available from: http://dx.doi.org/10.1016/j.jacl.2014.11.003
18. Ipek E. The relationship of micronucleus frequency and nuclear division index with coronary artery disease SYNTAX and Gensini scores. Anatol J Cardiol. 2017;483–9.
19. Weng SF, Reps J, Kai J, Garibaldi JM, Qureshi N. Can Machine-learning improve cardiovascular risk prediction using routine clinical data? PLoS One. 2017;12(4):1–14.
20. Kim J, Kang U, Lee Y. Statistics and deep belief network-based cardiovascular risk prediction. Healthc Inform Res. 2017;23(3):169–75.
21. Panahiazar M, Taslimitehrani V, Pereira N, Pathak J. Using EHRs and Machine Learning for Heart Failure Survival Analysis. Stud Health Technol Inform. 2015;216:40–4.
22. Daghistani TA, Elshawi R, Sakr S, Ahmed AM, Al-Thwayee A, Al-Mallah MH. Predictors of in-hospital length of stay among cardiac patients: A machine learning approach. Int J Cardiol [Internet]. 2019;288(xxxx):140–7. Available from: https://doi.org/10.1016/j.ijcard.2019.01.046

Tables

Table1. All attributes in different groups

Attributes	Negative, N=1185	Group1, N=421	Group2, N=305	Group3, N=362	Group4, N=335	Positive, N=1423	P-value
TC, mmol/L	4.33±1.25	4.32±1.24	4.35±1.3	4.38±1.2	4.29±1.24	4.36±1.09	0.64
HDL, mmol/L	1.03±0.28	1.07±0.3	1.06±0.28	1.04±0.28	0.95±0.23	1.09±0.3	<0.01
Age, years	64.87±10.19	63.72±9.82	64.86±9.96	65.33±9.77	65.57±11.06	61.13±10.94	<0.01
systolicBP, mmHg	133.27±19.71	132.56±18.68	133.82±19.10	135.42±19.03	131.27±21.63	127.87±18.97	<0.01
diastolicBP, mmHg	74.13±12.29	74.99±12.08	74.08±11.36	74.18±13.24	73.27±12.36	74.09±12.08	0.87
BMI, kg/m^2	24.51±3.26	25.0±3.42	24.26±3.03	24.19±3.21	24.59±3.28	24.52±3.44	0.68
LDL, mmol/L	2.34±0.91	2.31±0.95	2.31±0.89	2.34±0.82	2.38±0.97	2.35±0.81	0.72
VLDL, mmol/L	0.96±1.2	0.88±0.88	0.89±0.98	1.1±1.47	0.98±1.36	0.78±0.48	0.38
TG, mmol/L	1.73±1.47	1.72±1.46	1.67±1.43	1.75±1.45	1.77±1.54	1.61±1.01	0.02
LPa, mg/dL	25.1±26.23	22.53±24.41	22.57±21.47	26.22±28.29	29.09±29.41	18.09±19.32	<0.01
UA, umol/L	366.19±93.64	369.9±91.63	355.9±82.71	361.98±96.76	377.03±101.12	356.8±97.96	0.02
TB, umol/L	13.84±6.36	13.78±5.85	14.56±6.95	13.71±6.66	13.3±5.84	15.7±8.11	<0.01
UB, umol/L	10.27±4.91	10.32±4.6	10.87±5.25	10.11±5.08	9.78±4.63	11.67±5.91	<0.01
CB, umol/L	3.58±2.0	3.45±1.73	3.76±2.2	3.6±2.29	3.52±1.7	4.03±3.03	<0.01
HCY, umol/L	14.65±7.82	14.41±6.78	14.69±8.65	14.76±8.09	14.73±7.65	13.8±7.48	0.03
FFA, umol/L	444.85±234.24	432.65±218.57	447.87±232.96	457.89±238.94	440.97±244.94	436.18±238.66	0.73
BUN, mmol/L	5.25±1.93	5.16±1.6	5.27±2.0	5.3±2.19	5.26±1.88	5.37±1.95	0.07
Cr, umol/L	81.34±50.19	77.02±21.62	82.83±64.6	82.15±60.99	83.35±41.1	75.19±36.33	<0.01
WBC, x10^9/L	6.59±1.98	6.35±1.77	6.34±1.87	6.72±1.94	6.96±2.24	6.27±2.21	<0.01
CRP, mg/L	5.06±12.22	4.18±10.92	5.45±13.72	5.2±13.68	5.4±10.07	4.1±10.44	0.03
Plt, x10^9/L	181.84±58.04	181.56±50.74	179.91±54.11	184.68±70.91	181.23±54.13	174.34±53.25	<0.01
MPV, fL	9.14±1.34	9.11±1.37	9.25±1.43	9.06±1.28	9.15±1.27	9.55±1.46	<0.01
Hemo, g/L	13.35±1.64	13.41±1.61	13.45±1.64	13.3±1.65	13.25±1.65	13.42±1.56	0.24
Drinking, yes%	33	32	36	34	31	30	0.07
DM, yes%	24	17	21	26	32	12	<0.01
Smoking, yes%	45	42	50	44	44	30	<0.01
Gender, men%	72	69	72	71	74	55	<0.01
Hyper, yes%	67	65	62	71	71	43	<0.01
Hypertreatment, yes%	82	75	79	81	91	43	<0.01

P-value: t-test between the Negative and Positive groups; P < 0.05 was considered statistically significant.

Abbreviations: TC: Total Cholesterol; HDL: High-Density Lipoprotein Cholesterol; BP: Blood Pressure; BMI: Body Mass Index; LDL: Low-Density Lipoprotein Cholesterol; VLDL: Very-Low-Density Lipoprotein Cholesterol; TG: Triglyceride; LPa: Lipoprotein(a); UA: Uric Acid; TB: Total Bilirubin; UB: Unconjugated Bilirubin; CB: Conjugated Bilirubin; HCY: Homocysteine; FFA: Free Fatty Acid; BUN: Blood Urea Nitrogen; Cr: Creatinine; WBC: White Blood Cells; CRP: C-Reactive Protein; Plt: Platelet; MPV: Mean Platelet Volume; Hemo: Hemoglobin; DM: Diabetes Mellitus; Hyper: Hypertension; Hypertreatment: Hypertension treatment.

Table2. Top 8 risk factors for FRS, LR and ANN based on all attributes.

	Presence		Severity
FRS	LR	ANN	LR	ANN
Age	BUN	Hypertreatment	TB	HDL
HDL	VLDL	Hyper	diastolicBP	BMI
systolicBP	Hypertreatment	BMI	BUN	diastolicBP
Hypertreatment	MPV	CB	Hyper	systolicBP
smoking	Age	Hemo	VLDL	UB
DM	LPa	HDL	Smoking	Hemo
Gender	TB	VLDL	Drinking	Plt
TC	UB	Cr	WBC	Drinking

Abbreviations are the same as in Table1.

Table3. Performance of the different ML models for the presence of CAD, compared with FRS.

		FRS variables				All variables
	FRS	KNN	LR	RF	ANN	KNN	LR	RF	ANN
Accuracy	0.65	0.72	0.67	0.71	0.71	0.72	0.73	0.73	0.74
Sensitivity	0.89	0.71	0.76	0.88	0.83	0.71	0.79	0.82	0.75
PPV	0.63	0.76	0.68	0.69	0.69	0.76	0.74	0.72	0.77
F1-score	0.74	0.73	0.72	0.77	0.76	0.73	0.76	0.77	0.76
AUC	0.62	0.72	0.66	0.7	0.74	0.72	0.72	0.72	0.82

FRS: Framingham Risk Score; KNN: k-Nearest Neighbors; LR: Logistic Regression; RF: Random Forest; ANN: Artificial Neural Networks; PPV: Positive Predictive Value; AUC: Area Under Curve.

Table4. Performance of the different ML models for the severity of CAD, compared with FRS.

		FRS variables				All variables
	FRS	KNN	LR	RF	ANN	KNN	LR	RF	ANN
Accuracy	0.59	0.6	0.61	0.59	0.62	0.57	0.59	0.57	0.65
Sensitivity	0.71	0.71	0.79	0.85	0.8	0.58	0.74	0.81	0.8
PPV	0.37	0.54	0.54	0.53	0.54	0.52	0.53	0.51	0.57
F1-score	0.48	0.61	0.64	0.65	0.61	0.55	0.62	0.63	0.65
AUC	0.62	0.61	0.63	0.62	0.67	0.57	0.61	0.6	0.7

Abbreviations are the same as in Table3.
