A comparative study of machine learning approaches to heart disease prediction: an empirical analysis

doi:10.21203/rs.3.rs-3098962/v1

Purpose:

This paper compares five supervised learning algorithms (support vector machines, k-nearest neighbor, decision tree, random forest, and AdaBoost) for predicting heart disease and examines the impact of normalization and GridSearch hyper-parameter tuning on model performance.

Methods:

The study utilizes the Cleveland database from the University of California-Irvine (UCI) repository, comprising data on 918 instances of heart disease patients with 12 attributes. Eleven attributes serve as predictors, while one attribute represents the target class. Models are built and tested using this dataset.

Results:

The best accuracy achieved by each algorithm ranges from 89.13% to 91.85%. AdaBoost exhibits the highest performance, whereas the decision tree performs the least effectively. Results surpass those reported in the literature. Normalization improves prediction performance by 17% for Support Vector Machines (SVM) and 14% for k-nearest neighbor (kNN). SVM does not benefit from GridSearch, while GridSearch enhances the decision tree and AdaBoost by 7% and 4%, respectively. Normalization combined with GridSearch improves kNN and random forest by 2–3%.

Conclusion:

This study compares supervised learning algorithms for heart disease prediction. AdaBoost emerges as the top-performing algorithm, while the decision tree performs relatively poorly. The findings surpass those in the literature. Normalization significantly improves performance for SVM and kNN, while GridSearch enhances the decision tree and AdaBoost. Combined, normalization and GridSearch yield performance improvements for kNN and random forest. These results contribute to the field of heart disease prediction, offering valuable insights for algorithm selection and guiding future research.

predictive models

empirical analysis

heart disease classification

decision support system

1. Introduction

Heart disease is recognized by the WHO as one of the most significant diseases in the world [1]. It incurs high patient care costs [2], which have become a growing concern in countries with an aging population. Early diagnosis and prediction of the risk of heart disease, followed by timely intervention, can play an important role in disease management. Identification of the disease is complex due to the many factors involved, and unidentified cases may result in premature deaths. Over the last decade, as large data sets have become available, Machine Learning (ML) techniques have been applied to health data, including heart disease data, for prediction and classification, and have given promising results. Research on ML for prediction of heart disease involves building a model [3] and using the model to determine the risk of developing or having heart disease [4].

ML techniques have been shown to perform well in classification problems with large data sets and many predictive factors. ML algorithms used for diagnosing and predicting heart disease include conventional algorithms such as Support Vector Machine (SVM), K-Nearest Neighbor (kNN), Decision Tree (DT), and Logistic Regression (LR); ensemble algorithms such as Random Forest (RF), Bagging, and Adaptive Boosting (AdaBoost); and deep learning algorithms such as the Convolutional Neural Network (CNN) [5–11].

Use of an appropriate dataset and using pre-processing techniques such as Exploratory Data Analysis (EDA) to exclude irrelevant data can improve the performance of machine learning algorithms [5, 6]. The performance of each algorithm may vary depending on the dataset used, the parameters applied, and the pre-processing technique performed before creating a model [7].

Senan et al. predicted heart disease using five machine learning algorithms and used the synthetic minority oversampling technique (SMOTE) to resolve an imbalance problem in the dataset [8]. The SelectKBest function was used with the chi-squared statistical method to determine the most important features. The study reported test-set accuracies of 90.16% for SVM, 90.16% for kNN, 81.97% for DT, 85.25% for RF, and 88.52% for LR. Shah et al. conducted a study on prediction of heart disease using four machine learning algorithms (Naive Bayes, kNN, DT, and RF) with default parameters [9]. The study had accuracy scores of 88.16%, 90.79%, 80.26%, and 86.84%, respectively [9]. Reddy et al. conducted a study on prediction of heart disease by applying ten algorithms with three attribute evaluators (correlation-based feature selection, chi-squared attribute evaluation, and ReliefF attribute evaluation) [10]. This study showed that attribute evaluators could improve model performance: some classifiers showed performance improvement after an attribute evaluator was applied. The Sequential Minimal Optimization (SMO) classifier with the chi-squared attribute evaluator was found to have the best performance, with an accuracy of 86.47%. The study by Arooj et al. showed that a deep learning algorithm could be used to predict heart disease [11]. In this study, the model with the CNN algorithm achieved an accuracy of 91.7%.

Several studies have performed pre-processing prior to building models [8, 10]; however, the improvement in performance was not significant when compared to studies without pre-processing [9, 11]. Pre-processing steps on the dataset, including normalization, are expected to improve the performance of the machine learning model. Singh et al. studied the impact of normalizing data before building a model [12]. They showed that models built on normalized data performed better than those built on un-normalized data. Jo normalized the dataset using Min-Max normalization and concluded that normalization improves the accuracy of models created by the Support Vector Machine (SVM) algorithm [13].

Determining relevant parameter values also affects the performance of models in machine learning. Building models with multiple tuned parameters can yield better performance [14]. Fuadah et al. implemented a hyper-parameter tuning technique using grid search on heart sound classification to obtain optimal classification [15]. Using the best parameters selected by the grid search method when building the machine learning model resulted in greater accuracy than other studies [15].

This study conducts an empirical analysis of the performance of machine learning algorithms for predicting heart disease, comparing the classical machine learning algorithms DT, SVM, and kNN with the ensemble algorithms RF and AdaBoost. It investigates the effectiveness of normalization and hyper-parameter selection.

2. Methodology

The heart disease dataset from UCI has been used in many ML studies, including those referenced in this paper, and is used in this study. Most studies restrict themselves to the data provided by the Cleveland Clinic, which allows outcomes to be compared.

2.1. Dataset

This study uses the UCI dataset (University of California-Irvine) for heart disease from the Kaggle website [16]. The dataset consists of 1190 observations, although 272 are duplicate observations, giving a dataset of 918 observations [17]. Two of the 14 attributes (ca and thal) are missing in more than 50% of the patients, and have been excluded from this study. Twelve attributes are used (Age, Sex, ChestPainType, RestingBP, Cholesterol, FastingBS, RestingECG, MaxHR, ExerciseAngina, Oldpeak, ST_Slope, and HeartDisease); the first eleven are predictors and the last is the target class. The attributes and their description, range of values, and type are given in Table 1.

Table 1

Attribute Information of the Dataset
Attributes	Description	Value	Type
Age	age of the patient in years	28–77	integer
Sex	sex of the patient	M: Male F: Female	categorical
ChestPainType	chest pain type	TA: Typical Angina ATA: Atypical Angina NAP: Non-Anginal Pain ASY: Asymptomatic	categorical
RestingBP	resting blood pressure in mm Hg	80–200	integer
Cholesterol	serum cholesterol in mg/dl	0–603	integer
FastingBS	fasting blood sugar	1: if FastingBS > 120 mg/dl 0: otherwise	integer
RestingECG	resting electrocardiogram results	Normal: Normal ST: having ST-T wave abnormality LVH: showing probable or definite left ventricular hypertrophy	categorical
MaxHR	maximum heart rate achieved	60–202	integer
ExerciseAngina	exercise-induced angina	Y: Yes N: No	boolean
Oldpeak	ST depression (numeric value)	-2.6 to 6.2	float
ST_Slope	the slope of the peak exercise ST segment	Up: upsloping Flat: flat Down: downsloping	categorical
HeartDisease	output class	1: heart disease 0: Normal	integer

2.2. Dataset Normalization

The UCI dataset was divided into a training set and a test set, containing 80% and 20% of the patients, respectively. Min-max normalization was then performed on each feature of the two data sets to scale it to a specific range (by specifying the minimum and maximum of the range). Normalization is usually made to the range 0 to 1, where the minimum is mapped to zero and the maximum to one [18]. Because the mapping is linear, it does not change the shape of the original distribution or the relationship between features. The normalization (also known as scaling) is given in Eq. 1:

$${A}^{\prime }=\left(\frac{A-\text{min}\left(A\right)}{\text{max}\left(A\right)-\text{min}\left(A\right)}\right)\times \left({R}_{max}-{R}_{min}\right)+{R}_{min} \tag{1}$$

where ${A}^{\prime }$ is the normalized value of the data feature $A$, and ${R}_{max}$ and ${R}_{min}$ are the maximum and the minimum values, respectively, of the range to which the feature values are to be mapped.
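As an illustration of Eq. 1, the following minimal Python sketch scales each feature to a target range. The function names and the sample rows are hypothetical. The sketch also assumes that the per-feature minimum and maximum are taken from the training rows and reused for the test rows; the text above does not state how these statistics were obtained for each set.

```python
def fit_min_max(rows):
    """Return one (min, max) pair per feature, computed from the given rows."""
    columns = list(zip(*rows))
    return [(min(col), max(col)) for col in columns]


def min_max_scale(rows, bounds, r_min=0.0, r_max=1.0):
    """Apply Eq. 1 to every feature using precomputed (min, max) bounds."""
    scaled = []
    for row in rows:
        new_row = []
        for value, (lo, hi) in zip(row, bounds):
            if hi == lo:
                # Constant feature: avoid division by zero, map to r_min.
                new_row.append(r_min)
            else:
                new_row.append((value - lo) / (hi - lo) * (r_max - r_min) + r_min)
        scaled.append(new_row)
    return scaled


# Hypothetical (Age, MaxHR) rows, chosen only for illustration.
train = [[28, 60], [77, 202], [52.5, 131]]
test = [[40, 150]]

bounds = fit_min_max(train)  # assumption: statistics come from training rows only
train_scaled = min_max_scale(train, bounds)
test_scaled = min_max_scale(test, bounds)
```

With bounds taken from the training rows, a test value outside the training range would map outside $[{R}_{min}, {R}_{max}]$; whether to clip such values is a separate design choice not covered here.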

2.3. Model Building

This study builds models from the dataset using two groups of machine learning algorithms, a traditional group and an ensemble group, and compares their performance. The traditional algorithms used are Decision Tree, Support Vector Machine, and k-Nearest Neighbor. The ensemble algorithms used are Random Forest and Adaptive Boosting. Models were built with and without normalization, and with default parameters and with hyper-parameters selected by the grid search method. The approach to model building is shown in Fig. 1 and allows comparison of the performance of models with and without normalization and with and without hyper-parameter tuning.

2.3.1 Algorithms Used

Five supervised machine learning algorithms were used: Decision Tree, Support Vector Machine, k-Nearest Neighbor, Random Forest, and Adaptive Boosting.

Decision Tree

The Decision Tree (DT) algorithm performs classification or regression by creating a tree structure [19]. A Decision Tree has three types of nodes: root, internal, and leaf. Classification rules are applied at internal nodes along the root-to-leaf paths [20]. Depending on the complexity of the problem, the structure of the tree grows or shrinks. The root node is the first node of the tree, from which all paths start. The internal nodes apply decision rules (tests) to attributes, resulting in branches leading to other internal nodes or to leaf nodes. The leaves are the final nodes and define the classification result of the tree.
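The root-to-leaf traversal described above can be sketched in a few lines of Python. The tree below is hand-written for illustration only: its structure and thresholds are hypothetical, are not learned from data, and are not taken from this study. Only the attribute names come from Table 1.

```python
# Hypothetical tree: the root tests ST_Slope, one internal node tests Oldpeak,
# and every path ends in a leaf holding the predicted class (0 or 1).
tree = {
    "feature": "ST_Slope",
    "test": lambda v: v == "Up",
    "yes": {
        "feature": "Oldpeak",
        "test": lambda v: v <= 1.0,  # illustrative threshold, not a learned value
        "yes": {"leaf": 0},
        "no": {"leaf": 1},
    },
    "no": {"leaf": 1},
}


def predict(node, record):
    """Follow the decision rules from the root until a leaf is reached."""
    while "leaf" not in node:
        branch = "yes" if node["test"](record[node["feature"]]) else "no"
        node = node[branch]
    return node["leaf"]


label = predict(tree, {"ST_Slope": "Up", "Oldpeak": 0.0})
```

A learning algorithm would choose the attributes and thresholds automatically; the sketch only shows how a finished tree is evaluated.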

Support Vector Machine

The Support Vector Machine (SVM) algorithm uses hyperplanes as the decision boundary for classification [21]. A hyperplane is selected in a way that maximizes the distance between the hyperplane and the support vectors, which are the data points closest to the hyperplane. In cases where the data are not linearly separable, the data are mapped into a higher-dimensional space in which they become separable. Kernel transformations are used for this purpose. Common kernels are Linear, Polynomial, Radial Basis Function (RBF), and Sigmoid.
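For reference, the four kernels named above can be written as short Python functions in their commonly used textbook forms. The parameter names `gamma`, `coef0`, and `degree` and their default values are assumptions made for this sketch; the text does not state which kernel parameters were used.

```python
import math


def dot(x, y):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, y))


def linear_kernel(x, y):
    return dot(x, y)


def polynomial_kernel(x, y, degree=3, gamma=1.0, coef0=1.0):
    return (gamma * dot(x, y) + coef0) ** degree


def rbf_kernel(x, y, gamma=1.0):
    # Depends only on the squared Euclidean distance between x and y.
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def sigmoid_kernel(x, y, gamma=1.0, coef0=0.0):
    return math.tanh(gamma * dot(x, y) + coef0)


u, v = [1.0, 2.0], [3.0, 4.0]
values = {
    "linear": linear_kernel(u, v),
    "polynomial": polynomial_kernel(u, v, degree=2),
    "rbf": rbf_kernel(u, v),
    "sigmoid": sigmoid_kernel(u, v),
}
```

Because the RBF kernel depends on distances between instances, it is sensitive to feature scale, which is consistent with the effect of normalization on SVM examined in this study.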

K-Nearest Neighbor

The K-Nearest Neighbor (kNN) algorithm classifies an instance into the class that is most common among its k nearest neighbors [22]. Euclidean distances to other data instances are usually used to identify the k nearest neighbors. This paper uses the Minkowski distance, which contains the Euclidean and Manhattan distances as special cases. For an n-dimensional space, the Minkowski distance of order p between the instances $x=\left({x}_{1},{x}_{2},\dots ,{x}_{n}\right)$ and $y=\left({y}_{1},{y}_{2},\dots , {y}_{n}\right)$ is calculated as in Eq. 2:

$$d={\left(\sum _{i=1}^{n}{\left|{x}_{i}-{y}_{i}\right|}^{p}\right)}^{1/p} \tag{2}$$

where p is an integer. The Minkowski distance is equal to the Euclidean distance for p = 2 and to the Manhattan distance for p = 1.
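A minimal Python sketch of Eq. 2 and of the neighbor vote is given below. The function names and the toy points are hypothetical, and the optional inverse-distance weighting is one common weighting scheme assumed here for illustration.

```python
from collections import defaultdict


def minkowski(x, y, p=2):
    """Minkowski distance of order p (Eq. 2)."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


def knn_predict(train_x, train_y, query, k=3, p=2, weighted=False):
    """Return the class with the largest (optionally distance-weighted) vote."""
    neighbors = sorted(
        (minkowski(x, query, p), label) for x, label in zip(train_x, train_y)
    )[:k]
    votes = defaultdict(float)
    for dist, label in neighbors:
        if weighted:
            # Closer neighbors count more; an exact match gets a very large weight.
            votes[label] += 1.0 / dist if dist > 0 else 1e12
        else:
            votes[label] += 1.0
    return max(votes, key=votes.get)


# Hypothetical two-feature training points with binary labels.
train_x = [[0, 0], [0, 1], [1, 0], [5, 5], [6, 5]]
train_y = [0, 0, 0, 1, 1]

manhattan = minkowski([0, 0], [3, 4], p=1)
euclidean = minkowski([0, 0], [3, 4], p=2)
predicted = knn_predict(train_x, train_y, query=[4, 4], k=3)
```

Since every feature contributes to the distance in proportion to its numeric range, un-normalized features with large ranges dominate the neighbor search, which is one reason normalization matters for kNN.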

Random Forest

Random Forest is an ensemble learning algorithm. It builds and trains multiple decision trees, and combines the predictions made by the trees to obtain a more accurate and stable prediction [23]. It is commonly used to solve classification and regression problems. For classification problems, the final decision is the most common class across the forest. For regression problems, the final prediction is the average of the predictions made by the individual trees. Random Forest alleviates the overfitting problem associated with DT by randomly selecting subsets of features and building an uncorrelated forest of decision trees.
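The two sources of randomness and the two aggregation rules can be sketched as follows. This is a simplified illustration under stated assumptions, not a full Random Forest: the helper names are hypothetical, the trees themselves are omitted, and the bootstrap sampling step follows the standard formulation in [23] rather than anything stated explicitly in the text above.

```python
import random
from collections import Counter
from statistics import mean


def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement (one training set per tree)."""
    return [rng.choice(rows) for _ in rows]


def random_feature_subset(n_features, subset_size, rng):
    """Pick the feature indices that a tree (or split) is allowed to use."""
    return sorted(rng.sample(range(n_features), subset_size))


def forest_classify(tree_votes):
    """Classification: the most common class among the trees' votes."""
    return Counter(tree_votes).most_common(1)[0][0]


def forest_regress(tree_outputs):
    """Regression: the average of the trees' numeric outputs."""
    return mean(tree_outputs)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
rows = list(range(10))  # stand-in for ten training records
sample = bootstrap_sample(rows, rng)
features = random_feature_subset(n_features=11, subset_size=3, rng=rng)

majority = forest_classify([1, 0, 1, 1, 0])  # three of five trees vote for class 1
average = forest_regress([0.2, 0.4, 0.6])
```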

Adaptive Boosting

Adaptive Boosting, or AdaBoost, is an ensemble learning algorithm that uses a sequence of learners to improve prediction accuracy [24]. Like Random Forest, AdaBoost builds multiple Decision Trees. However, AdaBoost differs from Random Forest by using stumps, which split on a single attribute, i.e. the root is the only decision node and its branches lead directly to leaves. The algorithm starts by building a model with equal weights given to all data points. It then assigns greater weights to points that are misclassified. Each subsequent model in the sequence gives greater weight to the data points misclassified by the previous model, until the specified number of iterations is completed or a specified error is achieved.
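The re-weighting step can be illustrated with one round of the standard discrete (binary) AdaBoost update. This is a sketch under that assumption; the text does not specify which AdaBoost variant or learning rate was used, and the function name and sample data are hypothetical.

```python
import math


def adaboost_round(weights, correct):
    """One boosting round: return (alpha, new_weights).

    weights: current sample weights, summing to 1.
    correct: one boolean per sample, True if the stump classified it correctly.
    """
    error = sum(w for w, ok in zip(weights, correct) if not ok)
    # Clamp so that a perfect or a useless stump does not cause log(0) or 1/0.
    error = min(max(error, 1e-10), 1 - 1e-10)
    alpha = 0.5 * math.log((1 - error) / error)  # say of this stump in the final vote
    updated = [
        w * math.exp(-alpha if ok else alpha) for w, ok in zip(weights, correct)
    ]
    total = sum(updated)
    return alpha, [w / total for w in updated]


# Five samples with equal starting weights; the stump misclassifies the last one.
weights = [0.2] * 5
correct = [True, True, True, True, False]
alpha, new_weights = adaboost_round(weights, correct)
```

After the update, the misclassified sample carries half of the total weight, so the next stump is pushed towards classifying it correctly.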

2.3.2 Hyper-parameter Selection - Grid Search

Hyper-parameter selection with grid search and cross-validation (GridSearchCV, or simply GridSearch) was investigated in order to determine its impact on performance [25]. GridSearch optimizes the model by evaluating all hyper-parameter combinations and finding the optimum combination for each algorithm. The hyper-parameters used for each algorithm are shown in Table 2. For example, kNN uses four hyper-parameters: the number of neighbors, the power p of the Minkowski distance, the weighting scheme, and the algorithm used to compute the nearest neighbors. The optimum combination of these parameters for normalized data is n_neighbors = 17, p = 1, weights = distance, and algorithm = auto.
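The exhaustive search described above can be sketched in plain Python. This is a simplified stand-in written for illustration, not the GridSearchCV implementation used in the study: the helper names, the toy data, the contiguous fold layout, and the candidate values are all assumptions, and only the parameter names n_neighbors and p are taken from the text.

```python
from itertools import product


def k_fold_indices(n_samples, n_folds):
    """Split range(n_samples) into n_folds contiguous folds of near-equal size."""
    folds, start = [], 0
    for i in range(n_folds):
        size = n_samples // n_folds + (1 if i < n_samples % n_folds else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def grid_search_cv(param_grid, fit_and_score, n_samples, n_folds=5):
    """Score every combination by cross-validation; return (best_params, best_score)."""
    names = list(param_grid)
    folds = k_fold_indices(n_samples, n_folds)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        fold_scores = []
        for held_out in folds:
            held = set(held_out)
            train_idx = [i for i in range(n_samples) if i not in held]
            fold_scores.append(fit_and_score(params, train_idx, held_out))
        mean_score = sum(fold_scores) / len(fold_scores)
        if mean_score > best_score:  # ties keep the earlier combination
            best_params, best_score = params, mean_score
    return best_params, best_score


# Hypothetical, well-separated toy data: five points per class.
X = [(0.1, 0.2), (0.2, 0.1), (0.3, 0.3), (0.2, 0.4), (0.4, 0.2),
     (0.7, 0.8), (0.8, 0.7), (0.9, 0.9), (0.6, 0.7), (0.8, 0.6)]
y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]


def knn_accuracy(params, train_idx, test_idx):
    """Accuracy of a majority-vote kNN on the held-out fold."""
    k, p = params["n_neighbors"], params["p"]

    def dist(a, b):
        return sum(abs(u - v) ** p for u, v in zip(a, b)) ** (1.0 / p)

    hits = 0
    for i in test_idx:
        nearest = sorted(train_idx, key=lambda j: dist(X[j], X[i]))[:k]
        votes_for_1 = sum(y[j] for j in nearest)
        predicted = 1 if 2 * votes_for_1 > len(nearest) else 0
        hits += int(predicted == y[i])
    return hits / len(test_idx)


param_grid = {"n_neighbors": [1, 3, 5], "p": [1, 2]}
best_params, best_score = grid_search_cv(param_grid, knn_accuracy, n_samples=len(X))
```

The number of model fits grows as the product of the candidate counts times the number of folds (here 3 × 2 × 5 = 30), which is the main cost of an exhaustive grid.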

2.4. Performance Metrics

The performance of each model was evaluated using five performance metrics: accuracy, precision, recall, F1-score, and AUC score. These metrics are derived from the confusion matrix shown in Table 3, where TP (true positive) and TN (true negative) represent correct estimations, and FP (false positive, or false alarm) and FN (false negative, or missed estimation) represent incorrect estimations. Precision is defined as $TP/(TP+FP)$ and measures the fraction of positive estimates that are actually positive. Recall measures the ratio of true positive estimates to actual positives, $TP/(TP+FN)$. AUC is the area under the ROC curve, a plot of recall versus the false positive rate, $FPR=FP/(FP+TN)$. Accuracy measures the ratio of all correctly classified instances to all instances, $(TP+TN)/(TP+TN+FP+FN)$. The F1 score is a function of precision and recall, as given in Eq. 3:

$$F1=2\times \frac{Precision\times Recall}{Precision+Recall} \tag{3}$$

Table 3

Confusion Matrix
	Estimated Class
True Class	Positive	Negative
Positive	TP	FN
Negative	FP	TN
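The metric definitions above translate directly into code. In the sketch below, the confusion-matrix counts are hypothetical: they are not reported in this paper and were chosen by us for a 184-instance test set (20% of 918) purely as a worked example. The function name is likewise an assumption.

```python
def classification_metrics(tp, fn, fp, tn):
    """Compute the confusion-matrix metrics defined in Section 2.4."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    fpr = fp / (fp + tn)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {
        "precision": precision,
        "recall": recall,
        "fpr": fpr,
        "accuracy": accuracy,
        "f1": f1,
    }


# Hypothetical counts for 184 test instances (102 positive, 82 negative).
metrics = classification_metrics(tp=97, fn=5, fp=11, tn=71)
percentages = {name: round(100 * value, 2) for name, value in metrics.items()}
```

AUC is not included because it requires ranked scores or probabilities for the whole ROC curve, not a single confusion matrix.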

3. Result and Discussion

This section compares the performance of each model for the four combinations of normalization and hyper-parameter selection on each of the performance metrics, with results given in Table 4.

3.1. Performance Comparison

The best outcome for each metric is shown in bold.

Table 4

Performance of the Models
Method	Performance Metrics	Non-normalized	Normalized	GridSearch + Non-normalized	GridSearch + Normalized
Decision Tree	Precision	82.69%	79.25%	87.16%	87.27%
	Recall	84.31%	82.35%	93.14%	94.12%
	F1	83.50%	80.77%	90.05%	90.57%
	Accuracy	81.52%	78.26%	88.59%	89.13%
	AUC-Score	81.18%	77.76%	88.03%	88.52%
SVM	Precision	78.49%	92.08%	89.32%	90.82%
	Recall	71.57%	91.18%	90.20%	87.25%
	F1	74.87%	91.63%	89.76%	89.00%
	Accuracy	73.37%	90.76%	88.59%	88.04%
	AUC-Score	73.59%	90.71%	88.39%	88.14%
kNN	Precision	75.00%	88.46%	78.90%	89.81%
	Recall	76.47%	90.20%	84.31%	95.10%
	F1	75.73%	89.32%	81.52%	92.38%
	Accuracy	72.83%	88.04%	78.80%	91.30%
	AUC-Score	78.00%	92.00%	78.13%	90.84%
RF	Precision	90.10%	87.85%	87.16%	89.81%
	Recall	89.22%	92.16%	93.14%	95.10%
	F1	89.66%	89.95%	90.05%	92.38%
	Accuracy	88.59%	88.59%	88.59%	91.30%
	AUC-Score	88.51%	88.15%	88.03%	90.84%
AdaBoost	Precision	88.35%	85.71%	91.43%	91.26%
	Recall	89.22%	94.12%	94.12%	92.16%
	F1	88.78%	89.72%	92.75%	91.71%
	Accuracy	87.50%	88.04%	91.85%	90.76%
	AUC-Score	87.29%	87.30%	91.57%	90.59%

Normalization improved model performance in general, when compared to models without normalization. Typical performance improvements were about 17 percentage points for SVM and 14 percentage points for kNN, and normalization alone gave the best result for SVM. Hyper-parameter tuning with GridSearch also improved performance in general and, combined with normalization, improved the results for Decision Tree, kNN, and Random Forest; GridSearch alone gave an improvement for AdaBoost. Only AdaBoost performed best without normalization.

We also used bar charts to visualize and compare performance for each metric (Figs. 2–6). It can be seen that hyper-parameter selection improved model performance. While feature normalization without GridSearch degraded the performance of the decision tree, it improved the performance of the other four algorithms. Figure 2 shows that the highest precision score of 92.08% is obtained for SVM with normalization, followed by AdaBoost with GridSearch. Figure 3 shows that the best recall score of 95.10% is obtained for Random Forest and k-Nearest Neighbor when both hyper-parameter selection and normalization are used. On the other hand, the best accuracy and F1 scores are achieved for AdaBoost with hyper-parameter selection (Figs. 4 and 5). The best AUC score is obtained for k-Nearest Neighbor with normalization (Fig. 6).

3.2. Comparison to other studies

Table 5 compares the best results for the five algorithms used in this study to studies that used the UCI heart disease data set. The accuracy metric was the most common metric used in other studies, and therefore was chosen from this study for the comparison.

The approaches used in this study generally have better outcomes than other studies. This is notable for DT, RF, and AdaBoost, with DT being about 8 percentage points and RF about 5 percentage points higher than in [8, 9]. This can be attributed to the improvement given by the use of GridSearch. Normalization was used in [8, 9], and the results for SVM and kNN are only marginally better (0.5–1 percentage points). The highest accuracy in other studies is 91.7% [11], which was obtained using 1050 instances and a deep convolutional neural network (CNN). This compares to the best accuracy in this study (91.85%), obtained for AdaBoost with GridSearch and without normalization. Reddy et al. [10] used the ReliefF and chi-squared evaluators to select attributes, with the top 10 attributes being used for training the classifier. Their best-performing approach, sequential minimal optimization (SMO) using the optimal attribute set from the chi-squared attribute evaluator, gave an accuracy of only 86.47%.

Table 5

Accuracy comparison to other studies in heart disease prediction
Author	Dataset	ML Algorithms	Accuracy (%)
Senan, Ebrahim M., et al. [8]	UCI heart disease (Cleveland)	SVM	90.16
		KNN	90.16
		DT	81.97
		RF	85.25
		LR	88.52
Shah, Devansh, et al. [9]	UCI heart disease	NB	88.16
		KNN	90.79
		DT	80.26
		RF	86.84
Reddy, Karna V. V., et al. [10]	UCI heart disease (Cleveland)	NB + ReliefF	84.49
		SMO + Chi-squared	86.47
		LR	84.82
		AdaBoostM1 + LR	84.82
		Bagging + LR + Chi-Squared	85.48
Arooj, Sadia, et al. [11]	UCI heart disease	CNN	91.7
Proposed Study	UCI heart disease	DT + Normalization + GridSearchCV	89.13
		SVM + Normalization	90.76
		KNN + Normalization + GridSearchCV	91.30
		RF + Normalization + GridSearchCV	91.30
		AdaBoost + GridSearchCV	91.85

4. Conclusion

The presence of heart disease was predicted using five machine learning algorithms and 918 instances with 12 attributes (eleven predictors and a target class) from the UCI heart data set. Accuracy for the methods was between 89% and 92%. Normalization was found to improve prediction performance for SVM and kNN. SVM performed best with normalization alone, with a 17% improvement in accuracy compared to the non-normalized model. GridSearch hyper-parameter tuning significantly improved the prediction performance of DT and AdaBoost. This study shows that ML may be used for predicting heart disease. The accuracy of ML techniques may be improved by using larger data sets. The advantages of combining multiple machine learning models need to be researched. The potential of ML in long-term and short-term patient management also needs to be investigated.

Declarations

Acknowledgements The authors would like to thank the Department of Electrical and Electronics Engineering, Ondokuz Mayis University for the support in finishing this article.

Author Contributions The authors acknowledge that they contributed to the study as follows: Q.A.H. and H.G.K. prepared and performed the concept and design of the study; Q.A.H. prepared and performed the program code of the study; Q.A.H. and H.G.K. analyzed and interpreted the results; Q.A.H., H.G.K., and G.Y.T. prepared and wrote the main manuscript; H.G.K., G.Y.T., and I.S. edited the main manuscript before submitting. All authors reviewed and approved the final version of the manuscript.

Funding This research received no specific grant from any funding agencies.

Data availability The data of this study are available from the corresponding authors upon reasonable request.

Conflict of interest The authors have no conflict of interest to disclose.

References

[1] World Health Organization. (n.d.). Cardiovascular diseases. World Health Organization. Retrieved January 10, 2023 from https://www.who.int/health-topics/cardiovascular-diseases
[2] Tarride, J. E., Lim, M., DesMeules, M., Luo, W., Burke, N., O’Reilly, D., Bowen, J., & Goeree, R. (2009). A review of the cost of cardiovascular disease. Canadian Journal of Cardiology, 25(6), e195-e202. https://doi.org/10.1016/S0828-282X(09)70098-4
[3] Mohan, S., Thirumalai, C., & Srivastava, G. (2019). Effective heart disease prediction using hybrid machine learning techniques. IEEE access, 7, 81542-81554. https://doi.org/10.1109/ACCESS.2019.2923707
[4] Guidi, G., Pettenati, M. C., Melillo, P., & Iadanza, E. (2014). A machine learning system to improve heart failure patient assistance. IEEE journal of biomedical and health informatics, 18(6), 1750-1756. https://doi.org/10.1109/JBHI.2014.2337752
[5] Adler, E. D., Voors, A. A., Klein, L., Macheret, F., Braun, O. O., Urey, M. A., Zhu, W., Sama, I., Tadel, M., Campagnari, C., Greenberg, B., & Yagil, A. (2020). Improving risk prediction in heart failure using machine learning. European journal of heart failure, 22(1), 139-147. https://doi.org/10.1002/ejhf.1628
[6] Plati, D. K., Tripoliti, E. E., Bechlioulis, A., Rammos, A., Dimou, I., Lakkas, L., Watson, C., McDonald, K., Ledwidge, M., Pharithi, R., Gallagher, J., Michalis, L. K., Goletsis, Y., Naka, K. K., & Fotiadis, D. I. (2021). A Machine Learning Approach for Chronic Heart Failure Diagnosis. Diagnostics, 11(10), 1863. https://doi.org/10.3390/diagnostics11101863
[7] Ketu, S., & Mishra, P. K. (2022). Empirical analysis of machine learning algorithms on imbalance electrocardiogram based arrhythmia dataset for heart disease detection. Arabian Journal for Science and Engineering, 1-23. https://doi.org/10.1007/s13369-021-05972-2
[8] Senan, E. M., Abunadi, I., Jadhav, M. E., & Fati, S. M. (2021). Score and Correlation Coefficient-Based Feature Selection for Predicting Heart Failure Diagnosis by Using Machine Learning Algorithms. Computational and Mathematical Methods in Medicine, 2021. https://doi.org/10.1155/2021/8500314
[9] Shah, D., Patel, S., & Bharti, S. K. (2020). Heart disease prediction using machine learning techniques. SN Computer Science, 1(6), 1-6. https://doi.org/10.1007/s42979-020-00365-y
[10] Reddy, K. V. V., Elamvazuthi, I., Aziz, A. A., Paramasivam, S., Chua, H. N., & Pranavanand, S. (2021). Heart disease risk prediction using machine learning classifiers with attribute evaluators. Applied Sciences, 11(18), 8352. https://doi.org/10.3390/app11188352
[11] Arooj, S., Rehman, S. U., Imran, A., Almuhaimeed, A., Alzahrani, A. K., & Alzahrani, A. (2022). A Deep Convolutional Neural Network for the Early Detection of Heart Disease. Biomedicines, 10(11), 2796. https://doi.org/10.3390/biomedicines10112796
[12] Singh, D., & Singh, B. (2020). Investigating the impact of data normalization on classification performance. Applied Soft Computing, 97, 105524. https://doi.org/10.1016/j.asoc.2019.105524
[13] Jo, J. M. (2019). Effectiveness of normalization pre-processing of big data to the machine learning performance. The Journal of the Korea institute of electronic communication sciences, 14(3), 547-552. http://dx.doi.org/10.13067/JKIECS.2019.14.3.547
[14] Elgeldawi, E., Sayed, A., Galal, A. R., & Zaki, A. M. (2021). Hyper-parameter tuning for machine learning algorithms used for arabic sentiment analysis. Informatics, 8(4), 79. https://doi.org/10.3390/informatics8040079
[15] Fuadah, Y. N., Pramudito, M. A., & Lim, K. M. (2022). An Optimal Approach for Heart Sound Classification Using Grid Search in Hyper-parameter Optimization of Machine Learning. Bioengineering, 10(1), 45. https://doi.org/10.3390/bioengineering10010045
[16] Aha, D., & Kibler, D. (1988). Instance-based prediction of heart-disease presence with the Cleveland database. Irvine: University of California, 3(1), 3-2.
[17] fedesoriano. (2021, September). Heart Failure Prediction Dataset. Retrieved October 22, 2022 from https://www.kaggle.com/fedesoriano/heart-failure-prediction.
[18] Patro, S., & Sahu, K. K. (2015). Normalization: A preprocessing stage. arXiv preprint, arXiv:1503.06462. https://doi.org/10.48550/arXiv.1503.06462
[19] Myles, A. J., Feudale, R. N., Liu, Y., Woody, N. A., & Brown, S. D. (2004). An introduction to decision tree modeling. Journal of Chemometrics: A Journal of the Chemometrics Society, 18(6), 275-285. https://doi.org/10.1002/cem.873
[20] Song, Y. Y., & Ying, L. U. (2015). Decision tree methods: applications for classification and prediction. Shanghai archives of psychiatry, 27(2), 130. https://doi.org/10.11919/j.issn.1002-0829.215044
[21] Hearst, M. A., Dumais, S. T., Osuna, E., Platt, J., & Scholkopf, B. (1998). Support vector machines. IEEE Intelligent Systems and their applications, 13(4), 18-28. https://doi.org/10.1109/5254.708428
[22] Liao, Y., & Vemuri, V. R. (2002). Use of k-nearest neighbor classifier for intrusion detection. Computers & Security, 21(5), 439-448. https://doi.org/10.1016/S0167-4048(02)00514-X
[23] Breiman, L. (2001). Random forests. Machine learning, 45, 5-32. https://doi.org/10.1023/A:1010933404324
[24] Ying, C., Qi-Guang, M., Jia-Chen, L., & Lin, G. (2013). Advance and prospects of AdaBoost algorithm. Acta Automatica Sinica, 39(6), 745-758. https://doi.org/10.1016/S1874-1029(13)60052-X
[25] Liashchynskyi, P., & Liashchynskyi, P. (2019). Grid search, random search, genetic algorithm: a big comparison for NAS. arXiv preprint, arXiv:1912.06059. https://doi.org/10.48550/arXiv.1912.06059
