An experiment was conducted using the following techniques: K-Nearest Neighbor (KNN), Naive Bayes (NB), Logistic Regression (LR), Support Vector Machine (SVM), and Decision Tree (DT).
The aim of the study is to discover the most effective data mining methods for predicting heart disease on different datasets. An experiment was conducted on the heart disease datasets to find the best prediction system, applying different classification techniques to see which provide the most accurate results for heart disease prediction. For the five classification algorithms, accuracy and several other metrics were used to evaluate performance.
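The experimental setup described above can be sketched with scikit-learn. This is a minimal illustration, not the authors' actual pipeline: the synthetic data below stands in for the real Framingham/Cleveland datasets, and the default hyperparameters are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for a heart disease dataset (the study itself
# uses the Framingham and Cleveland datasets, not loaded here).
X, y = make_classification(n_samples=500, n_features=13, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# The five classifiers compared in the study.
models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "SVM": SVC(),
    "Nearest Neighbor": KNeighborsClassifier(),
    "Naive Bayes": GaussianNB(),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)  # held-out accuracy
    print(f"{name}: {scores[name]:.2%}")
```

Each model is trained on the same train/test split so the resulting accuracies are directly comparable, mirroring the per-classifier comparison in Tables 3 and 4.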
Accuracy: A classifier's accuracy is the number of correct predictions divided by the total number of samples. If the accuracy of the classifier is satisfactory, it can be used on future item sets for which the class label is unknown.
ROC-AUC: The ROC (Receiver Operating Characteristic) curve is a probability curve, and AUC (Area Under the Curve) measures the degree of separability. Together they show how well the model can distinguish between classes.
Recall: The ability of a model to find all the relevant samples in a dataset. Mathematically, recall is the number of correctly predicted positive records divided by the sum of the correctly predicted positive records and the positive records that were missed (true positives divided by true positives plus false negatives).
Precision: The fraction of positive-class predictions that truly belong to the positive class. Mathematically, precision is the number of correctly predicted positive records divided by the sum of the correctly predicted positive records and the false positives (true positives divided by true positives plus false positives).
F1 score: The F1-score combines a classifier's precision and recall into a single measure using their harmonic mean (HM).
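The five metrics defined above are all available in scikit-learn. The small sketch below, using made-up labels and probabilities purely for illustration, shows how each would be computed; note that ROC-AUC ranks the predicted probabilities rather than the hard class labels.

```python
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, f1_score, roc_auc_score)

y_true  = [1, 0, 1, 1, 0, 0, 1, 0]   # ground-truth labels (illustrative)
y_pred  = [1, 0, 1, 0, 0, 1, 1, 0]   # hard class predictions
y_score = [0.9, 0.2, 0.8, 0.4, 0.1, 0.6, 0.7, 0.3]  # predicted probabilities

print("Accuracy :", accuracy_score(y_true, y_pred))   # correct / total
print("Precision:", precision_score(y_true, y_pred))  # TP / (TP + FP)
print("Recall   :", recall_score(y_true, y_pred))     # TP / (TP + FN)
print("F1       :", f1_score(y_true, y_pred))         # harmonic mean of P and R
print("ROC-AUC  :", roc_auc_score(y_true, y_score))   # uses probabilities, not labels
```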
Methods | Accuracy (%) | ROC-AUC | Recall | Precision | F1
Logistic Regression | 89.18 | 0.73 | 0.06 | 0.64 | 0.11
SVM | 86.14 | 0.62 | 0.02 | 0.50 | 0.04
Nearest Neighbor | 84.41 | 0.60 | 0.08 | 0.28 | 0.12
Naïve Bayes | 83.96 | 0.71 | 0.18 | 0.35 | 0.23
Decision Tree | 77.30 | 0.56 | 0.26 | 0.27 | 0.27
Table 3. Performance evaluation on the Framingham dataset.
From the above table, the LR method achieves the best accuracy (89.18%), while the SVM, KNN, NB, and DT classifiers achieve 86.14%, 84.41%, 83.96%, and 77.30%, respectively. SVM gives the next-best result after Logistic Regression at 86.14%, a difference of 3.04 percentage points, and NB also performs well after LR and SVM. The Decision Tree gives the lowest accuracy among the techniques, at 77.30%.
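The F1 values in the tables can be recovered from the reported precision and recall via the harmonic-mean formula; the check below uses the Logistic Regression row of Table 3 as an example.

```python
def f1(precision, recall):
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)

# Logistic Regression row of Table 3: precision 0.64, recall 0.06.
print(round(f1(0.64, 0.06), 2))  # 0.11, matching the table
```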
Methods | Accuracy (%) | ROC-AUC | Recall | Precision | F1
Logistic Regression | 90.00 | 0.94 | 0.79 | 0.94 | 0.86
Naïve Bayes | 82.00 | 0.85 | 0.82 | 0.82 | 0.82
SVM | 87.00 | 0.73 | 0.87 | 0.87 | 0.87
Nearest Neighbour | 89.00 | 0.90 | 0.88 | 0.89 | 0.88
Decision Tree | 77.00 | 0.76 | 0.77 | 0.77 | 0.77
Table 4. Performance evaluation on the Cleveland dataset.
From the above table, the LR method again achieves the best accuracy (90%), while the NB, DT, SVM, and KNN classifiers achieve 82%, 77%, 87%, and 89%, respectively. The Nearest Neighbor classifier gives the next-best result after Logistic Regression, at 89%, and the Decision Tree again gives the lowest accuracy among the techniques, at 77%.
Based on the experimental results, the two datasets, with their different attributes, perform differently under the same classification techniques. On both datasets, LR gives the best accuracy.