The study aims to classify and predict dengue outbreaks using a dataset and a predictive model. The dataset was first analyzed to identify the parameters that matter most for predicting dengue. Pulse rate and bleeding were found to be the most significant, and their variance strongly affects doctors' decision making and patient health outcomes. These parameters were therefore chosen as the focus of this study, as shown in Fig. 2. The results highlight the importance of monitoring pulse rate, bleeding, and platelet count in dengue patients so that the disease can be diagnosed and treated effectively. This information can be used to develop more accurate dengue diagnostic models, which are essential for controlling and preventing future dengue outbreaks.
A decision tree algorithm was used to classify and predict the type of dengue fever a patient is suffering from. Analysis of the dataset showed that pulse rate, bleeding, and platelet count were the most significant parameters for predicting dengue fever. The tree plot generated by the decision tree (Fig. 3) helps in understanding the relationship between the parameters and the dengue fever type. It shows that a patient's pulse, bleeding, chest, and ultrasound results were important in determining the type of dengue fever. Decision trees are known for their interpretability, which makes these relationships easy to follow and errors easy to detect.
The study ran multiple iterations of random forest (RF) with a varying number of trees. As Fig. 4 illustrates, the out-of-bag (OOB) error decreased as the number of trees increased. Once the number of trees exceeded 50, the error rate stabilized, remaining between 0.01% and 0.05%. The highest OOB error observed during the experiment was 0.28%. The study concludes that random forest is an efficient algorithm for dengue classification, with an accuracy of over 99%.
Tables 2, 3, and 4 compare the performance of three algorithms (Decision Tree, Random Forest, and Naive Bayes) in classifying and predicting dengue fever types. Performance was measured using a confusion matrix, which shows the number of instances that were classified correctly or incorrectly. Each table also reports per-class precision (by row) and recall (by column).
The Decision Tree algorithm achieved an accuracy of 99%, with 100% precision and 99% recall for class 0, 99% precision and 96% recall for class 1, 97% precision and 100% recall for class 2, and 100% precision and recall for class 3.
The Random Forest algorithm achieved a slightly higher accuracy of 99.75%, with 100% precision and recall for classes 0, 1, and 3, and 96% precision and recall for class 2.
The Naive Bayes algorithm had the lowest accuracy at 69.25%, with 97.9% precision and 94% recall for class 0, 96.3% precision and 35.6% recall for class 1, 65.78% precision and 100% recall for class 2, and 100% precision and 98.95% recall for class 3.
Table 2
Decision Tree Confusion Matrix
| 0 | 1 | 2 | 3 | Precision |
0 | 100 | 0 | 0 | 0 | 100% |
1 | 1 | 72 | 0 | 0 | 99% |
2 | 0 | 3 | 95 | 0 | 97% |
3 | 0 | 0 | 0 | 98 | 100% |
Recall | 99% | 96% | 100% | 100% | |
Table 3
Random Forest Confusion Matrix
| 0 | 1 | 2 | 3 | Classification Error | Precision |
0 | 100 | 0 | 0 | 0 | 0 | 100% |
1 | 0 | 73 | 0 | 0 | 0 | 100% |
2 | 0 | 3 | 95 | 0 | 0.031 | 96% |
3 | 0 | 0 | 0 | 98 | 0 | 100% |
Recall | 100% | 100% | 96% | 100% | | |
Table 4
Confusion Matrix for the Estimation Sample Using NB
| 0 | 1 | 2 | 3 | Precision |
0 | 94 | 2 | 0 | 0 | 97.90% |
1 | 0 | 26 | 0 | 1 | 96.30% |
2 | 6 | 45 | 98 | 3 | 65.78% |
3 | 0 | 0 | 0 | 94 | 100% |
Recall | 94% | 35.60% | 100% | 98.95% | |
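To make the link between the raw counts and the reported percentages explicit, the following is a minimal sketch that recomputes per-class precision, recall, and overall accuracy from the counts in Table 2. The function name `per_class_metrics` is hypothetical, and the sketch assumes that precision is taken along rows and recall along columns, which is the layout that matches the percentages printed in the table. It is an illustration, not the authors' own evaluation code.

```python
# Counts copied from Table 2 (Decision Tree); rows and columns are classes 0-3.
# Assumption: precision is computed per row and recall per column, matching
# the position of the Precision column and Recall row in the table.
TABLE_2 = [
    [100, 0, 0, 0],
    [1, 72, 0, 0],
    [0, 3, 95, 0],
    [0, 0, 0, 98],
]


def per_class_metrics(matrix):
    """Return (precision, recall, accuracy) for a square confusion matrix."""
    n = len(matrix)
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(matrix[r][c] for r in range(n)) for c in range(n)]
    diagonal = [matrix[i][i] for i in range(n)]  # correctly classified counts
    precision = [d / t if t else 0.0 for d, t in zip(diagonal, row_totals)]
    recall = [d / t if t else 0.0 for d, t in zip(diagonal, col_totals)]
    accuracy = sum(diagonal) / sum(row_totals)
    return precision, recall, accuracy


precision, recall, accuracy = per_class_metrics(TABLE_2)
for cls in range(len(TABLE_2)):
    print(f"class {cls}: precision={precision[cls]:.1%} recall={recall[cls]:.1%}")
print(f"accuracy={accuracy:.1%}")
```

Rounded to whole percentages, the values computed this way agree with the precision and recall entries of Table 2. The same function can be applied to the counts in Tables 3 and 4.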
Taken individually, the Decision Tree and Random Forest algorithms classify and predict dengue fever types with the highest accuracy, while the Naive Bayes algorithm is less accurate. Naive Bayes achieved its best precision on classes 0 and 3 (97.9% and 100%), but its recall on those classes (94% and 98.95%) was below that of Decision Tree.
Table 5 compares the performance of Decision Tree (DT), Random Forest (RF), Naive Bayes (NB), and the Boosted Combo Model. The values were obtained by running 10 iterations of each algorithm on the same dataset, and each column gives the accuracy of the corresponding algorithm as a percentage.
Table 5
Performance Comparison of DT, RF, NB, and Boosted Combo Model
Iteration | Decision Tree | Random Forest | Naïve Bayes | Boosted Combo |
1 | 82% | 92% | 81% | 92% |
2 | 96% | 90% | 86% | 96% |
3 | 85% | 97% | 87% | 97% |
4 | 89% | 95% | 80% | 95% |
5 | 97% | 98% | 75% | 98% |
6 | 95% | 98% | 77% | 98% |
7 | 87% | 91% | 76% | 91% |
8 | 93% | 95% | 83% | 95% |
9 | 97% | 90% | 83% | 97% |
10 | 89% | 97% | 78% | 97% |
Min | 82% | 90% | 75% | 91% |
Max | 97% | 98% | 87% | 98% |
AVG | 91% | 94.3% | 80.6% | 95.6% |
Table 5 reports the accuracy of each algorithm over 10 iterations on the same dataset, together with the minimum, maximum, and average. Among the individual algorithms, Random Forest has the highest average accuracy (94.3%), followed by Decision Tree (91%) and Naive Bayes (80.6%). The Boosted Combo Model, which combines all three algorithms, achieves the highest average accuracy overall at 95.6%. The minimum and maximum values indicate the range of accuracy that each algorithm can achieve. The results suggest that Random Forest and the Boosted Combo Model are the best-performing algorithms. However, the relatively small sample size should be kept in mind when interpreting these results, and further study with a larger sample is needed to increase their robustness and generalizability.
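The summary rows of Table 5 can be rechecked from the per-iteration values. The sketch below is a minimal illustration using only the Python standard library; the names `ACCURACY` and `summarize` are hypothetical, and the added spread (maximum minus minimum) is a derived quantity that does not appear in the table.

```python
from statistics import mean

# Per-iteration accuracies (in percent) copied from Table 5.
ACCURACY = {
    "Decision Tree": [82, 96, 85, 89, 97, 95, 87, 93, 97, 89],
    "Random Forest": [92, 90, 97, 95, 98, 98, 91, 95, 90, 97],
    "Naive Bayes": [81, 86, 87, 80, 75, 77, 76, 83, 83, 78],
    "Boosted Combo": [92, 96, 97, 95, 98, 98, 91, 95, 97, 97],
}


def summarize(runs):
    """Return min, max, average (one decimal), and spread of a list of runs."""
    return {
        "min": min(runs),
        "max": max(runs),
        "avg": round(mean(runs), 1),
        "spread": max(runs) - min(runs),  # derived here, not in Table 5
    }


summary = {name: summarize(runs) for name, runs in ACCURACY.items()}
for name, stats in summary.items():
    print(
        f"{name}: min={stats['min']}% max={stats['max']}% "
        f"avg={stats['avg']}% spread={stats['spread']} points"
    )
```

The minimum, maximum, and average computed this way match the Min, Max, and AVG rows of Table 5. The spread gives one simple way to express the range of values mentioned above, and it is smallest for Random Forest and the Boosted Combo Model.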