A number of tests and experiments were run to determine the benefits, drawbacks, and effectiveness of the HBO, and to position it against other optimizers on a number of benchmark data sets from the high-dimensional domain. To ensure a fair comparison, the settings were matched so that all the algorithms started with the same population. All of the tests and comparisons with other metaheuristic algorithms were conducted on a personal computer with an Intel® Core™ i5-1035G1 CPU at 1.19 GHz, 16 GB of memory, and a Windows 10 Professional N 64-bit operating system.
The proposed wrapper-based FS approach, which relies on the KNN classifier, was used to assess the performance of the upgraded HBO based on the generated feature subsets. A hold-out technique was used to validate the findings: each data set was divided into two parts, 80% for training and 20% for testing. These splits were repeated 30 times to produce meaningful results, so the final statistical findings are based on 30 separate runs. MATLAB 2015 was used to produce both the results and the analyses.
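Although the experiments were run in MATLAB, the wrapper evaluation step (hold-out split, classify with KNN on the selected features, score accuracy) can be sketched in Python as follows; the function names, the toy data, and the use of a 1-NN rule are illustrative assumptions, not taken from the paper:

```python
import numpy as np

def knn_holdout_accuracy(X, y, mask, train_frac=0.8, seed=0):
    """Hold-out accuracy of a 1-NN classifier on the feature subset
    selected by the binary mask (illustrative sketch, not the paper's code)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    cut = int(train_frac * len(y))          # 80% train, 20% test
    tr, te = idx[:cut], idx[cut:]
    Xs = X[:, mask.astype(bool)]            # keep only the selected features
    # 1-NN: predict the label of the closest training sample (Euclidean)
    d = np.linalg.norm(Xs[te][:, None, :] - Xs[tr][None, :, :], axis=2)
    pred = y[tr][d.argmin(axis=1)]
    return float((pred == y[te]).mean())

# toy data: 2 informative features followed by 3 pure-noise features
rng = np.random.default_rng(1)
y = np.repeat([0, 1], 50)
X = np.c_[y[:, None] + 0.1 * rng.standard_normal((100, 2)),
          rng.standard_normal((100, 3))]
mask = np.array([1, 1, 0, 0, 0])            # candidate subset: the informative pair
acc = knn_holdout_accuracy(X, y, mask)
```

In the actual experiments this evaluation would be repeated over 30 independent splits and the statistics averaged, as described above.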
5.3 Results
This section focuses on investigating the sensitivity of the binary variant to the initial population and to the number of iterations. Note that the HBO does not have a specific user-defined initial parameter and is characterized by its random and variable nature that changes over time. Understanding how to obtain the best results from the BHBO before comparing it with other methods is an important benefit of this experiment. Such an analysis can show whether or not the population size and the number of iterations have a clear impact on performance.
Table 4 shows the accuracy rates obtained by the BHBO for different sets of common parameters. From the analysis of the ranking results in Table 4, it was found that the binary HBO with 20 agents and 100 iterations had the potential to achieve the best results on the nine benchmark data sets described above. The F-test (last row of Table 4) was used to reveal the best results, as it is a popular way of assessing various methods in a comparative test.
Table 4
Average Classification Accuracy achieved by BHBO with multiple sets of common parameters
Iteration | 50 | 100 | 150 |
Population | 5 | 10 | 20 | 5 | 10 | 20 | 5 | 10 | 20 |
11_Tumors | 0.689 | 0.689 | 0.701 | 0.735 | 0.758 | 0.804 | 0.643 | 0.666 | 0.747 |
14_Tumors | 0.402 | 0.435 | 0.500 | 0.480 | 0.487 | 0.564 | 0.480 | 0.428 | 0.474 |
Brain_Tumor1 | 0.888 | 0.733 | 0.644 | 0.800 | 0.711 | 0.822 | 0.777 | 0.977 | 0.911 |
Brain_Tumor2 | 0.440 | 0.520 | 0.680 | 0.720 | 0.600 | 0.720 | 0.640 | 0.720 | 0.760 |
DLBCL | 0.846 | 0.897 | 0.820 | 0.820 | 0.897 | 0.948 | 0.820 | 0.871 | 0.820 |
Leukemia1 | 0.833 | 0.750 | 0.833 | 0.833 | 0.944 | 0.916 | 0.833 | 0.833 | 0.777 |
Leukemia2 | 0.777 | 0.777 | 0.777 | 0.833 | 1.000 | 0.888 | 0.944 | 0.944 | 0.638 |
Prostate Tumor | 0.862 | 0.725 | 0.823 | 0.803 | 0.823 | 0.862 | 0.862 | 0.857 | 0.784 |
SRBCT | 0.714 | 0.857 | 0.833 | 0.833 | 0.880 | 0.904 | 0.880 | 0.738 | 0.785 |
Overall rank | 4.720 | 4.720 | 5.560 | 3.890 | 5.670 | 3.440 | 5.110 | 6.390 | 5.500 |
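The mean-rank computation behind the "Overall rank" row can be sketched as follows; the accuracy values used here are illustrative placeholders, not the paper's results:

```python
import numpy as np

# Accuracy of 3 hypothetical parameter configurations on 4 data sets
acc = np.array([[0.70, 0.80, 0.75],
                [0.50, 0.56, 0.47],
                [0.64, 0.82, 0.91],
                [0.68, 0.72, 0.76]])

# Rank configurations within each data set (rank 1 = highest accuracy;
# ties are ignored here for simplicity)
ranks = (-acc).argsort(axis=1).argsort(axis=1) + 1

# Average the per-data-set ranks: a lower mean rank is a better configuration
mean_rank = ranks.mean(axis=0)
```

Averaging ranks rather than raw accuracies prevents a single easy or hard data set from dominating the comparison.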
Table 5 shows the average time, precision, recall and F-measure results obtained by the BHBO over 30 runs.
Table 5
Average results for time, precision, recall and F-measure
Data Set | Time | Precision | Recall | F-Measure | Accuracy |
11_Tumors | 322.99 | 0.96 | 1.00 | 0.99 | 0.83 |
14_Tumors | 1079.76 | 1.00 | 1.00 | 1.00 | 0.51 |
Brain_Tumor1 | 215.98 | 1.00 | 1.00 | 1.00 | 0.92 |
Brain_Tumor2 | 57.50 | 0.89 | 0.92 | 0.85 | 0.71 |
DLBCL | 43.02 | 0.94 | 0.98 | 0.96 | 0.94 |
Leukemia1 | 64.30 | 0.95 | 1.00 | 0.97 | 0.92 |
Leukemia2 | 581.40 | 1.00 | 1.00 | 1.00 | 0.95 |
Prostate_Tumor | 159.50 | 0.87 | 0.87 | 0.85 | 0.87 |
SRBCT | 276.99 | 0.93 | 0.97 | 0.94 | 0.92 |
Furthermore, a boxplot is a type of chart that shows a five-number summary. The interquartile range indicates where the middle 50% of the data is located: the first quartile (the 25% point) and the third quartile (the 75% point) sit at opposite ends of the box. The minimum value is on the left of the chart and the maximum on the right, and a line across the box denotes the median. A boxplot shows how tightly the data is clustered and whether it is symmetrical, and it also reveals the presence and location of any outliers. Figure 5 depicts boxplots of the distribution of HBO performance across 30 runs on the nine data sets.
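The five-number summary behind each boxplot can be computed directly; the per-run accuracies below are invented for illustration:

```python
import numpy as np

# hypothetical accuracies from five runs on one data set (placeholder values)
runs = np.array([0.70, 0.75, 0.80, 0.85, 0.90])

# quartiles via linear interpolation between order statistics
q1, median, q3 = np.percentile(runs, [25, 50, 75])
five_number = (runs.min(), q1, median, q3, runs.max())
iqr = q3 - q1   # width of the box: spread of the middle 50% of the data
```

In the experiments, `runs` would hold the 30 per-run accuracies for a given data set, giving one box per data set in Fig. 5.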
Figure 5 shows how the HBO was able to narrow the gap between the minimum and maximum accuracy values, bringing them closer to the mean. This is a strong indicator of the HBO approach's capacity to deliver accurate findings by striking a proper balance between local and global search.
Moreover, a swarm plot is another way to visualize the distribution of a single attribute or the joint distribution of several variables. The position of each dot on the horizontal and vertical axes indicates the values of an individual data point. A swarm plot displays all of the data points, which aids in understanding the distribution, how the data is dispersed across a categorical attribute, and how the variable varies within a category. Each of the 30 run results for each data set is shown as a dot in Fig. 6.
As seen in Fig. 6, we get a good idea of how the data (the 30 runs) for each dataset is distributed in terms of a numeric attribute.
One of the goals of this study was to speed up optimization by minimizing random selection and the exploration process in order to find the best solution in a shorter time. Figure 7 illustrates the success of the HBO in finding the best solution for the nine data sets through enhanced convergence. From Fig. 7, it can be seen that the BHBO achieved the best convergence trends compared to the competing methods, as its convergence curves reached the lowest cost values in the final iterations of the optimization process. A total of 100 iterations were executed for the BHBO because accuracy did not increase after the 80th iteration. The convergence behavior can also be used to detect premature convergence and deviation into local minima.
5.4 Comparison with Other Methods in the Literature
The performance of the BHBO was compared with that of other well-known optimizers on various metrics, such as classification accuracy. The algorithms from the literature were the BHHO, the GA, and binary versions of the GSA (BGSA), the ant-lion optimizer (BALO), the bat algorithm (BBA), the salp swarm algorithm (BSSA), and the particle swarm optimizer (BPSO). The average accuracy, the number of features selected by the BHBO, and the fitness values for each of these methods were obtained. The results presented in the tables below are based on 30 runs.
Initially, the BHBO was implemented with a population size of 20 over 100 iterations and the accuracy rate was measured. However, the BHBO did not achieve the desired results, as it was found to outperform the BHHO in only two data sets. Therefore, the parameters were tuned further, and it was found that with the population size set to 40 and the iterations to 100, the BHBO outperformed the BHHO in four data sets, namely, 11_Tumors, DLBCL, Leukemia1, and Leukemia2. Comparing the BHBO with the rest of the methods, we found that it achieved higher results in two data sets.
After implementing the algorithm, we noticed that data sets containing a large number of samples and a large number of classes, as in the cases of 11_Tumors and 14_Tumors, yield lower accuracy than data sets containing a small number of samples and a small number of classes, which represent the rest of the data used in the study.
Table 6 shows a comparison of the BHBO and the abovementioned state-of-the-art algorithms in terms of classification accuracy. It can be seen from the table that the BHBO exceeded all these modern FS approaches in two data sets, namely 11_Tumors and DLBCL. In detail, the BHBO method outperformed six methods (GA, BSSA, BBA, BALO, BPSO, and BGSA) in the Brain_Tumor1 data set and three methods (GA, BSSA, and BPSO) in the Brain_Tumor2 data set. It also achieved clear superiority over six methods (BHHO, GA, BSSA, BBA, BALO and BGSA) in the Leukemia1 data set, and over another six methods (BHHO, GA, BSSA, BBA, BALO, and BPSO) in the Leukemia2 data set. Moreover, the BHBO performed better than the BBA in the 11_Tumors data set, and in the Prostate_Tumor data set it outperformed three methods, namely the BSSA, BALO, and BGSA. As shown in the table, the BHBO was ranked fourth, followed by the BHHO, BGSA and BPSO.
Table 6
Comparison of BHBO versus other optimizers in terms of average classification accuracy
Data set | BGSA | BPSO | BALO | BBA | BSSA | GA | BHHO | BHBO |
11_Tumors | 0.7506 | 0.8162 | 0.7143 | 0.6286 | 0.6765 | 0.8162 | 0.8295 | 0.8319 |
14_Tumors | 0.5122 | 0.5557 | 0.5942 | 0.4619 | 0.5322 | 0.6381 | 0.5412 | 0.5136 |
Brain_Tumor1 | 0.8889 | 0.8333 | 0.9444 | 0.9185 | 0.8333 | 0.7778 | 0.9796 | 0.9200 |
Brain_Tumor2 | 0.8852 | 0.6300 | 0.9000 | 0.8534 | 0.7000 | 0.6667 | 0.8074 | 0.7148 |
DLBCL | 0.8750 | 0.8208 | 0.8750 | 0.9021 | 0.7792 | 0.9375 | 0.9396 | 0.9410 |
Leukemia1 | 0.8822 | 0.9844 | 0.9089 | 0.8778 | 0.8222 | 0.8444 | 0.8956 | 0.9241 |
Leukemia2 | 1.0000 | 0.9333 | 0.9333 | 0.5667 | 0.8689 | 0.8667 | 0.9156 | 0.9528 |
Prostate_Tumor | 0.8556 | 0.8937 | 0.8587 | 0.8746 | 0.8191 | 0.9746 | 1.0000 | 0.8686 |
SRBCT | 0.9706 | 0.9628 | 0.9412 | 0.8804 | 0.8863 | 0.8804 | 1.0000 | 0.9171 |
F_test | 0.9637 | 0.9637 | 1.4422 | 0.6966 | 1.6097 | 1.6136 | 0.9637 | - |
Mean rank (F_test) | 6.5 | 6.5 | 3 | 8 | 2 | 1 | 5 | 4 |
Overall rank | 6 | 6 | 3 | 7 | 2 | 1 | 5 | 4 |
As shown in Fig. 8, the BHBO achieved the best accuracy rate, at 0.8426614, followed by BPSO, GA, BBA, and BSSA with accuracy rates of 0.8255778, 0.8224889, 0.7737778, and 0.7686333, respectively.
One of the important reasons for the high accuracy obtained by the BHBO is that the best combined values were found for its parameters. Through tuning, the best population size was found to be 40 and the best number of iterations to be 100, which played a crucial role in improving the performance of the algorithm. Another reason is that the BHBO does not need a large number of parameters; when there are many parameters, it is difficult to determine the combined optimal values for all of them.
Table 7 shows the p-values of the BHBO and the other algorithms. In certain instances, the p-values reveal a substantial difference between the BHBO and the other optimizers. These findings suggest that the BHBO may make a more consistent trade-off between the core searching phases. Thus, it has a greater chance of escaping local optima and avoiding immature convergence disadvantages, resulting in higher performance in terms of accuracy.
Table 7
Wilcoxon rank-sum test p-values for classification accuracy
Dataset | BHBO versus |
BGSA | BPSO | BALO | BBA | BSSA | GA |
11_Tumors | 0.767097 | 0.593955 | 0.767097 | 0.109745 | 0.015156 | 0.374259 |
14_Tumors | 0.767097 | 0.593955 | 0.767097 | 0.109745 | 0.015156 | 0.374259 |
Brain_Tumor1 | 0.767097 | 0.593955 | 0.767097 | 0.109745 | 0.015156 | 0.374259 |
Brain_Tumor2 | 0.767097 | 0.593955 | 0.767097 | 0.109745 | 0.015156 | 0.374259 |
DLBCL | 0.767097 | 0.593955 | 0.767097 | 0.109745 | 0.015156 | 0.374259 |
Leukemia1 | 0.767097 | 0.593955 | 0.767097 | 0.109745 | 0.015156 | 0.374259 |
Leukemia2 | 0.767097 | 0.593955 | 0.767097 | 0.109745 | 0.015156 | 0.374259 |
Prostate_Tumor | 0.767097 | 0.593955 | 0.767097 | 0.109745 | 0.015156 | 0.374259 |
SRBCT | 0.767097 | 0.593955 | 0.767097 | 0.109745 | 0.015156 | 0.374259 |
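The rank-sum comparisons behind Table 7 can be reproduced in outline as follows. The sketch uses the standard normal approximation without tie correction, and the two sets of per-run accuracies are invented for illustration:

```python
import math
import numpy as np

def ranksum_pvalue(x, y):
    """Two-sided Wilcoxon rank-sum p-value via the normal approximation
    (no tie correction; adequate for this illustration)."""
    n1, n2 = len(x), len(y)
    both = np.concatenate([x, y])
    ranks = both.argsort().argsort() + 1.0      # 1-based ranks, no ties assumed
    w = ranks[:n1].sum()                        # rank sum of the first sample
    mu = n1 * (n1 + n2 + 1) / 2.0               # mean of W under H0
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (w - mu) / sigma
    # two-sided tail probability of the standard normal
    return 2.0 * (1.0 - 0.5 * (1.0 + math.erf(abs(z) / math.sqrt(2.0))))

rng = np.random.default_rng(0)
# hypothetical per-run accuracies of two optimizers (30 independent runs each)
acc_a = rng.normal(0.90, 0.02, 30)
acc_b = rng.normal(0.86, 0.02, 30)
p = ranksum_pvalue(acc_a, acc_b)
```

A p-value below the usual 0.05 threshold indicates that the two accuracy distributions differ significantly, which is how the entries in Table 7 are interpreted.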
On the other hand, Table 8 shows that the BHBO was unable to effectively identify a compact set of features that fit the problem, as the number of selected features increased greatly. This also explains the rise in the number of selected features when the parameters were tuned, which greatly increased the accuracy. It can also be seen from the table that the BHBO method ranked seventh in terms of performance in the F-test.
Table 8
Comparison of BHBO versus Other Optimizers in Terms of Features Selected
Data set | BGSA | BPSO | BALO | BBA | BSSA | GA | BHHO | BHBO |
11_Tumors | 6239.9 | 6202.2 | 7083.6 | 5092.6 | 7160.6 | 5906.7 | 3651.4 | 9045.8 |
14_Tumors | 7517.8 | 7486.0 | 11299.0 | 6063.4 | 9472.7 | 7235.1 | 6214.6 | 11233.7 |
Brain_Tumor1 | 2884.3 | 2822.6 | 2891.6 | 2404.8 | 2892.5 | 2670.6 | 2314.9 | 4100.7 |
Brain_Tumor2 | 5128.9 | 4953.0 | 6309.5 | 4210.9 | 5095.4 | 4815.8 | 3172.1 | 8004.5 |
DLBCL | 2669.2 | 2538.8 | 3634.7 | 2193.7 | 2941.9 | 2459.1 | 1818.5 | 3856.2 |
Leukemia1 | 2635.5 | 2552.9 | 3511.0 | 2191.2 | 2772.2 | 2443.3 | 1584.8 | 3632.1 |
Leukemia2 | 5513.6 | 5394.4 | 5686.4 | 4567.7 | 5557.5 | 5221.6 | 3956.6 | 8110.7 |
Prostate_Tumor | 5209.6 | 5070.4 | 6658.6 | 4201.0 | 5766.4 | 4942.5 | 3489.4 | 7487.8 |
SRBCT | 1143.6 | 1071.1 | 1168.4 | 962.7 | 1193.2 | 1012.9 | 784.5 | 1706.7 |
F_test | 2.31 | 2.32 | 1.11 | 3.52 | 1.48 | 2.41 | 3.79 | |
Mean rank (F_test) | 5 | 4 | 7 | 5 | 6 | 3 | 1 | 8 |
Overall rank | 4 | 3 | 6 | 4 | 5 | 2 | 1 | 7 |
As shown in Fig. 9, the number of features increased when the parameters were tuned for this algorithm, which caused a greater increase in accuracy compared to the previous parameter settings. Hence, despite the greater number of features chosen when tuning the parameters, the BHBO was better in terms of accuracy than many of the other approaches that were tested.
Figures 10 and 11 show how the solution was obtained when the algorithm was applied to two of the data sets, DLBCL and SRBCT, respectively. In the figures, a value of 1 denotes that the algorithm selected a feature, while a value of 0 indicates that it did not.
Table 9 contains the fitness results. As shown in this table, the BHBO performed better in terms of fitness score than the BSSA, BBA, BALO, and BGSA in the case of 11_Tumors, while it outperformed the GA, BSSA and BPSO in the case of the Brain_Tumor1 data set. Also, the BHBO outperformed the BPSO in Brain_Tumor2, the BSSA and BPSO in DLBCL, and the GA and BSSA in the case of Leukemia1. Furthermore, it outperformed three methods (GA, BSSA and BBA) in the case of Leukemia2, and the BSSA in Prostate_Tumor. In terms of the overall fitness ranking, it can be seen that the BHBO was ranked in fifth place, followed by the BPSO and the BBA.
Table 9
Comparison of BHBO versus other optimizers in terms of average fitness values.
Dataset | BGSA | BPSO | BALO | BBA | BSSA | GA | BHHO | BHBO |
11_Tumors | 0.252 | 0.185 | 0.289 | 0.342 | 0.326 | 0.187 | 0.172 | 0.233 |
14_Tumors | 0.488 | 0.445 | 0.409 | 0.511 | 0.470 | 0.363 | 0.458 | 0.532 |
Brain_Tumor1 | 0.115 | 0.170 | 0.060 | 0.037 | 0.170 | 0.225 | 0.024 | 0.115 |
Brain_Tumor2 | 0.119 | 0.371 | 0.105 | 0.037 | 0.302 | 0.335 | 0.194 | 0.357 |
DLBCL | 0.129 | 0.182 | 0.130 | 0.065 | 0.224 | 0.066 | 0.063 | 0.131 |
Leukemia1 | 0.125 | 0.020 | 0.097 | 0.066 | 0.181 | 0.159 | 0.105 | 0.127 |
Leukemia2 | 0.005 | 0.071 | 0.071 | 0.400 | 0.135 | 0.137 | 0.087 | 0.109 |
Prostate_Tumor | 0.148 | 0.110 | 0.146 | 0.098 | 0.185 | 0.030 | 0.003 | 0.171 |
SRBCT | 0.034 | 0.042 | 0.063 | 0.072 | 0.118 | 0.123 | 0.003 | 0.124 |
F_test | 1.029 | 0.991 | 1.470 | 0.619 | 0.497 | 1.659 | 1.022 | |
Mean rank (F_test) | 3 | 6 | 2 | 7 | 8 | 1 | 4 | 5 |
Overall rank | 3 | 6 | 2 | 7 | 8 | 1 | 4 | 5 |
Figure 12 shows the fitness rates for all methods, from which it can be seen that the BHBO achieved a fitness rate of 0.21098289, followed closely by the BSSA with a fitness rate of 0.23441111.
The initial population of the BHBO was randomly generated. In the following steps, the population was changed toward the optimum solution depending on the fitness values, and this step was repeated until the termination criterion was reached. The fitness values improve because the initial population influences the quality of the final solution and the number of iterations required to reach the optimum. The quality of the initial population was improved to discover solutions of superior quality and to eliminate duplicates in order to arrive at the best solution.
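The loop just described can be sketched generically. The fitness form below, alpha * error + (1 - alpha) * feature_ratio, is a common wrapper-FS objective and is assumed here rather than taken from the paper; the toy error function and the bit-copying update are likewise illustrative:

```python
import numpy as np

def fitness(mask, error_rate, alpha=0.99):
    # assumed wrapper-FS objective: trade classification error
    # against the fraction of features kept (lower is better)
    return alpha * error_rate + (1 - alpha) * mask.sum() / mask.size

def evolve(n_agents=20, n_feats=50, n_iter=100, seed=0):
    """Generic sketch of a binary population-based search loop."""
    rng = np.random.default_rng(seed)
    pop = rng.integers(0, 2, (n_agents, n_feats))    # random initial population
    def toy_error(mask):                             # pretend only the first 5 features matter
        return 1.0 - mask[:5].mean()
    best, best_fit = pop[0].copy(), np.inf
    for _ in range(n_iter):
        for agent in pop:                            # track the best solution so far
            f = fitness(agent, toy_error(agent))
            if f < best_fit:
                best, best_fit = agent.copy(), f
        copy = rng.random(pop.shape) < 0.2           # drift agents toward the best
        pop = np.where(copy, best, pop)
        mut = rng.random(pop.shape) < 0.02           # random bit flips keep exploration alive
        pop = np.where(mut, 1 - pop, pop)
    return best, best_fit

best_mask, best_fitness = evolve()
```

The actual HBO update rules differ, but the overall pattern (random initial 0/1 population, fitness-driven movement toward the incumbent best, repeat until termination) matches the description above.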
5.7 Discussion
In this chapter, the efficacy of the BHBO was compared with that of the BGSA, BSSA, GA, BPSO, BBA and BALO methods in terms of accuracy, number of features, and fitness values when applied to nine high-dimensional data sets. The F-test results, the Wilcoxon rank-sum test for linked samples, and the p-values of the BHBO and the other algorithms were calculated for the three measures mentioned above.
This chapter presented the results of the proposed wrapper FS model using a KNN classifier and the HBO algorithm when it was deployed to select proper features in nine high-dimensional data sets with a small number of samples. The evaluation of this method was performed based on several criteria: classification accuracy, number of features selected, error rate, precision, recall, F-measure, and convergence speed. All the obtained results showed that the BHBO enhanced the search capability and that it was better than most of the compared state-of-the-art methods for FS problems.
The comparison of the BHBO with the other optimizers showed that it outperformed all the methods in terms of accuracy in two data sets, namely, 11_Tumors and DLBCL. In more detail, the results of the experiments showed that the BHBO method was capable of outperforming six methods in the Brain_Tumor1 data set and three methods in the Brain_Tumor2 data set. It also achieved clear superiority in accuracy over six methods in the Leukemia1 data set and over six methods in the Leukemia2 data set. Moreover, the BHBO performed better than the BBA in the 11_Tumors data set, and in the Prostate_Tumor data set it outperformed three methods. In terms of fitness, the BHBO performed better than the BSSA, BBA, BALO, and BGSA in the case of 11_Tumors; the GA, BSSA and BPSO in Brain_Tumor1; the BPSO in Brain_Tumor2; the BSSA and BPSO in DLBCL; the GA and BSSA in Leukemia1; the GA, BSSA and BBA in Leukemia2; and the BSSA in Prostate_Tumor.