BEO: Binary Equilibrium Optimizer Combined with Simulated Annealing for Feature Selection

This work proposes a binary variant of the recently-introduced Equilibrium Optimizer (EO) to solve binary optimization problems. A V-shaped transfer function is used to map the continuous values produced by EO to binary ones. To improve the exploitation ability of the Binary Equilibrium Optimizer (BEO), Simulated Annealing, one of the most popular local search methods, is incorporated. The proposed BEO algorithm is applied to 18 UCI datasets and compared to a wide range of algorithms. The results demonstrate the superiority and merits of EO when solving feature selection problems.

According to [1], by the end of 2020 there will be around 40 trillion gigabytes of data, a staggering number, and 90% of this data has been generated in the last two years. In fact, only 0.5% of all data had been analyzed as of 2012, and the percentage has likely decreased further in recent times.
One of the reasons for this small percentage is the lack of tools capable of analyzing this enormous quantity of data. This has led to a new field in the domain of data processing known as dimension reduction [2]. The motive of dimension reduction is to retain the meaningful information of a large dataset while reducing the number of dimensions under consideration.
One of the most sought-after dimension reduction procedures is Feature Selection (FS). FS procedures can be categorized into two different variants: wrapper and filter. Filter methods [3,4,5,6,7,8,9] look for statistical interpretations of data to find the most informative and appropriate set of features. Wrapper methods [10,11,12,13,14,15], on the other hand, use learning algorithms (e.g. classifiers [16]) to evaluate candidate feature subsets and guide further searches according to the evaluation outcomes. Although wrapper methods are computationally more expensive than filter methods (due to the use of learning algorithms), they are able to find superior solutions in comparison to filter methods. In order to improve the performance of such algorithms, a recent trend is to blend multiple algorithms so as to combine the advantages of the individual algorithms.
These hybrid algorithms [17,18,19,20] tend to perform better by improving their exploration and exploitation abilities (through the inclusion of local or global search techniques). Some researchers currently use a combination of filter and wrapper methods to reach better solutions by exploiting the advantages of both models.
These models are known as embedded models [21,22]. The quality of such models depends heavily on how the wrapper and filter parts interact with each other, which requires an extra level of tuning and leads to higher computational complexity.
In this paper, we have made an attempt to address the FS problem using a recently proposed optimization algorithm known as Equilibrium Optimizer (EO) [23]. It mimics the process of control volume mass balance used to predict equilibrium and dynamic states, where the equilibrium state is treated as the optimal solution to the optimization problem. FS can also be considered an optimization problem, where the goal is to find an optimal feature subset subject to constraints such as high classification accuracy and a low number of features. This intuitive similarity between FS and optimization has motivated us to modify EO and apply it to solve FS problems. The contributions of this paper are highlighted below:
1. Application of EO to FS for the first time, to the best of our knowledge.
2. Modification of EO's exploitative abilities through the use of Simulated Annealing.
3. Validation of the proposed FS framework over 18 well-known UCI datasets.
The rest of the paper is organized as follows: Section 2 provides a brief review of similar kinds of work carried out by different researchers across the globe. Section 3 provides a detailed description of the proposed FS model. The results obtained by the FS version of EO are explained in Section 5. Finally, Section 6 concludes our work and provides directions for future extensions of this work.

Literature Study
It is quite interesting how solutions to many complex optimization problems lie hidden in nature, waiting to be discovered. Over the years, researchers have proposed numerous nature-inspired optimization algorithms and applied them to FS. One GA variant introduced tribe competition to support the evolution process. A modified GA (MGA) has been proposed in [26], where the authors updated certain operations of GA to provide guidance to the chromosomes. For example, in place of the typical crossover, they proposed a crossover procedure guided by the fitness measures of the parent chromosomes. MGA has been used to perform FS before demand forecasting in an outpatient department. Apart from these recent additions, there is numerous other work on GA, which can be found in [27,28,29,30,15,31,32]. PSO is the brainchild of Kennedy and Eberhart; in [33], they proposed PSO to replicate the motion of organisms such as birds in a flock or fish in a school. Different variations of PSO have been proposed by researchers over time [34,35,36]. ACO, proposed by Dorigo, is based on the food searching process followed by ants in nature. Another popular algorithm is the Gravitational Search Algorithm (GSA) [38], which is inspired by mass interactions and the law of gravity.
All these algorithms are very popular and highly used in the FS domain.
According to the No-Free-Lunch theorem for optimization [39], however, there is no single algorithm that can best solve all optimization problems. That is why researchers continue to propose new algorithms to solve FS problems. One of the main challenges in any FS problem is to find a proper balance between exploration and exploitation. In recent times, various other metaheuristic algorithms, such as Grey Wolf Optimizer (GWO) [40], Whale Optimization Algorithm (WOA) [41], Ant Lion Optimizer (ALO) [42], and Salp Swarm Algorithm (SSA) [43], have been developed in search of this trade-off.
EO is one of the most recent algorithms in the domain of optimization, proposed by Faramarzi et al. in [44]. EO uses the concept of control volume mass balance, where each particle represents a solution and its concentration represents the position. The best solutions are termed equilibrium candidates. Other candidates update their concentrations based on the equilibrium candidates to reach the final equilibrium state, which can be considered the optimal solution of an optimization problem. The EO algorithm was tested on 58 benchmark problems and three engineering case studies, where it outperformed a wide range of algorithms including GA, PSO, SSA, and GWO. To the best of our knowledge, however, EO has not been adapted for the FS problem. This has motivated us to update EO and make it applicable to FS. In this paper, we have mapped EO to the binary space of FS and applied it to 18 well-known UCI datasets to prove its usefulness in the FS domain.

Equilibrium Optimizer: An Overview
Equilibrium Optimizer (EO), first proposed in [44], is a physics-based optimization algorithm inspired by dynamic source and sink models used for estimating equilibrium states. The method is based upon a simple, well-mixed dynamic mass balance on a control volume, where a mass balance equation determines the concentration of a nonreactive constituent in the control volume as a function of its various source and sink mechanisms. The generic mass-balance equation is a first-order differential equation [45]:

V (dC/dt) = Q C_eq − Q C + G    (1)

where V is the control volume, C is the concentration inside the control volume, C_eq is the equilibrium-state concentration without any generation inside the control volume, G is the mass generation rate, V (dC/dt) denotes the rate of change of mass, and Q is the flow rate, in volume, into and out of the control volume. Equation 1 states that the change of mass in time equals the mass entering the system, minus the mass leaving the system, plus the mass generated inside it. An equilibrium state is reached when V (dC/dt) reaches zero. Rearranging Equation 1 gives Equation 2:

dC / (λ C_eq − λ C + G/V) = dt    (2)

where λ = Q/V is the turnover rate. Equation 3 integrates Equation 2 over time:

∫_{C_0}^{C} dC / (λ C_eq − λ C + G/V) = ∫_{t_0}^{t} dt    (3)

where t_0 and C_0 are the initial start time and initial concentration. Equation 4 shows the result of Equation 3:

C = C_eq + (C_0 − C_eq) F + (G / (λ V)) (1 − F),  with  F = e^{−λ(t − t_0)}    (4)
Following Equation 4, three terms represent the updating rule for the concentration of a solution. The first term is the equilibrium concentration: one of the best solutions found up to the current iteration is used as the equilibrium concentration, and it is selected randomly from a pool, namely the equilibrium pool.
The second term represents the concentration difference between a solution and the equilibrium state, and the third term involves the generation rate. The mathematical expressions and explanations of these components are given in the following subsections.

Equilibrium Pool and Candidates
The equilibrium state is the convergence state of the algorithm, and it is expected to be the global optimum of the problem under consideration. At the beginning, the equilibrium concentration is not known to the algorithm; rather, the equilibrium candidates provide the particles with a search pattern. As per [44], we have considered an equilibrium pool consisting of five concentrations: the four best-so-far concentrations obtained during the process and their arithmetic mean. The four concentrations aid the exploration capability of EO, while the arithmetic mean aids exploitation.
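As a minimal sketch, the pool construction can be written as follows, assuming a NumPy population matrix where lower fitness is better; the function name is illustrative, not from the paper:

```python
import numpy as np

def build_equilibrium_pool(population, fitness):
    """Return the equilibrium pool: the four best-so-far
    concentrations plus their arithmetic mean."""
    order = np.argsort(fitness)       # ascending: lower fitness is better
    best4 = population[order[:4]]     # four best concentrations
    c_ave = best4.mean(axis=0)        # their arithmetic mean
    return np.vstack([best4, c_ave])  # pool of five candidates
```

At each update, one of the five rows would be drawn uniformly at random as C_eq.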

Exponential Term, F
The term F in Equation 4 helps EO find a balance between exploration and exploitation. λ is taken as a random vector in [0, 1], since the turnover rate can vary with time in a real control volume. The exponential term is defined as:

F = e^{−λ(t − t_0)}    (8)

where t is a function of iteration:

t = (1 − iter/maxIter)^{(a_2 · iter/maxIter)}

where iter and maxIter are the current and maximum number of iterations, respectively, and a_2 is a constant; a higher a_2 indicates lower exploration and better exploitation ability.
In order to guarantee convergence, the search speed needs to be slowed down; to achieve this, t_0 is considered as:

t_0 = (1/λ) ln(−a_1 sgn(r − 0.5) [1 − e^{−λt}]) + t    (9)

where a_1 is a constant and a higher a_1 value implies higher exploration ability, r is a random vector in [0, 1], and the sgn function indicates the direction of the search process. As per [44], we have set a_1 = 2 and a_2 = 1. By substituting Equation 9 in Equation 8, we obtain:

F = a_1 sgn(r − 0.5) (e^{−λt} − 1)    (10)
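The computation of the exponential term F can be sketched as follows, with the settings a_1 = 2 and a_2 = 1 recommended in [44]; the function name and interface are illustrative:

```python
import numpy as np

def exponential_term(iter_, max_iter, dim, a1=2.0, a2=1.0):
    """Compute the exponential term F for one particle."""
    lam = np.random.rand(dim)  # turnover rate, random vector in [0, 1]
    r = np.random.rand(dim)    # random vector in [0, 1]
    # iteration-dependent time t
    t = (1 - iter_ / max_iter) ** (a2 * iter_ / max_iter)
    # F = a1 * sgn(r - 0.5) * (exp(-lam * t) - 1)
    return a1 * np.sign(r - 0.5) * (np.exp(-lam * t) - 1.0)
```

Because e^{−λt} lies in (0, 1] for λt ≥ 0, every component of F is bounded by a_1 in magnitude.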

Generation Rate, G
In EO, the generation rate (G) plays an important role in reaching the optimum solution. G is calculated as:

G = G_0 e^{−λ(t − t_0)} = G_0 F    (11)

G_0 = GCP (C_eq − λ C)    (12)

GCP = 0.5 r_1  if r_2 ≥ GP,  otherwise  GCP = 0    (13)

where r_1 and r_2 are random numbers in [0, 1], GCP is the generation rate control parameter, and GP is the generation probability, which controls the participation of the generation term in the update [44]. So, finally, the concentration updating rule in this algorithm is given by:

C = C_eq + (C − C_eq) F + (G / (λ V)) (1 − F)    (14)

where V is considered as unit.
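Putting the pieces together, one EO concentration update for a single particle can be sketched as follows (a minimal illustration with V = 1; names and the GP default are assumptions, not restated from the paper):

```python
import numpy as np

def eo_update(C, C_eq, iter_, max_iter, a1=2.0, a2=1.0, GP=0.5, V=1.0):
    """One EO concentration update for a single particle."""
    dim = C.shape[0]
    lam = np.random.rand(dim)
    r = np.random.rand(dim)
    t = (1 - iter_ / max_iter) ** (a2 * iter_ / max_iter)
    F = a1 * np.sign(r - 0.5) * (np.exp(-lam * t) - 1.0)  # exponential term
    r1, r2 = np.random.rand(), np.random.rand()
    GCP = 0.5 * r1 if r2 >= GP else 0.0                   # generation control
    G0 = GCP * (C_eq - lam * C)
    G = G0 * F                                            # generation rate
    # concentration updating rule
    return C_eq + (C - C_eq) * F + (G / (lam * V)) * (1 - F)
```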

Simulated Annealing
In metallurgy and materials science, annealing [46] is a heat treatment in which a solid is heated up to a maximum temperature, at which it becomes liquid, and is then cooled by slowly lowering the temperature. Simulated Annealing (SA) [47] is a single-solution-based meta-heuristic algorithm and an enhanced version of hill climbing [48]. SA accepts a bad 'move' with a certain probability to avoid being trapped in locally optimal solutions. For a particular solution (agent), a neighboring solution is generated [17] and evaluated using the fitness function. If the fitness value of the neighbor is better than that of the current solution, the current solution is replaced with the neighbor. If the fitness value of the neighbor is worse than that of the current solution, the neighbor is accepted with a probability given by the Boltzmann equation, p = e^{−θ/T_k}. Here, θ = fit(neighbor) − fit(curr. solution) is the difference between the fitness values of the generated neighbor and the current solution, and T_k denotes the temperature at the k-th instance of accepting a new solution. Initially, T_k = 2 · |D| is considered, where |D| is the dimension of the problem. For further iterations, T_{k+1} = α T_k is adopted, where α ∈ [0, 1] is the cooling coefficient.

Proposed Binary EO
The main challenge in an FS problem is to search for the best possible feature subset. Especially in wrapper-based methods, since each candidate feature subset needs to be evaluated using a learning algorithm (classifier), searching for the best feature subset is highly time consuming. In the continuous version of EO, the concentration of a solution is updated as per Equation 14. To propose a binary version of EO, we need to use a transfer function [49]. Among the transfer functions proposed in [49], we have chosen a V-shaped transfer function to build the binary EO (BEO), due to its significant impact on a wide range of meta-heuristics. The transfer function used, depicted in Figure 1, is given by Equation 16.
The concentration in the real domain is converted to a binary vector as per Equation 17, using the probability value generated by Equation 16:

X_t^d = ¬X_{t−1}^d  if rno < T(X_t^d),  otherwise  X_t^d = X_{t−1}^d    (17)

where X_t^d is the d-th dimension of the concentration in the t-th iteration, X_{t−1}^d is the d-th dimension of the concentration in the (t−1)-th iteration, T(·) is the transfer function of Equation 16, ¬ denotes the complement, and rno is a random number, rno ∈ [0, 1].
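The binarization step can be sketched as follows. Since Equation 16 is not reproduced in this text, the sketch assumes one common V-shaped transfer function, |tanh(x)|, purely for illustration; the paper's actual function may differ:

```python
import numpy as np

def v_transfer(x):
    """A common V-shaped transfer function (illustrative choice)."""
    return np.abs(np.tanh(x))

def binarize(X_prev, X_cont, rng=np.random):
    """Flip a bit when rno < T(x); otherwise keep the previous bit."""
    prob = v_transfer(X_cont)          # flip probability per dimension
    rno = rng.rand(X_prev.shape[0])    # random numbers in [0, 1)
    flip = rno < prob
    return np.where(flip, 1 - X_prev, X_prev)
```

A V-shaped function assigns high flip probability to large position changes in either direction, which is why it pairs with a complement rule rather than a fixed 0/1 threshold.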
EO performs exploitation mainly using the equilibrium pool [44], which in fact controls both exploration and exploitation. During the initial iterations, the distance among the equilibrium candidates is high, and using these candidates to update the concentrations helps perform global search. As the number of iterations increases, the equilibrium candidates come closer to each other, and using them to update the concentrations then performs local search encircling the candidates, which results in exploitation. Exploration is exclusively taken care of by the Generation Probability (GP, Equation 13), but there is no component devoted to exploitation in particular. Therefore, we have incorporated the concept of SA to perform local search, i.e., to take care of exploitation specifically.
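A hedged sketch of such an SA local-search pass is shown below. It perturbs one random bit per step and uses the geometric cooling T_{k+1} = αT_k described earlier; the exact neighborhood, cooling coefficient, and step count used in the paper may differ:

```python
import math
import random

def sa_refine(solution, fit_fn, T0, alpha=0.93, steps=20):
    """SA local search over a binary feature vector (list of 0/1).
    Assumes fit_fn is minimized; keeps improvements, accepts worse
    moves with probability exp(-theta / T)."""
    curr, curr_fit, T = solution[:], fit_fn(solution), T0
    for _ in range(steps):
        neigh = curr[:]
        i = random.randrange(len(neigh))
        neigh[i] = 1 - neigh[i]            # flip one feature bit
        theta = fit_fn(neigh) - curr_fit   # positive when neighbor is worse
        if theta <= 0 or random.random() < math.exp(-theta / T):
            curr, curr_fit = neigh, fit_fn(neigh)
        T *= alpha                         # geometric cooling
    return curr, curr_fit
```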
The hybrid version of EO is labeled as BEOSA (binary EO with SA).
FS is considered as a multi-objective optimization problem due to two different criteria to evaluate the feature subset in consideration [50]: classification accuracy and number of selected features. To be specific, the objective of FS is to achieve maximum classification accuracy with minimum number of features.
These two criteria have contradictory purposes [51], so we have considered the classification error rate instead of accuracy. Equation 18 aggregates both objectives together and converts the FS problem into a single-objective problem.
fitness = η · ε + µ · (|υ| / |D|)    (18)

where ε denotes the classification error rate, |υ| represents the number of features in the subset being evaluated, and |D| represents the total number of features in the dataset. µ and η respectively represent the weights (importance) of the subset length and the classification error, with η + µ = 1. We have used the K-Nearest Neighbor (KNN) classifier [52] to compute the classification error ε.
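The objective can be sketched in a few lines. The default weights below (η = 0.99, µ = 0.01) are common choices in the FS literature, used here only as an illustrative assumption, not the paper's reported values:

```python
def fitness(error_rate, n_selected, n_total, eta=0.99, mu=0.01):
    """Weighted sum of classification error and subset-size ratio.
    eta weights the error, mu weights the relative subset length."""
    assert abs(eta + mu - 1.0) < 1e-9  # weights must sum to 1
    return eta * error_rate + mu * (n_selected / n_total)
```

For example, a subset using 5 of 10 features with 10% error scores 0.99·0.1 + 0.01·0.5 = 0.104.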

Experimental Results
This section describes the experiments used to prove the applicability of EO in the FS domain. We have used the K-Nearest Neighbors (KNN) classifier [52] to measure classification accuracy. As per the recommendations in [50,51,17], we have set K = 5. For each dataset, 80% of the instances are used for training the model and the remaining 20% are used for testing. The proposed method is implemented in Python3 [53] and the graphs are plotted using Matplotlib [54].

Dataset Description
In order to assess the performance of EO, 18 UCI datasets [55] have been used, selected from various backgrounds. The description of these datasets is presented in Table 1.

Parameter Tuning
There are two parameters which are always very important for any multi-agent evolutionary algorithm: population size and maximum number of iterations. Population size characterizes how a single agent learns from other agents' experience, while iterations provide step-wise evolution of the agents. In order to find proper values for these two parameters, experiments have been performed by varying one parameter w.r.t. the other.
The population size for BEO and BEOSA is varied over [5, 10, 20, 30, 50]. The maximum number of iterations is set to 50, which we have determined experimentally.
Classification accuracies with different population sizes are reported in Figure 2.
Considering that computational time increases with population size, and given the accuracies shown in Figure 2, we have fixed the population size at 20 for further experiments. Figure 3 shows how the fitness values of BEO and BEOSA change with increasing numbers of iterations.

Results and Discussion
In this section, we have provided the results obtained by BEO and BEOSA over the datasets mentioned in Section 4.1. From this discussion, we can conclude that BEO and BEOSA are able to successfully select optimal sets of features from the datasets. The accuracy provided by both models while using very few features makes them very competitive in the field of FS.
Overall, we can say that although both BEO and BEOSA perform very well in FS, the results obtained by BEOSA are better than those of BEO. Hence, it can be concluded that SA plays a significant role in improving the performance of EO.
In EO, exploration of the search space is guided by the four particles present in the equilibrium pool, while the fifth particle, the average of the other four, mainly helps in exploitation. The balance between exploration and exploitation is maintained by the exponential term mentioned in Section 3.1.2. However, there may be situations where the particles in the equilibrium pool become similar in nature, indicating that they belong to the same part of the search space. As a consequence, this leads to massive exploitation of that specific part, and it limits the algorithm's ability to explore the entire search space in search of the global optima. Here, SA helps EO bring in additional exploration of the search space, thereby aiding EO's performance. In this context, it is to be noted that even though SA increases the exploration ability of EO, the exploration-exploitation trade-off remains intact due to the usage of the exponential term.

Comparison
In this section, we present the results of the proposed BEOSA and highlight how it performs w.r.t. the state-of-the-art. We have compared the results of BEOSA with three recent works: GSA-based [56], ALO-based [57], and GWO- and WOA-based [50]. Figure 4 shows the performance of BEOSA in terms of achieved classification accuracy. Inspecting Figure 4, it can be observed that BEOSA performs best in 12 cases (66.7%); BreastEW, HeartEW and KrVsKpEW are among the datasets where it falls short of the best. In order to determine the significance of the obtained results, we have performed the Wilcoxon rank-sum test [58] with a 5% significance level for each pair of methods used in this section.

Conclusion
In this work, we made an effort to propose a binary variant of EO to make it applicable to the field of FS.