Sonar Data Classification Using Neural Network Trained by Hybrid Dragonfly and Chimp Optimization Algorithms

This paper proposes a hybrid Dragonfly Algorithm (DA) for training a Multi-Layer Perceptron Neural Network (MLP NN) to design a classifier that solves complicated problems and distinguishes real targets from false (liar) targets in sonar applications. To reduce the computational cost and avoid wasted time, a modified low-cost DA is designed and evaluated. To assess the accuracy of the technique, some well-known meta-heuristic trainers, including the Chimp Optimization Algorithm (ChoA), the Gravitational Search Algorithm (GSA), DA, and Particle Swarm Optimization (PSO), are compared. DA and ChoA have remarkable features, and a hybrid of the two algorithms is proposed. The proposed classifier has acceptable performance, and two standard benchmark datasets are used to evaluate it. The results show that the modified hybrid DA-ChoA consumes 15% less time and performs 4% better than the original dragonfly method.


Introduction
Recognition and classification of sub-aquatic targets have received much attention in recent years, including distinguishing real targets from false (liar) targets such as schools of fish or background clutter. This task is one of the most confusing and challenging in this research field because of the variant attributes of the targets [1][2][3]. In underwater environments, several different echo classes exist; noise, reverberation, and clutter are some classes of underwater echoes [4]. Classical classification schemes included statistical processing [5], signal processing [6,7], and feature-extraction algorithms [8]. In recent years, Neural Networks (NNs) have been considered for their prominent specifications [9], such as a high degree of precision, adaptability, and an inherently parallel configuration that is very appropriate for hardware implementation and for increasing processing speed.
Among the best tools in the classification domain are Multi-Layer Perceptron (MLP) NNs [10]. Learning is the main and most important feature of this tool; without this essential block, the algorithm cannot improve its initial results. In technical terms, the method that provides learning for NNs is called a trainer [11]. Trainers come in two principal types: supervised and unsupervised learning [12]. Classic supervised methods such as gradient descent and Newton's method have poor quality and suffer disadvantages such as trapping in local optima and insufficient exploration of the search space [13].
Meta-heuristic algorithms can be applied to problems in different fields, especially as trainers for NNs in highly complex problems [14][15][16]. There are different categories of meta-heuristic methods; two fundamental ones are based on the source of inspiration and on the number of solutions generated at each step of the algorithm. The former category includes swarm-intelligence-based [17], physics-based [18], evolutionary-based [19], and human-related methods. The latter classification divides algorithms into population-based and individual-based classes. In the population-based class, the trainer generates a multitude of solutions randomly and improves them during the procedure; in the individual-based class, one solution is generated randomly and improved over the iterations.
A basic question in the optimization domain is whether, and why, we need more optimization techniques. The No Free Lunch (NFL) theorem [20] provides the answer. NFL proves that no one can propose a single method that solves every problem in the optimization domain: no particular optimization algorithm is well suited to overcome all optimization challenges. In other words, most optimization algorithms perform equally on average across all problems, despite superior performance on particular ones. This doctrine allows researchers to improve and modify existing algorithms for problems in different fields.
Theoretical research in this context can be divided into three basic directions: improving existing methods, mixing and augmenting algorithms, and developing new ones. In the first direction, stochastic operators or different mathematical processes have been used to improve results, such as chaotic maps [21,22], evolutionary operators [23][24][25], and local searches [26,27]. In the second, different algorithms have been combined into hybrid algorithms to overcome shortcomings or bottlenecks; hybrid meta-heuristic techniques mentioned in the literature include PSO-ACO [28], PSO-DE [29], and ACO-DE [30]. Finally, proposing new algorithms is a popular research domain for many scientists; physical rules and the cooperative behavior of animals are among the inspirations and motivations for new methods.
Parameters and constraints are the key elements involved in the optimization procedure; parameters are the unknown factors of the system that have to be optimized. This paper is organized as follows. Section 2 discusses the works related to this research domain. MLP NN training algorithms are explained in Sect. 3. Section 4 gives an overview of the standard Dragonfly Algorithm (DA) and the Chimp Optimization Algorithm (ChoA) and describes how to use them for training an MLP NN. In Sect. 5, the sonar datasets and the simulations are presented. Finally, conclusions are given in Sect. 6.

Closely Related Works
Population-based meta-heuristic optimization techniques share a common feature despite their differing natures: the search proceeds in two distinct states, "exploration" and "exploitation" [31,32]. The algorithm must have operators that search globally and explore the whole search space. Exploration is the capability of an optimizer to apply highly random behaviour to the solutions; the bigger the changes to the solutions, the greater the exploration ability. The exploitation phase follows the exploration phase and is the phase of detailed investigation of the promising areas of the search space.
There is no clear border between the exploitation and exploration phases. Finding an appropriate balance between exploitation and exploration is the main challenge in a meta-heuristic optimization. The two phases are in conflict: strengthening one weakens the other.
The DA algorithm is a powerful meta-heuristic optimizer introduced in recent years. The algorithm models two main behaviours: hunting and migration. The hunting procedure is called the static swarm and the migration procedure is called the dynamic swarm. DA mimics five primitive principles of the swarming behaviour of dragonflies. Separation is applied to avoid collisions between insects in the tribe. Alignment matches the velocity of an individual to that of its neighbours. Cohesion refers to the tendency of individuals towards the average position of the neighbourhood. Attraction means dragonflies move towards food sources, and distraction means they move away from enemies for survival. In [33], the effect of DA's original coefficient-updating mechanisms on the exploration and exploitation phases was investigated, and three modified updating techniques were then proposed for the coefficients of the algorithm. These modifications improve the results slightly, but increase the complexity of the algorithm remarkably.
In [34], a great number of DA variants are reviewed, including modified DA, hybridized DA, and multi-objective DA; the merits and disadvantages of each variant are analyzed and some applications are mentioned. Classification of sonar targets in real environments needs a low-cost and accurate algorithm. The original DA is complex and has medium accuracy compared with some other algorithms such as MFO. The hybridization technique can be used to improve the performance of the algorithm, but the complexity must stay low relative to the original DA. The literature indicates that mixing a meta-heuristic algorithm with chaotic local search is a low-computational-cost technique for improving the exploitation and exploration states. The motivation of this research is to utilize chaotic local search to improve the performance of the DA optimizer; chaotic local search exhibits stochastic behavior while remaining a deterministic method. To approach a real-time procedure, the MLP NN and a parallel structure of the DA optimizer are applied.

Multi-layer Perceptron Neural Network
MLP NNs are the most useful and well-known kind of NN, in which each neuron is connected to the neurons of the adjacent layers. A schematic of the MLP NN is shown in Fig. 2, with two layers: p input nodes, h neurons in the single hidden layer, and m neurons in the output layer.
In this MLP NN, two layers handle the procedure. The weighted input of the jth hidden neuron is computed as

$$\mathrm{net}_j = \sum_{i=1}^{p} w_{ji}\,x_i + b_j, \qquad j = 1, 2, \ldots, h \qquad (1)$$

where $x_i$ is the ith input, $w_{ji}$ represents the weight from the ith input to the jth neuron, and $b_j$ is the bias of the jth neuron.
F is the activation function that calculates the output of the hidden layer's neurons in Eq. (2):

$$y_j = F(\mathrm{net}_j) \qquad (2)$$

The weighted input of the kth output neuron is then

$$\mathrm{net}_k = \sum_{j=1}^{h} w_{kj}\,y_j + D_k, \qquad k = 1, 2, \ldots, m \qquad (3)$$

where $w_{kj}$ is the synaptic weight relating the output of the jth neuron in the hidden layer to the kth output-layer neuron, and $D_k$ is the bias of the kth neuron in the output layer.
The function G calculates the final outputs of the MLP NN in Eq. (4):

$$o_k = G(\mathrm{net}_k) \qquad (4)$$

The connection biases and weights of each neuron are the most important parts of the MLP NN. In sonar target classification, a real target is presented by 1 and a liar target by 0. For error evaluation, Eq. (5) is minimized:

$$E = \sum_{k=1}^{m} \left(o_k - d_k\right)^2 \qquad (5)$$

where $d_k$ is the desired output of the kth output neuron.
To find the optimum values of the weights and biases that achieve the best fitness (minimum error), a training algorithm such as gradient descent or Newton's method is classically applied.
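To make Eqs. (1)-(5) concrete, below is a minimal NumPy sketch of the forward pass. It assumes logistic sigmoids for both F and G (the text does not fix the activation functions) and uses, purely for illustration, the 60-11-2 layer sizes of the sonar network described later.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mlp_forward(x, W1, b1, W2, b2):
    """Forward pass of a one-hidden-layer MLP (Eqs. (1)-(4)).
    x: (p,) input; W1: (h, p) weights w_ji; b1: (h,) biases b_j;
    W2: (m, h) weights w_kj; b2: (m,) biases D_k."""
    hidden = sigmoid(W1 @ x + b1)        # Eqs. (1)-(2): net_j, then F(net_j)
    return sigmoid(W2 @ hidden + b2)     # Eqs. (3)-(4): net_k, then G(net_k)

# Example with the 60-11-2 sonar network sizes used later in the paper
rng = np.random.default_rng(0)
x = rng.random(60)
W1, b1 = rng.standard_normal((11, 60)), rng.standard_normal(11)
W2, b2 = rng.standard_normal((2, 11)), rng.standard_normal(2)
d = np.array([1.0, 0.0])                 # real target encoded as 1
error = np.sum((mlp_forward(x, W1, b1, W2, b2) - d) ** 2)  # Eq. (5)
```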

Fig. 2 MLP NN schematic and details
The training process updates the weights at each time step as shown in Eq. (6):

$$w_i(n+1) = w_i(n) - \eta\,\frac{\partial E}{\partial w_i(n)} \qquad (6)$$

where $\eta$ is the learning coefficient of the update $w_i(n+1)$ and n is the current time step.
In most cases, three methods express the configuration of the parameters of the MLP NN as a particle for meta-heuristic trainers: (1) vector, (2) matrix, and (3) binary. In the vector representation, each agent is a single vector; in the matrix representation, each particle is a matrix; and in the binary representation, each agent is expressed as a string of bits. The vector representation is used in this research because of its simplicity [35][36][37][38].
The Sejnowski [39] and Iris [40] datasets are used in this research. Sejnowski classified sonar echoes using a NN; the main task is training a network to distinguish the sonar echoes of a metal target from those of sandy rocks. The transmitted sonar signal is a frequency-modulated chirp. The Sejnowski dataset contains 97 patterns obtained from rocks and 111 patterns obtained by bouncing sonar signals off a metal cylinder, at various angles and under various conditions. These 208 signals were obtained from a variety of aspect angles, spanning 90 degrees for the cylinder and 180 degrees for the rock. Each pattern is a set of 60 numbers normalized to the range 0 to 1; each number represents the energy within a particular frequency band, integrated over a certain period of time.
This means the number of inputs of the MLP NN is equal to the dimension of the problem (i.e., 60). There are two output neurons, presenting real and liar targets. The Iris dataset has four dimensions and three output neurons, covering real target, liar target, and clutter. In the Sejnowski dataset, the number of inputs is large, which increases the complexity of choosing the number of hidden neurons of the MLP NN. For determining the number of hidden neurons, some theoretical and empirical methods are considered.
Table 1 presents some papers and their proposed formulations. The formulation of Ref. [42] is used, which selects the number of hidden neurons equal to 11.
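As a hedged illustration (the exact formulation of Ref. [42] is not reproduced here), one common empirical rule sets the hidden-layer size near the geometric mean of the input and output counts, which matches the chosen value for the Sejnowski network:

$$h = \left\lceil \sqrt{p \cdot m} \right\rceil = \left\lceil \sqrt{60 \times 2} \right\rceil = 11$$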

Trainer Algorithms
This section provides background on the DA and ChoA trainers that are used to design the classifier.

Dragonfly Algorithm
As pointed out before, DA is a recent population-based meta-heuristic optimization technique. In this algorithm, the search agents have two behaviors [45,46]: hunting and migration. In hunting (the static swarm), local movement and unexpected changes in the flight path are the main characteristics. In migration (the dynamic swarm), a large group of dragonflies forms a swarm to travel long distances. The static and dynamic swarms are very similar to the main phases of meta-heuristic algorithms: exploration and exploitation. Separation, alignment, and cohesion, together with attraction to food and distraction from enemies, are the main operators of the algorithm. These five operators adjust the position updating of the individuals in the swarm, and each of them is mathematically modelled in Table 2.
Here X is the position of the current individual, $X_j$ shows the position of the jth neighbour, and N is the number of neighbouring individuals. $\Delta X_j$ is the step of the jth neighbouring individual, similar to velocity in the PSO algorithm; $X^+$ shows the position of the food source and $X^-$ the position of the enemy. All five operators are updated in each iteration, and the next position of each individual is then produced from the position and step vectors. Equation (7) represents the step vector:

$$\Delta X_{t+1} = \left(s\,S_i + a\,A_i + c\,C_i + f\,F_i + e\,E_i\right) + w\,\Delta X_t \qquad (7)$$
All the lower-case coefficients (s, a, c, f, e, and the inertia weight w) are the weights of the basic operators and are calculated randomly. The position vectors are calculated as follows:

$$X_{t+1} = X_t + \Delta X_{t+1} \qquad (8)$$

In the static swarm, alignment is very low and cohesion towards the food is very high; if alignment is very high and cohesion is low, the exploitation phase operates. To transition from the exploration phase to exploitation, the radius of the neighbourhood is increased with the iterations, and in the simulation Eq. (9) is used to calculate the initial radius of the neighbourhood. The algorithm starts by making a set of random solutions for a specific problem, with the positions initialized to random values limited by the lower and upper bounds. In each iteration, the position and step of each dragonfly are updated using Eqs. (7) and (8), and the position-updating process continues until the stopping criterion is satisfied. The DA pseudo-code is shown in Fig. 3.
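The following is a minimal Python sketch of one DA update, following the operator definitions of Table 2 and Eqs. (7)-(8). The operator weights are passed in as fixed values here, whereas the full algorithm adapts them over the iterations; the neighbourhood indices are assumed to be computed elsewhere from the radius of Eq. (9).

```python
import numpy as np

def da_step(X, dX, i, neighbours, food, enemy, s, a, c, f, e, w):
    """One DA update for individual i: the five operators of Table 2,
    then the step and position updates of Eqs. (7)-(8)."""
    Xn, dXn = X[neighbours], dX[neighbours]
    S = -np.sum(X[i] - Xn, axis=0)       # separation
    A = dXn.mean(axis=0)                 # alignment
    C = Xn.mean(axis=0) - X[i]           # cohesion
    F = food - X[i]                      # attraction towards the food source
    E = enemy + X[i]                     # distraction away from the enemy
    dX[i] = s*S + a*A + c*C + f*F + e*E + w*dX[i]   # Eq. (7)
    X[i] = X[i] + dX[i]                              # Eq. (8)

# Tiny demo: 5 dragonflies in 3 dimensions, individual 0 with 2 neighbours
rng = np.random.default_rng(0)
X, dX = rng.random((5, 3)), np.zeros((5, 3))
da_step(X, dX, 0, np.array([1, 2]), food=X[3].copy(), enemy=X[4].copy(),
        s=0.1, a=0.1, c=0.7, f=1.0, e=1.0, w=0.9)
```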

Chimp Optimization Algorithm
The ChoA algorithm [47] is inspired by the individual intelligence and sexual motivation of chimpanzees in their group hunting. Some animal communities, such as chimps, are fission-fusion societies: the population of the group changes as time passes and members move throughout the territory. Each group of chimps tries to explore the search (hunting) space with an exclusive and independent strategy. Within each group, the hunters do not have uniform ability and alertness; the ability of each chimp can be useful in particular situations.
Four types of chimps exist in a group: attacker, chaser, barrier, and driver. They do not have equal abilities, but together they can hunt successfully. Drivers scare the prey and follow it. Barriers locate themselves in the jungle trees to prevent the prey from moving back. Chasers are nimble and catch up with the prey. Finally, attackers hunt the prey. The prey is hunted during the exploitation and exploration phases. To model the chimp hunting behavior, Eqs. (10) and (11) are proposed:

$$d = \left|c\,X_{prey}(t) - m\,X_{chimp}(t)\right| \qquad (10)$$

$$X_{chimp}(t+1) = X_{prey}(t) - a \cdot d \qquad (11)$$
where t is the current iteration number and a, m, and c are the exclusive coefficient vectors. The prey position and the chimp (hunter) position are $X_{prey}$ and $X_{chimp}$, respectively. Equations (12) to (14) calculate the a, m, and c vectors:

$$a = 2f\,r_1 - f \qquad (12)$$

$$c = 2\,r_2 \qquad (13)$$

$$m = \mathrm{Chaotic\_value} \qquad (14)$$
The coefficient f is reduced from 2.5 to 0 through the iterations by a non-linear process. $r_1$ and $r_2$ are random vectors, and the chaotic values are calculated from different chaotic maps; the chaotic map models the chimps' sexual motivation in the hunting operation. The attacker corresponds to the best solution found so far, while the barrier, driver, and chaser hold useful information about the prey position. The four best solutions are saved in memory, and the other chimps try to follow these four solutions. This relationship is expressed by Eqs. (15) to (17):

$$d_{Attacker} = |c_1 X_{Attacker} - m_1 X|,\quad d_{Barrier} = |c_2 X_{Barrier} - m_2 X|,\quad d_{Chaser} = |c_3 X_{Chaser} - m_3 X|,\quad d_{Driver} = |c_4 X_{Driver} - m_4 X| \qquad (15)$$

$$X_1 = X_{Attacker} - a_1 d_{Attacker},\quad X_2 = X_{Barrier} - a_2 d_{Barrier},\quad X_3 = X_{Chaser} - a_3 d_{Chaser},\quad X_4 = X_{Driver} - a_4 d_{Driver} \qquad (16)$$

$$X(t+1) = \frac{X_1 + X_2 + X_3 + X_4}{4} \qquad (17)$$
The chaotic behavior in the last part of the algorithm helps the chimps overcome two basic meta-heuristic problems, a low convergence rate and entrapment in local optima, when facing high-dimensional and complex problems. To model the two behaviors (normal updating or the chaotic model) simultaneously, the mathematical model is formulated by Eq. (18):

$$X_{chimp}(t+1) = \begin{cases} X_{prey}(t) - a \cdot d & \mu < 0.5 \\ \mathrm{Chaotic\_value} & \mu \geq 0.5 \end{cases} \qquad (18)$$

where $\mu$ is a random number in [0, 1]. In sonar target classification, the sinusoidal map has the best performance in low-dimensional problems.
In ChoA, the exploration process begins by generating a random population of chimps (candidate solutions). All chimps are then randomly divided into four distinct groups: chaser, driver, attacker, and barrier. Each chimp updates its f coefficient according to its group's strategy. Over the iterations, the attacker, driver, chaser, and barrier agents estimate the probable location of the prey, and each chimp measures and updates its distance from the target (prey). The c coefficient is tuned adaptively, and the m vector is a factor for avoiding local optima and achieving faster convergence simultaneously. The coefficient f is changed from 2.5 to zero non-linearly, which enhances exploitation. Figure 4 shows the pseudo-code of ChoA.
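A minimal sketch of one ChoA position update per Eqs. (10)-(17) follows. It assumes the sinusoidal map supplies the chaotic m value (the paper reports this map as the best performer here); the map coefficient 2.3 is the value commonly used in the literature and is an assumption.

```python
import numpy as np

def sinusoidal_map(x):
    """Sinusoidal chaotic map x_{k+1} = 2.3 * x_k^2 * sin(pi * x_k)."""
    return 2.3 * x**2 * np.sin(np.pi * x)

def choa_update(x, leaders, f, chaos, rng):
    """One ChoA position update, a sketch of Eqs. (10)-(17).
    leaders: (attacker, barrier, chaser, driver), the four best solutions;
    f: coefficient decreased non-linearly from 2.5 to 0;
    chaos: current chaotic value, used as the m vector of Eq. (14)."""
    estimates = []
    for leader in leaders:
        r1, r2 = rng.random(x.size), rng.random(x.size)
        a = 2.0 * f * r1 - f                  # Eq. (12)
        c = 2.0 * r2                          # Eq. (13)
        m = chaos                             # Eq. (14)
        d = np.abs(c * leader - m * x)        # Eq. (15), per leader
        estimates.append(leader - a * d)      # Eq. (16)
    return np.mean(estimates, axis=0)         # Eq. (17)

# Tiny demo in 4 dimensions
rng = np.random.default_rng(1)
x = rng.random(4)
leaders = [rng.random(4) for _ in range(4)]
x_new = choa_update(x, leaders, f=2.0, chaos=sinusoidal_map(0.7), rng=rng)
```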

Hybrid DA-ChoA
DA and ChoA are newly developed algorithms and have good classification rates and convergence. They use complex operators that need more processing power and memory than some older meta-heuristic algorithms such as PSO and GA. ChoA, under the guidance of its four main chimps, has a strong local search ability and fast convergence. DA has intrinsically excellent global search ability, because its individuals are distracted by enemies and attracted towards food sources, but its convergence rate is lower and it has a high potential of trapping in local optima. The hybrid algorithm includes the two excellent parts, so it can avoid the disadvantages of each trainer in the optimization process and enhance the exploitation and exploration of the method simultaneously. For better performance, some novel constraints are applied to the hybrid algorithm; these guarantee the feasibility of the solutions during processing. The operation structure of the proposed DA-ChoA trainer is shown as a flow chart in Fig. 5. Three rules are assigned to the hybrid algorithm to avoid fluctuation in the convergence curve; the rules are described in Table 3.
In the hybrid DA-ChoA, k_max is set equal to 6 and DA has the highest priority to run. The DA algorithm quests the entire search space in the optimization process and generates the best global solutions for the next iterations. If DA cannot update the optimal solution for 0.5 * k_max consecutive iterations, the ChoA algorithm is executed.
This procedure is beneficial for optimization thanks to ChoA's smart local search ability and low-cost computation. To further improve the performance of the algorithm, a chaotic local search method is applied to the hybrid technique as an extra option; the chaotic local search generates accurate candidate solutions as the iterations progress. With ChoA, the hybrid algorithm can settle high-dimensional optimization problems. This configuration of algorithms and methods, together with the rules of Table 3, compensates for the drawbacks of each algorithm and highlights their advantages.

Table 3 Rules for the hybrid DA-ChoA flow chart

Rule #  | Description
Rule 1  | If the best local-map score is degraded relative to the last score, keep the last score and restore DA's last best position
Rule 2  | If the best ChoA score is degraded relative to the last score, keep the last score and restore DA's last best position
Rule 3  | If the best DA score is degraded relative to the last score, keep the last score and restore ChoA's last position
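The paper presents the switching logic only as a flow chart (Fig. 5); the sketch below reconstructs it under stated assumptions. The helper callables (da_iter, choa_iter, local_search) and their signatures are hypothetical placeholders, and the rules of Table 3 are reduced here to never accepting a degraded best score.

```python
def hybrid_da_choa(fitness, swarm, da_iter, choa_iter, local_search,
                   max_iter=500, k_max=6):
    """Sketch of the DA-ChoA handover logic. DA runs with priority; if the
    best score stalls for 0.5 * k_max consecutive iterations, control passes
    to ChoA (and back). Chaotic local search refines each candidate."""
    best_pos, best_score = None, float("inf")
    stall, use_da = 0, True
    for t in range(max_iter):
        swarm = (da_iter if use_da else choa_iter)(swarm, fitness, t)
        cand = local_search(min(swarm, key=fitness), fitness)
        score = fitness(cand)
        if score < best_score:          # Rules 1-3: never accept a degraded best
            best_pos, best_score = cand, score
            stall = 0
        else:
            stall += 1
        if stall >= 0.5 * k_max:        # hand control to the other trainer
            use_da, stall = not use_da, 0
    return best_pos, best_score
```

With k_max = 6 as in the paper, three consecutive stalled iterations trigger the handover between the two trainers.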

Training MLP NNs Using the Hybrid Trainer
Typically, the configuration of the parameters (biases and weights) of the MLP NN is represented by one of three methods: (1) vector, (2) matrix, and (3) binary.
Two levels exist in training an MLP NN using a meta-heuristic trainer: first, formulating the problem's characteristics in terms of the search particles of the trainer algorithm, and second, designing the fitness function. In vector mode, each search agent (particle) of the meta-heuristic algorithm is a single vector; in matrix mode, each search agent is a matrix; and in binary mode, each search agent is a string of binary bits.
The first method is simple and typically used for NNs; the vector method is assigned to the MLP NN in this paper because of its simple structure. Figure 6 shows the vector-based configuration of the MLP NN parameters.
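As a hedged illustration of the vector representation in Fig. 6, the sketch below flattens and restores the parameters of the Sejnowski-sized network. The helper name unpack and the ordering of the segments are assumptions, since the paper only shows the layout graphically.

```python
import numpy as np

def unpack(theta, p=60, h=11, m=2):
    """Decode one flat particle vector into the MLP's weights and biases.
    Segment order (W1, b1, W2, b2) is an assumed convention."""
    i = 0
    W1 = theta[i:i + p*h].reshape(h, p); i += p*h
    b1 = theta[i:i + h];                 i += h
    W2 = theta[i:i + h*m].reshape(m, h); i += h*m
    b2 = theta[i:i + m]
    return W1, b1, W2, b2

# Dimension of the search space for the Sejnowski network:
dim = 60*11 + 11 + 11*2 + 2   # = 695 parameters per search agent
theta = np.random.default_rng(1).standard_normal(dim)
W1, b1, W2, b2 = unpack(theta)
```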
After the parameters are defined, a fitness function must be defined to evaluate the meta-heuristic trainer. Obtaining the highest testing accuracy is the final goal in training an MLP NN. This procedure is shown in Fig. 7.
Mean Square Error (MSE) is one of the well-known metrics for evaluating fitness in MLP NNs. The MSE formulation is described in Eq. (18), where $d_i^k$ shows the desired output in dimension k and $o_i^k$ is the ith calculated output of the MLP NN:

$$\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N}\sum_{k=1}^{m}\left(o_i^k - d_i^k\right)^2 \qquad (18)$$
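Putting the pieces together, a sketch of the fitness evaluation follows; it reuses the hypothetical unpack() and mlp_forward() helpers from the earlier sketches.

```python
import numpy as np

def mse_fitness(theta, X, D):
    """Fitness of one particle: mean squared error over the training set,
    the Eq. (18) metric. X is (N, p) inputs, D is (N, m) desired outputs.
    Reuses the hypothetical unpack() and mlp_forward() sketched earlier."""
    W1, b1, W2, b2 = unpack(theta)
    O = np.array([mlp_forward(x, W1, b1, W2, b2) for x in X])
    return np.mean(np.sum((O - D) ** 2, axis=1))
```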

Setting Parameters
To evaluate the efficiency of the hybrid DA-ChoA in training the MLP NN, some famous trainers such as PSO, GSA, and GWO are benchmarked. The essential parameters and settings of these algorithms are presented in Table 4. The number of particles and the maximum number of iterations are equal in each algorithm, and the basic versions of the PSO and GSA algorithms are applied. If the maximum number of iterations was less than 500, the results were not clear enough for evaluating the convergence and other behaviors of the algorithms.

Fig. 6 The problem's parameters applied to the searching particles

Simulation Results and Analysis
The classifiers listed in Table 4 are applied to the Sejnowski and Iris datasets, and their performance is evaluated in terms of processing time, classification rate, convergence speed, and the avoidance of local optima.
The simulations were run in MATLAB on a laptop with a 1.8 GHz CPU and 4 GB of RAM. Each dataset is divided into a test set (30 percent) and a training set (70 percent of the data). Each simulation is repeated 10 times and the mean results are shown in Figs. 8 and 9; Tables 5 and 6 show the statistical results for the Iris and Sejnowski datasets, respectively. In Fig. 8, the Iris dataset is used to show the classification accuracy and convergence rate. The classifier designed with the DA trainer has the best performance among the trainers in terms of classification rate, followed by ChoA; GSA and PSO do not achieve acceptable scores. In terms of convergence speed, PSO and GSA are fast, but they become trapped in local minima and their MSE performance is not acceptable. The best MSE performance belongs to ChoA, but it converges more slowly and needs more iterations (generations). Figure 9 shows the results of the designed classifier on the Sejnowski dataset benchmark. This dataset is more complex than Iris and has 60 dimensions. After GSA, DA has good accuracy in terms of classification, and the convergence rates of DA and ChoA are very close to that of GSA.
Two metrics, the Average (AVE) and Standard Deviation (STD) of the MSE, are shown in Tables 5 and 6; the best results are highlighted in bold type. To determine whether the obtained results differ from the other benchmarks in a statistically meaningful way, a non-parametric statistical test, Wilcoxon's rank-sum test, was performed at the 5% significance level. A p value of less than 0.05 indicates strong evidence against the null hypothesis. The classification rate and the time consumed by each run are the important comparison criteria.
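For illustration, a rank-sum test over two sets of per-run MSE values can be computed with SciPy; the numbers below are purely illustrative placeholders, not the paper's results.

```python
from scipy.stats import ranksums

# MSE of 10 independent runs for two trainers (illustrative numbers only)
mse_hybrid = [0.031, 0.029, 0.034, 0.030, 0.028, 0.033, 0.031, 0.029, 0.032, 0.030]
mse_pso    = [0.052, 0.048, 0.055, 0.050, 0.047, 0.053, 0.049, 0.051, 0.054, 0.050]

stat, p = ranksums(mse_hybrid, mse_pso)
print(f"p = {p:.4f}")   # p < 0.05 rejects the null hypothesis of equal medians
```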
In Table 5, the Iris dataset is used to evaluate each trainer. The best AVE and STD belong to ChoA. However, DA and ChoA are the slowest algorithms, needing nearly 1.5 times more execution time than PSO and GSA. All the trainers have acceptable p values.
The results of applying the Sejnowski dataset to the trainers are given in Table 6. After GSA, DA has the minimum average MSE, and its execution time is 2 times larger than that of PSO and GSA.
On both datasets, DA and ChoA have time-consuming mechanisms, and through hybridization, DA-ChoA can reduce the execution time and improve the classification accuracy.
Table 7 indicates that for both datasets the execution time is decreased and the hybrid trainer has the minimum execution time. In the MATLAB implementation, no pipeline or parallel structure is used, so for the high-dimensional Sejnowski dataset the reduction in time consumption is less pronounced than for Iris.
On the Iris dataset, the classification rate of the hybrid DA-ChoA is improved by 4% over DA and by 12% over ChoA. This improvement indicates that the hybrid algorithm has powerful exploration and exploitation mechanisms and avoids entrapment in local minima. On the Sejnowski dataset, the hybrid trainer cannot improve the classification rate over DA, but it is 6% better than the ChoA trainer.
For a clearer evaluation, the sonar dataset classification results of DA, ChoA, and the hybrid trainer are shown in Figs. 10 and 11. On the Iris dataset, the convergence curve indicates that the hybrid algorithm has the best performance. In the early iterations ChoA has a lower MSE, but in the end it does not perform as well as the hybrid. These results indicate that the hybrid algorithm has an extraordinary ability in small-size dataset classification because of its good exploration of the entire search space. The hybrid also converges faster and escapes better from local minimum traps.
On the Sejnowski dataset, with its 60 dimensions, Fig. 11 shows that the performance of DA is better than that of the other trainers. This is because the nature of the hybrid derives from ChoA, but after 300 iterations it gets close to DA. The high slope of the hybrid's curve shows that it has greater exploration than the two other trainers.

Conclusions
A hybrid DA-ChoA classification algorithm based on an MLP NN has been simulated in MATLAB. The proposed hybrid algorithm improves the exploration ability of ChoA for training an MLP NN for the first time.
For performance evaluation, two famous benchmark datasets in the sonar domain were used. Sejnowski and Iris were imported into the hybrid and original classification systems, and the results show that the hybrid DA-ChoA has an advantage over the other meta-heuristic algorithms. The hybrid technique has clear advantages in terms of convergence speed, classification rate, and avoidance of local minima, especially for low-dimensional problems. Its time consumption is less than that of the original DA and ChoA algorithms. In big-data applications, where the dimension of the dataset and the volume of stored data are challenging, the 4% enhancement in classification rate and the 16% decrease in time consumption are very important and interesting.
Funding The authors did not receive any funding for this study.

Data availability
The datasets generated during and/or analyzed during the current study are available.
