Convergence analysis of butterfly optimization algorithm

Convergence analysis of any random search algorithm helps to verify whether, and how quickly, the algorithm converges to the point of interest. The butterfly optimization algorithm (BOA) is a popular population-based stochastic optimizer introduced by mimicking the foraging behavior of butterflies in nature. In this paper, we develop a Markov chain model of the BOA and analyze the convergence behavior of the algorithm. In this model, the population sequence generated by the algorithm is shown to be a finite homogeneous Markov chain, and the defined population state set is shown to be reducible. Convergence of the algorithm is then analyzed mathematically using the Markov chain model with the help of the global convergence theorem, which requires a random search algorithm to satisfy two subtle conditions. The butterfly optimization algorithm is found to satisfy both conditions, which guarantees its global convergence. We also show experimentally that guaranteed convergence does not always imply a high rate of convergence, as the latter is influenced by various other factors. Moreover, the convergence of the BOA is compared experimentally with several state-of-the-art algorithms. Further, the effects of the parameters, namely the sensory modality and the power exponent, on the performance of the BOA are studied.


Introduction
Stochastic optimization algorithms generate and use random values of the decision variables as initial solutions for global optimization. Stochastic algorithms can be divided into two categories: heuristic and metaheuristic. The phrase 'hit and trial' describes heuristic algorithms in a very simple way. Heuristic algorithms aim at a good solution rather than the best solution to an optimization problem. It is very difficult to find the best solution to a problem in the heuristic setting, and it is not beneficial to search for problem-dependent algorithms because of the broad spectrum of modern problems. Metaheuristic algorithms, on the other hand, are problem-independent algorithms that utilize a certain trade-off between randomization and local search (Chakraborty et al. 2022; Nama et al. 2018; Saha 2022). Metaheuristic algorithms were obtained via considerable improvement of heuristic algorithms, and based on their performance they have the upper hand over heuristic algorithms. The two main components of metaheuristic algorithms are diversification (or exploration) and intensification (or exploitation). Diversification means generating a variety of solutions to explore globally, while intensification means searching locally using the information found by a good current solution (Nama et al. 2017). Randomization is an effective way to move from local to global search. In consequence, almost all metaheuristic algorithms are suitable for global optimization (Yang 2011).
While metaheuristic algorithms have been used widely, the main issue with such algorithms lies in their mathematical analysis, which has been carried out only on a very small scale. The prevailing reason is that the interaction between their different components is highly nonlinear, complex, and stochastic (Yang 2011).
The first question that arises for any random search algorithm is whether it converges to the global optimal value. Convergence analysis helps us to verify whether the algorithm under consideration has the ability to converge to the global optimum and, if so, how quickly it does.
The convergence theorem for random search algorithms by Solis and Wets (1981) provides a proof of convergence in probability to the global minimum for general step-size algorithms, under conditions on the step length and the direction-generation method. Essentially, the algorithm converges with probability one as long as the generator does not consistently ignore any region. Later, Claude (1992) demonstrated that even if the generator of an algorithm cannot reach any point in the domain in a single iteration, as long as there is a means, such as an acceptance probability, to allow the algorithm to reach any point in a finite number of iterations, the algorithm still converges with probability one to the global optimum.
Apart from a small number of researchers merely discussing the theoretical aspects of metaheuristic algorithms in their work, some works on the convergence of various algorithms are listed below. In 2018, Qian and Li (2018) analyzed the convergence of particle swarm optimization based on probability theory, proved the algorithm to be convergent with probability 1 under certain conditions, and proposed a new variant that meets the convergence conditions. Also in 2018, Xu and Yu (2018) employed martingale theory to analyze the convergence of particle swarm optimization (PSO): the basic PSO was treated as a Markov chain and its Markov properties were analyzed; the evolutionary sequence of the particle swarm with the best fitness value was then transformed into a supermartingale, and the convergence of standard PSO was discussed in terms of the supermartingale convergence theorem (Brookes 1954). The authors of (Jeyakumar and Shanmugavelayutham 2011) presented an empirical study on the convergence behavior of differential evolution (DE) variants and validated the competitiveness of the variants by analyzing their convergence behavior and measuring their convergence speed and quality. Many variants of DE are present in the literature, but very few of them guarantee global convergence theoretically. Hu et al. (2014) developed a theoretically convergent DE algorithm that employs a self-adaptation scheme for the parameters and two operators, namely a uniform mutation operator and a hidden adaptation selection operator; the parameter self-adaptation and the uniform mutation operator enhance the diversity of the population and guarantee ergodicity. In (Qiuqiao et al. 2020), the authors proposed a quantized potential well model based on the quantum behavior of the chicken swarm optimization (CSO) algorithm and studied its convergence with the help of the global convergence criterion; the improved algorithm was found to be globally convergent. In 2016, Wu et al. (2016) studied the convergence of the CSO algorithm by utilizing the global convergence criterion (Solis and Wets 1981), and the CSO was found to be globally convergent.
In recent years, many novel metaheuristic algorithms have been developed, some of which are modified versions of existing evolutionary algorithms (Sorensen 2015; Del Ser et al. 2019). In 2019, Arora and Singh (2019) proposed a novel metaheuristic algorithm, namely the butterfly optimization algorithm (BOA), based on the food foraging and mating pair search behavior of butterflies. A novelty of BOA is that the authors utilized Stevens' power law from psychophysics, an empirical relationship between an increase in the intensity of a physical stimulus and the perceived increase in the magnitude of the sensation it creates. In this paper, we analyze the stochastic nature of BOA by studying the convergence behavior of the algorithm through Markov chain theory.
The BOA is a highly powerful and versatile algorithm for solving complex real-world problems with relatively complex search spaces (Sharma et al. 2020). In BOA, it is assumed that all butterflies emit an aroma of varying strength and that the aroma of each butterfly is linked to the location of the search agents. The aroma produced by a single butterfly is circulated throughout the entire search region, reaching and being detected by every butterfly, thereby forming a strong social information network in the search space. The BOA has two phases in its implementation: the global phase and the local phase. When a butterfly detects the aroma of the best butterfly in the group, it moves toward the best butterfly in the search space; this is known as the global phase. When a butterfly does not receive the fragrance of other butterflies, it travels arbitrarily along random paths in the search domain; this is known as the local phase. The BOA has become a popular algorithm within a very short period. Some of the works on BOA are listed below. In 2020, Zhi et al. (2020) suggested an improved version of BOA, which was utilized to obtain the optimal design of a combined cooling, heating, and power generation system; the efficiency of the system was analyzed on the basis of annual cost, energy and exergy efficiencies, and the decrease in pollutant discharge. To maximize its potential, Arora and Anand (2018) embedded learning automata in BOA to create a new variant in which the role of the learning automaton is to configure the behavior of a butterfly in order to achieve an appropriate trade-off between global and local search.
Using the mutualism phase of symbiotic organisms search (SOS) in BOA, Sharma and Saha (2020) introduced a modified version of BOA (namely, m-MBOA) that adds the mutualism strategy of the SOS algorithm to enhance the exploitation capability of BOA, thereby balancing its exploratory and exploitative characteristics. In (Fathy 2020), the authors proposed a novel methodology incorporating BOA to reconfigure a shaded photovoltaic array optimally and extract the global maximum power. Sharma and Saha (2021) put forward a modified BOA (MPBOA) with the aid of the mutualism and parasitism strategies of SOS, augmenting the exploitative competence of BOA to obtain a proper balance between the search mechanisms; they further applied the algorithm to the problem of image segmentation via a multilevel thresholding approach. In (Guo et al. 2021), an improved BOA was proposed that incorporates guiding weights and a population restart strategy: by adding guiding weights to the global search phase, the algorithm's convergence rate and precision were increased, while the population restart strategy increases the possibility of jumping out of a local optimal solution. Zhang et al. (2020) attempted to improve the feature selection capabilities of the BOA by employing an innovative initialization and the DE algorithm, applied to decrease the unpredictability of the BOA-based system. In 2022, Sharma et al. (2020) embedded the SOS algorithm in the structure of BOA to suggest a hybrid variant of BOA (hBOASOS) that enhances the solution accuracy and the convergence speed of the algorithm; they then employed the suggested algorithm to optimize the weight and cost of a retaining wall. Malisetti and Pamula (2020) proposed a cluster head (CH) selection protocol based on a quasi-oppositional BOA; the proposed method was compared with the original BOA and certain existing algorithms with respect to network lifetime and energy efficiency.
Evidence of bias in BOA was demonstrated for problems whose optimal value is near the origin, and an unbiased BOA (UBOA) was suggested to eliminate this problem; for shifted functions, UBOA proved to be a promising algorithm. In (Sharma et al. 2022a, b, c, d), the authors proposed a modified variant of BOA (mLBOA), incorporating the Lévy flight mechanism to enhance exploration and a Lagrange interpolation strategy to improve the exploitation ability. The study (Ustun 2020) aimed to suggest an improved BOA with respect to both convergence and accuracy and applied it to solve the inverse synthetic aperture radar image motion compensation problem. A new hybrid BOA (m-SCBOA) built with the aid of the sine-cosine algorithm was proposed by Sharma et al. (Sharma et al. 2022a, b, c, d) and utilized to solve an a priori multi-objective problem, namely the parameter optimization of an Al-4.5%Cu-TiC metal matrix composite. In (Tubishat et al. 2020), the authors proposed a dynamic BOA for feature selection problems; two significant modifications were introduced to eliminate the weaknesses of BOA: first, a local search strategy based on a mutation operator was applied to evade local optima, and second, the mutation operator was utilized to increase diversity. In (Sharma et al. 2022a, b, c, d), the authors first improved the basic BOA and then extended it to a multi-objective variant, viz. MONSBOA; the performance of the proposed algorithm was tested on various benchmark and real-world problems.
Most of the existing work on BOA primarily tries to improve the algorithm, either by modifying it or by hybridizing it with other components; almost none attempts to analyze the butterfly optimization model mathematically or theoretically.
In this paper, we establish the Markov chain model of the traditional BOA, define the state transition sequence of the butterfly population, and perform a detailed analysis of the Markov chain's properties. We show mathematically that the population sequence generated by the butterfly optimization algorithm is a finite homogeneous Markov chain. Using the Markov chain model of BOA, we perform the convergence analysis of the algorithm. Convergence criteria are established using the convergence rule for random search algorithms (Solis and Wets 1981). The global convergence theorem for a random search algorithm requires two subtle conditions to be satisfied, which are discussed later in the paper. BOA is found to satisfy both conditions, and by the global convergence theorem we guarantee the global convergence of the algorithm. On this basis, BOA may be regarded as an efficient and robust optimization technique both practically and theoretically. In short, the whole purpose of this paper is the mathematical analysis of the butterfly optimization algorithm and the proof of its convergence.
The rest of the paper is organized as follows: In Sect. 2, a brief introduction to butterfly optimization algorithm is given followed by some preliminary concepts in Sect. 3. The Markov chain model of BOA is presented in Sect. 4 followed by the convergence analysis in Sect. 5. Sect. 6 analyzes the convergence in BOA practically. Finally, a brief conclusion is put up in Sect. 7.

Overview of butterfly optimization algorithm
In 2019, Arora and Singh (2019) developed a new population-based metaheuristic algorithm, namely the butterfly optimization algorithm (BOA), based on the food foraging and mating pair search behavior of butterflies. In BOA, it is assumed that all butterflies emit an aroma of varying strength and that the aroma of each butterfly is linked to the location of the search agents. The aroma produced by a single butterfly is circulated throughout the entire search region, reaching and being detected by every butterfly, thereby forming a strong social information network in the search space. The BOA has two phases in its implementation: the global phase and the local phase. When a butterfly detects the aroma of the best butterfly in the group, it moves toward the best butterfly in the search space; this is known as the global phase. When a butterfly does not receive the fragrance of other butterflies, it travels arbitrarily along random paths in the search domain; this is known as the local phase. The behavior of butterflies rests on two fundamental concepts: the variation of the stimulus intensity (I), which is linked to the fitness of the butterflies, and the formulation of the fragrance (f), which is relative and sensed by other butterflies. In BOA, the fragrance is represented as a function of the physical intensity of the stimulus, mathematically given by

f_i = c I^a,    (1)

where f_i is the amount of aroma emitted by the ith butterfly, c is the sensory modality, I is the strength of the stimulus, and a is the power exponent. According to Stevens' power law (Stevens 1975), the role of the sensory modality (c) is to distinguish smell from the other senses. In BOA, because the butterflies move toward the butterfly with the best fitness value, the value of f should increase faster than the value of I.
As a result, f should be allowed to vary to a degree that the power exponent a can achieve. In BOA, a is increased linearly from 0.1 to 0.3 over the course of the iterations, and c is taken as 0.01.
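The fragrance formulation in Eq. (1) and the linear schedule for the power exponent can be sketched as follows (a minimal sketch; the helper names `fragrance` and `power_exponent` are illustrative, not part of the original algorithm description):

```python
def fragrance(I, c=0.01, a=0.1):
    """Fragrance of a butterfly per Eq. (1): f = c * I**a,
    where c is the sensory modality and a the power exponent."""
    return c * I ** a

def power_exponent(t, t_max, a_min=0.1, a_max=0.3):
    """Power exponent 'a' increased linearly from a_min to a_max
    over the course of t_max iterations."""
    return a_min + (a_max - a_min) * t / t_max
```

For example, a stimulus of intensity I = 16 with a = 0.5 yields a fragrance of 0.01 × 4 = 0.04; as a grows over the iterations, the same stimulus produces a stronger perceived fragrance.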
The BOA's working process consists of three phases: initialization, iteration, and finalization. After the values of the common parameters and the algorithm-specific parameters are set, the algorithm generates an initial population at random using a uniform distribution. It then moves on to the iteration phase, where the candidates in the population use two search phases, namely the global phase and the local phase. BOA considers a switch probability (p) to carry out its search process, which switches the strategy of the algorithm between global search and local search; in BOA, p is taken as 0.8. The global and local phases of BOA are represented mathematically by Eqs. (2) and (3):

x_i^(t+1) = x_i^t + (r^2 × g* − x_i^t) × f_i,    (2)

where x_i^t is the solution vector of the ith butterfly in iteration t, g* is the current best solution found among all the solutions in the current iteration, f_i is the fragrance of the ith butterfly, and r is a random number in [0, 1];

x_i^(t+1) = x_i^t + (r^2 × x_j^t − x_k^t) × f_i,    (3)

where x_j^t and x_k^t are the positions of the jth and kth butterflies at the tth iteration in the solution space, and r is a random number in [0, 1].
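One iteration of the position update described above can be sketched as follows, for one-dimensional positions for clarity (a sketch under stated assumptions: the function name `boa_step` is illustrative, and the fragrance values are passed in precomputed):

```python
import random

def boa_step(pos, g_star, frag, p=0.8):
    """One BOA position update over the whole population.

    pos    : current positions (one float per butterfly, 1-D for clarity)
    g_star : current best solution g*
    frag   : fragrance value f_i of each butterfly (Eq. 1)
    p      : switch probability between global and local search
    """
    n = len(pos)
    new_pos = []
    for i in range(n):
        r = random.random()
        if random.random() < p:
            # global phase (Eq. 2): move toward the best butterfly g*
            step = (r * r * g_star - pos[i]) * frag[i]
        else:
            # local phase (Eq. 3): random walk using two butterflies j, k
            j, k = random.randrange(n), random.randrange(n)
            step = (r * r * pos[j] - pos[k]) * frag[i]
        new_pos.append(pos[i] + step)
    return new_pos
```

Because the fragrance values are small (c = 0.01), each step is a small perturbation, which is consistent with the gradual convergence behavior discussed later in the paper.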

Preliminaries
In this section, we introduce the prerequisites required for the later discussion.
Definition 1 (Stochastic process) (Ross 2008): A stochastic process is a collection of random variables X = {X(t) : t ∈ T}, i.e., for each t in the index set T, X(t) is a random variable. Generally, t is interpreted as time and X(t) as the state of the process at time t.
Definition 2 (Discrete-time and continuous-time stochastic processes) (Ross 2008): X = {X(t) : t ∈ T} is a discrete-time stochastic process if the index set T is countable; if T is a continuum, it is a continuous-time stochastic process.

Definition 3 (Markov process and Markov chain) (Ross 2008): Markov processes are the simplest generalization of independent processes, obtained by permitting the outcome at any instant to depend only on the outcome that immediately precedes it and on none before that. Thus, in a Markov process X(t), the past has no influence on the future once the present is specified. If X(t) = i, we say the system is in state i at time t. We further suppose that whenever the system is in state i, there is a fixed probability p_ij that it will next be in state j, i.e.,

P{X(t+1) = j | X(t) = i, X(t−1) = i_{t−1}, …, X(0) = i_0} = P{X(t+1) = j | X(t) = i} = p_ij

for all states i_0, i_1, …, i_{t−1}, i, j and all t ≥ 0. The value p_ij represents the probability that the process, when in state i, will next make a transition into state j. Such a stochastic process is called a Markov chain. A Markov chain is thus a special kind of Markov process, which can be a discrete-time or continuous-time stochastic process depending on the index set T.
The outcomes of a Markov chain are called its states, and the set of possible values of X is termed the state space of the Markov chain or, in the general case, the state space of the stochastic system.
Definition 4 (Finite Markov chain) (Ross 2008): If the state space is finite, we call it a finite Markov chain.
Definition 5 (Homogeneous Markov chain) (Ross 2008): If the transition probability p_ij (the probability that the process, when in state i, next makes a transition into state j) is independent of the time t, the chain is called a homogeneous Markov chain.
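Definitions 1-5 can be illustrated with a minimal simulation of a finite homogeneous Markov chain: the state space {0, 1, 2} is finite, and homogeneity is reflected in the transition matrix P being fixed, i.e., independent of the time step (the matrix values below are an arbitrary example, not taken from the paper):

```python
import random

# A finite homogeneous Markov chain: finite state space {0, 1, 2} and a
# transition matrix P that does not depend on the time step t.
P = [[0.5, 0.3, 0.2],
     [0.1, 0.6, 0.3],
     [0.2, 0.2, 0.6]]

def next_state(i, rng):
    """Sample the next state j with probability P[i][j] (time-independent,
    and depending only on the current state i, per the Markov property)."""
    u, acc = rng.random(), 0.0
    for j, pij in enumerate(P[i]):
        acc += pij
        if u < acc:
            return j
    return len(P[i]) - 1

rng = random.Random(42)
chain = [0]
for _ in range(1000):
    chain.append(next_state(chain[-1], rng))
```

The population sequence of the BOA will be shown below to be exactly such an object: a chain on a finite state space whose one-step transition probabilities do not depend on the iteration counter.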

Markov chain model of the butterfly optimization algorithm
The butterfly optimization algorithm is, by nature, a swarm-based stochastic optimization algorithm; the motion of the butterfly swarm can therefore be described by a Markov chain.
For the sake of simplicity, we restrict ourselves to a one-dimensional analysis. The simplified position update formulas are as follows:

I) The position update formula for the butterflies in the global phase is

x_i^(t+1) = x_i^t + (r^2 × g* − x_i^t) × f_i,    (4)

where x_i^t is the solution of the ith butterfly in iteration t, g* is the current best solution found among all the solutions in the current iteration, f_i is the fragrance of the ith butterfly, and r is a random number in [0, 1].

II) The position update formula for the butterflies in the local phase is

x_i^(t+1) = x_i^t + (r^2 × x_j^t − x_k^t) × f_i,    (5)

where x_j^t and x_k^t are the positions of the jth and kth butterflies at the tth iteration in the solution space, and r is a random number in [0, 1].
Definition 6 (Butterfly state and population state): The position x_i of a single butterfly constitutes a butterfly state, and the positions of all the butterflies together constitute the population state s = (x_1, x_2, …, x_N), where N is the number of individuals in the population.

Definition 7 (Population state space): The state space of the population is represented by the set

S = { s = (x_1, x_2, …, x_N) | x_i in the feasible region, 1 ≤ i ≤ N },

i.e., S consists of all the states of the butterflies that fall in the feasible region.
Definition 8 (State equivalence): Let us define a function X(s, x) = Σ_{i=1}^{N} χ_s(x_i), where s ∈ S, x ∈ S, χ_s is the characteristic function of the event s, and X(s, x) counts the individuals of the population state s contained in the state x. Two population states s_1 and s_2 are termed equivalent if X(s_1, x) = X(s_2, x), which we denote as s_1 ~ s_2.
1) Equivalence class: The induced equivalence relation ~ on S gives rise to equivalence classes, and from the fundamental theorem of equivalence relations we derive the following properties:
• Let L be an equivalence class; then any two population states contained in L are equivalent to each other.
• For any two distinct equivalence classes L_1 and L_2, no population state of L_1 is equivalent to a population state of L_2, and vice versa.
• All the equivalence classes are disjoint, i.e., L_1 ∩ L_2 = ∅.
Definition 9 (State transition): In the butterfly optimization algorithm, for x_i ∈ S and x_j ∈ S, the one-step state transition x_i → x_j is denoted by Q_s(x_i) = x_j. Now, the butterfly population is a set of points in hyperspace, so the butterfly location update process is a transition between sets of points in hyperspace. According to the definition of state transition and the geometric nature of butterfly optimization, the one-step transition probability P(Q_s(x_i) = x_j) of x_i → x_j in the global phase is obtained from Eq. (4), and the one-step transition probability of x_i → x_j in the local phase is obtained from Eq. (5).

Definition 10 (State transition probability of the butterfly optimization algorithm): The one-step transition probability from state s_i to state s_j is given by

P(T_S(s_i) = s_j) = ∏_{n=1}^{N} P(Q_s(x_n) = x'_n),

where T_S(s_i) = s_j represents the one-step transition s_i → s_j and N is the total number of individuals in the population.
Theorem 1 The population state sequence {s(t), t ≥ 0} is a finite homogeneous Markov chain.
Proof According to the defined state transition probability of BOA, for the population state sequence {s(t), t ≥ 0} the transition probability P(T_S(s(t)) = s(t+1)), where s(t), s(t+1) ∈ S, depends on the transition probabilities P(T_S(x(t)) = x(t+1)) of all the butterflies in the population. As established in the definition of the state transition, P(T_S(x(t)) = x(t+1)) for each individual is related only to the state x(t) at time t and not to the time t itself. It follows easily that the population sequence satisfies the definition of a Markov chain. Since all search spaces of the BOA are assumed finite, the state space S is finite; hence {s(t), t ≥ 0} is a finite Markov chain. It remains to prove the homogeneity of the chain, which follows directly from the same fact: P(T_S(x(t)) = x(t+1)) is related only to the state x(t) at time t and not to the time t. Thus, the population sequence {s(t), t ≥ 0} generated by the BOA is a finite homogeneous Markov chain.

Convergence analysis
In this section, we introduce the criteria for convergence for BOA and try to guarantee global convergence in BOA.

Criteria for convergence
The convergence behavior of the butterfly optimization algorithm, as a random search algorithm, is studied with the help of the convergence rule for random search algorithms suggested by Solis and Wets (1981). For the optimization problem <s, f>, where s is the feasible solution space and f is the fitness function, a stochastic optimization algorithm D produces at the kth iteration a solution x_k, and the next iterate is x_{k+1} = D(x_k, ξ), where ξ is the solution searched by the algorithm in that iteration. For a random search algorithm, convergence simply means finding a monotonic sequence that converges to the infimum of f on s (Solis and Wets 1981). When inf(f) is attained only at a point of singularity, there is no hope of finding the minimum of f (Solis and Wets 1981). This leads to the introduction of the essential infimum ψ, defined as

ψ = inf { t : v({x ∈ s | f(x) < t}) > 0 },

where v denotes the measure on s. The optimal region is then defined as

R_{ε,M} = { x ∈ s | f(x) < ψ + ε } if ψ is finite, and R_{ε,M} = { x ∈ s | f(x) < M } otherwise,

where ε > 0 and M < 0. If the random algorithm D finds a point located in R_{ε,M}, the algorithm is considered to have found an approximate, or acceptable, global optimum (Solis and Wets 1981).
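The optimal-region membership test used in this convergence rule can be sketched as a small helper (a sketch; the function name `in_optimal_region` is illustrative, with `psi`, `eps`, and `M` denoting the essential infimum ψ, the tolerance ε, and the threshold M of the definition above):

```python
def in_optimal_region(fx, psi, eps, M):
    """Membership of a point with fitness fx in R_{eps,M} (Solis and Wets 1981):
    when the essential infimum psi is finite, the region is {x : f(x) < psi + eps};
    when psi = -inf, it is {x : f(x) < M} for a large negative threshold M < 0."""
    if psi > float("-inf"):
        return fx < psi + eps
    return fx < M
```

In other words, convergence to the global optimum is declared up to a tolerance ε (or, for unbounded-below problems, down to an arbitrarily low level M), which is the sense in which Theorem 2 below guarantees convergence.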
Theorem 2 (Global convergence theorem) (Solis and Wets 1981): Suppose that f is a measurable function, s is a measurable subset of R^n, and the algorithm D satisfies the following conditions:

(A1) f(D(x, ξ)) ≤ f(x), and if ξ ∈ s, then f(D(x, ξ)) ≤ f(ξ);

(A2) for any subset A of s with v(A) > 0, the probability measures μ_k of the algorithm satisfy ∏_{k=0}^{∞} (1 − μ_k(A)) = 0, i.e., the probability of repeatedly missing A over infinitely many iterations is zero.

Let {x_k}_{k=0}^{∞} be a sequence generated by the algorithm. Then

lim_{k→∞} P[x_k ∈ R_{ε,M}] = 1,

where P[x_k ∈ R_{ε,M}] is the probability that at step k the point x_k generated by the algorithm lies in R_{ε,M}.

Global convergence of BOA
The global convergence theorem above establishes the convergence rule for a random search algorithm, which we now apply to the butterfly optimization algorithm. For the convergence rule to hold for BOA, we prove that both conditions (A1) and (A2) are satisfied.
Theorem 3 The BOA satisfies condition (A1).
Proof Condition (A1) requires f(D(x, ξ)) ≤ f(x) and, if ξ ∈ s, f(D(x, ξ)) ≤ f(ξ). In the butterfly optimization algorithm, the current best solution in each iteration is always saved, so the retained solution satisfies f(x_{k+1}) = min{ f(x_k), f(ξ) }, which never exceeds either f(x_k) or f(ξ). This simply means that the BOA satisfies condition (A1).
Definition 11 (Optimal state set): For the globally optimal solution g_b of an optimization (minimization) problem <s, f>, the optimal state set is defined as G = { x ∈ S | f(x) = f(g_b) }. We consider G a proper subset of S, since if G were equal to S then every solution in the feasible space would be optimal and optimization would have no significance.
Definition 12 (Closed set) (Lawler 2006): Let S be a general state space. A non-empty subset C of S is a closed set iff p_ij = 0 holds for all i ∈ C and all j ∉ C.
Definition 13 (Irreducible closed set) (Lawler 2006): If a closed set C does not include any proper closed subset, then C is called irreducible; otherwise, it is reducible.
Theorem 4 The optimal state set G is a closed set of the state space S in butterfly optimization algorithm.
Proof We prove that the optimal state set G is closed in S by showing that the population sequence generated by the algorithm satisfies Definition 12. In the butterfly optimization algorithm, the position update strategy uses a best-individual retention mechanism: if the objective value of the current best individual is fitter than the previous best fitness, the previous best individual is replaced. This retention mechanism ensures that as the iterations proceed we obtain values no worse than the former ones. Therefore, P(x_{k+1} ∉ G | x_k ∈ G) = 0, and thus the optimal state set G is closed by Definition 12.
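The best-individual retention mechanism invoked in this proof can be illustrated with a minimal sketch (the helper name `elitist_retention` is illustrative): the recorded best fitness is monotonically non-increasing, which is precisely why a state in the optimal set G can never transition out of it.

```python
import random

def elitist_retention(candidates):
    """Best-individual retention: at each step, keep the previous best
    unless the new candidate is strictly fitter, so the recorded best
    fitness never worsens."""
    best = float("inf")
    history = []
    for f in candidates:
        best = min(best, f)   # retention: best only improves or stays
        history.append(best)
    return history

rng = random.Random(1)
hist = elitist_retention([rng.uniform(0, 10) for _ in range(50)])
```

Once the recorded best attains the optimal value f(g_b), no subsequent candidate can displace it, mirroring P(x_{k+1} ∉ G | x_k ∈ G) = 0.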
Definition 14 (Optimal population state set): For the globally optimal solution g_b of an optimization (minimization) problem <s, f>, the optimal population state set is defined as H = { q = (s_1, s_2, …, s_n) | ∃ s_i ∈ G, 1 ≤ i ≤ n }.
Theorem 5 The optimal population state set H is closed on the state space S in butterfly optimization algorithm.
Proof From Definition 14, we have H = { q = (s_1, s_2, …, s_n) | ∃ s_i ∈ G, 1 ≤ i ≤ n }. If q_k = (s_1, s_2, …, s_n) ∈ H at the kth iteration, then ∃ s_i ∈ G. Since G is a closed set, by Theorem 4 there must exist, with probability 1, x_{k+1} ∈ G. By Definition 14, the population state q_{k+1} = (s_1, s_2, …, s_n) ∈ H at the (k+1)th iteration. Thus, P(q_{k+1} ∉ H | q_k ∈ H) = 0. Therefore, by Definition 12, we conclude that H is a closed set.
Theorem 6 There does not exist a non-empty closed set B in the BOA population state space S such that B ∩ H = ∅.
Proof Assume that there exists a closed set B such that B ∩ H = ∅, and take s_j ∈ B and s_i ∈ H. Expanding the multi-step transition probability p^l_{s_j, s_i} over l iterations, each one-step transition probability P(T_s(x_{r_{c+j}}) = x_{r_{c+j+1}}) in the expansion satisfies Eq. (5), namely P(T_s(x_{r_{c+j}}) = x_{r_{c+j+1}}) > 0, and so p^l_{s_j, s_i} > 0. Therefore B is not a non-empty closed set, a contradiction. Hence H is the one and only closed set in the state space S.
Lemma 1 (Zhang and Liang 2003): Assume that a Markov chain has a non-empty closed set C and that there does not exist another non-empty closed set D such that C ∩ D = ∅. Then lim_{t→∞} P(s(t) ∈ C) = 1.
Theorem 7 When the number of iterations approaches infinity, the population state sequence converges to the optimal state set H.

Proof From Theorems 5 and 6 and Lemma 1, the theorem holds.
Theorem 8 The population state set S is reducible.
Proof From Theorem 5, we know that the optimal population state set H is a closed set, and H is a proper subset of S. Therefore, by Definition 13, the population state set S is reducible.

Theorem 9 The butterfly optimization algorithm has guaranteed global convergence.
Proof From Theorem 3, the BOA satisfies condition (A1), as the current best solution in each iteration is always saved. From Theorem 7, the population state sequence converges toward the optimal state set after a sufficiently large number of iterations. Therefore, the probability of never finding the globally optimal solution is 0, which satisfies condition (A2). Hence, by the global convergence theorem, we conclude that the butterfly optimization algorithm has guaranteed global convergence.

Experimental analysis of global convergence in BOA
In this section, we perform an experimental analysis of convergence in BOA on a wide variety of benchmark problems with varying properties.

Benchmark problems
In order to interpret the global convergence of BOA experimentally, a set of ten benchmark test problems (F1-F10) is selected from the literature. The problems include unimodal and multimodal functions with the varying properties given in Table 1. Moreover, the functions are chosen such that the global optimal values of some of them are at zero, while those of others are nonzero. It may be mentioned here that the functions with nonzero optimal values are taken from the IEEE CEC 2017 functions. The details of the benchmark functions are given in Table 1. The algorithms are implemented in MATLAB R2010a, and the experiments are carried out on a machine with 8 GB RAM under the Windows 10 platform.

Analyzing global convergence in BOA
The major issue in metaheuristic algorithms is getting stuck in local optima, which happens mostly when the problem to be optimized has multiple local optima, as discussed before. The theoretical analysis performed using the global convergence theorem confirmed that BOA has guaranteed global convergence. We now perform an experimental analysis on the wide variety of benchmark problems with varying properties listed in Table 1.
The convergence graphs are summarized in Fig. 1. From the convergence graphs of functions F2, F5, and F6, we can conclude that they all converge rapidly toward the global optimal values, the global optimal value being at zero; the function that converges most rapidly is the Cigar function (F2). The function F4, however, encounters a narrow valley; once the search passes a particular iteration, F4 also proceeds in an exponential manner toward the point of minima, though its rate of convergence is still low compared to the other functions. On the other hand, the convergence curves of functions F7, F9, and F10 show that, although they converge quickly, the values reached are not near the global optima. In the remaining cases, the BOA does not converge as rapidly, although it does converge after a certain number of iterations, as can be noticed from the corresponding convergence curves.
The theoretical analysis confirmed global convergence, but made no statement about the rate of convergence. From the performed convergence analysis of BOA, we confirm that the algorithm guarantees convergence, but the rate of convergence is influenced by a number of factors, one being parameter tuning, as can easily be seen from the performed experiment. The effect of parameter tuning is shown in a later part of the manuscript. Hence, we conclude that the convergence analysis does not provide much information about the rate of convergence for a problem.

Comparison of convergence of BOA with some state-of-the-art algorithms
The convergence of BOA is compared with some state-of-the-art algorithms to assess its proficiency in this convergence study. The same set of functions as given in Table 1 is considered. The algorithms taken for the present analysis are DE, PSO, SCA, the whale optimization algorithm (WOA), and the moth-flame optimization algorithm (MFO). The convergence graphs are presented in Fig. 2. The figure suggests that on functions F1, F2, F3, F4, F5, and F7, BOA converges faster than the compared algorithms, while on functions F6, F7, F9, and F10, the performance of BOA is very poor. On F9, BOA performs the worst, and on F6 and F10, its performance is found to be second worst. Analyzing the convergence curves, it can be found that, in most cases where the global optimal value is zero, BOA converges to the global optimum faster than the other compared algorithms.

Performance evaluation of BOA on various set of parameters
In this section, the performance of BOA is examined under different parameter settings. BOA contains mainly two parameters, namely the sensory modality (c) and the power exponent (a). In the original BOA, these two parameters were set to 0.01 and 0.1-0.3, respectively. Till now, there is hardly any study in which the performance of BOA has been examined for different settings of these parameters. In this study, we have considered six different settings of the sensory modality (c) and power exponent (a), experimented over the functions given in Table 1, and displayed the results in Table 2. For this experimentation, a population size of 50 and 1000 iterations are taken as the termination criteria. To avoid stochastic uncertainty, each function is executed 30 times, and the mean and standard deviation (std) values are used for performance analysis. From the analysis, it is noticed that, in most cases, if the optimal value of the function is zero, then BOA obtains the global optimal result under any parameter setting. On the other hand, if the optimal value of the function is nonzero, then the performance of the algorithm varies across parameter settings. It is noticed that the settings c = 0.01, a = 0.1-0.3 and c = 0.05, a = 0.1-0.6 provide the better results on two occasions each, while c = 0.01, a = 0.1-0.6 provides the better result on one occasion.
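The experimental protocol above (30 independent runs per setting, reporting mean and std of the best value found) can be sketched as follows. The inner optimizer `run_boa` is a hypothetical placeholder standing in for the actual BOA implementation (here a plain random search, so the sketch is runnable); the six (c, a-range) pairs are illustrative values, and only the sweep-and-aggregate logic mirrors the text:

```python
import random
import statistics

def run_boa(objective, dim, c, a_range, iters=1000):
    """Hypothetical stand-in for one BOA run with parameters (c, a_range).
    Implemented as random search so the sweep below is self-contained."""
    return min(objective([random.uniform(-100.0, 100.0) for _ in range(dim)])
               for _ in range(iters))

# Six (c, a-range) settings of the kind compared in Table 2 (illustrative).
SETTINGS = [(0.01, (0.1, 0.3)), (0.01, (0.1, 0.6)), (0.05, (0.1, 0.3)),
            (0.05, (0.1, 0.6)), (0.10, (0.1, 0.3)), (0.10, (0.1, 0.6))]

def sweep(objective, dim, runs=30):
    """Mean and std of the best value over independent runs, per setting."""
    table = {}
    for c, a_range in SETTINGS:
        results = [run_boa(objective, dim, c, a_range) for _ in range(runs)]
        table[(c, a_range)] = (statistics.mean(results),
                               statistics.stdev(results))
    return table
```

The resulting dictionary corresponds to one row block of a results table such as Table 2, with one (mean, std) pair per parameter setting.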

Conclusion
Butterfly optimization algorithm is a recently devised algorithm whose philosophy is based on the food foraging and mate searching behavior of butterflies. A number of improvements of the traditional butterfly optimization algorithm are now present in the literature, while the major issue with the algorithm has been the gap between theoretical and practical analysis, as is the case with many other metaheuristic algorithms. The butterfly optimization algorithm was found to be a very efficient and robust optimization technique when tested on the particular set of benchmark problems chosen. A Markov chain model of the butterfly optimization algorithm has been established. The butterfly population state transition sequence is defined, and a detailed analysis of the Markov chain's properties is performed. The population sequence generated by the butterfly optimization algorithm is a finite homogeneous Markov chain, and the population state set is reducible. A convergence analysis using the Markov chain model of the algorithm was performed theoretically using the global convergence theorem for a random search algorithm proposed in Solis and Wets (1981). The butterfly optimization algorithm was found to ensure global convergence, which, in a theoretical sense, supports the efficiency of the algorithm. We have then used a set of benchmark functions with diverse properties to show that the convergence analysis of BOA does not provide much information about the rate of convergence for a problem. The proposed work reduces this gap to some extent, while many more such analyses could be performed to lay down the theoretical foundations of the algorithm.
Funding Not Applicable.
Data availability Data sharing not applicable to this article as no datasets were generated or analyzed during the current study.

Declarations
Conflicts of interest The authors declare that they have no conflict of interest.
Ethical approval This article does not contain any studies with human participants or animals performed by any of the authors.