Continuation Newton methods with deflation techniques and quasi-genetic evolution for global optimization problems

The global minimum point of an optimization problem is of interest in many engineering fields, and it is difficult to find, especially for a nonconvex large-scale optimization problem. In this article, we consider a new memetic algorithm for this problem. That is to say, we use the continuation Newton method with the deflation technique to find multiple stationary points of the objective function, and we use those found stationary points as the initial seeds of the evolutionary algorithm, rather than the random initial seeds used by known evolutionary algorithms. Meanwhile, in order to retain the usability of derivative-free methods and the fast convergence of gradient-based methods, we use the automatic differentiation technique to compute the gradient and replace the Hessian matrix with its finite difference approximation. According to our numerical experiments, this new algorithm works well for unconstrained optimization problems and finds their global minima efficiently, in comparison to other representative global optimization methods such as the multi-start methods (the built-in subroutine GlobalSearch.m of MATLAB R2021b, GLODS and VRBBO), the branch-and-bound method (Couenne, a state-of-the-art open-source solver for mixed integer nonlinear programming problems), and the derivative-free algorithms (CMA-ES and MCS).


Introduction
In this article, we are mainly concerned with the global minimum of the unconstrained optimization problem

min_{x ∈ ℜ^n} f(x), (1)

where f : ℜ^n → ℜ is a differentiable function except at finitely many points. When problem (1) is nonconvex and large-scale, it may have many local minimizers, and finding its global minimum is a challenge. For problem (1), there are many popular global optimization methods, such as the multi-start methods [11,20,26,43,73,85], the branch-and-bound methods [10,17,76,83], the genetic evolution algorithms [32,46,59,63] and their memetic algorithms [35,60,78,84].
Multi-start methods use local optimization methods such as the trust-region methods [15,25,90] or the line search methods [70,81] to solve problem (1) and find a local minimum point x*, which satisfies the following first-order optimality condition:

∇f(x*) = 0. (2)

Since the implementations of multi-start methods are simple and successful for many real-world problems such as the design of hybrid electric vehicles [28], molecular conformation optimization problems [4,8,45] and acoustic problems [66], many commercial global solvers are based on multiple starting points, such as GlobalSearch.m of MATLAB R2021b [61,85]. However, for a multi-start method, a random starting point can easily lie in the basin of attraction of a previously found local minimum point [43,73,85]. Hence, the algorithm might find the same local minimum even if the searches are initiated from points that are far apart. Other multi-start methods based on a deterministic procedure may have too many search branches (the number of search regions is 2^{n×l} at the l-th level iteration, where n is the dimension of the problem) [20,37,68]. Hence, for these deterministic multi-start methods, the computational complexity is very high when the dimension of the problem is large.
The evolutionary algorithms [32,65,78,79] are another class of popular methods due to their simplicity and attractive nature-inspired interpretations, and they are used by a broad community of engineers and practitioners to solve real-world problems such as finding optimal locations for active control actuators [34], the optimization of acoustic absorber configurations [88], and the design of musical instruments [14]. However, since evolutionary algorithms lack guaranteed convergence and their computational time is very heavy for large-scale problems, they have difficulty finding the global minimum accurately.
In order to reduce the computational complexity of the multi-start methods and retain the ability of the evolutionary algorithms to escape from local minima, we use the continuation Newton methods [5,53,56] with the revised deflation technique [13] to find multiple stationary points of the objective function and use them as the initial seeds of the quasi-genetic algorithm, rather than the random seeds used by known evolutionary algorithms [59,63,65]. Practical multi-start methods and evolutionary algorithms are derivative-free methods. Namely, the gradient of the objective function does not need to be provided by the user, and derivative-free methods [16] are regarded as black-box methods [80].
In order to retain the usability of derivative-free methods and the fast convergence of gradient-based methods, we use the automatic differentiation techniques [9,31,67,86,87] with the reverse mode to compute the gradient of the objective function, and we replace the Hessian matrix with its finite difference approximation. In general, the time that it takes to calculate the gradient of the objective function by automatic differentiation with the reverse mode is about a constant multiple of the cost of one function evaluation [9,67,87]. A rigorous analysis of all time costs associated with the reverse mode shows that, in general, this cost is always less than four times the cost of the function evaluation (pp. 83-86, [31]). Therefore, the computational complexity of the gradient obtained by automatic differentiation with the reverse mode is far less than that of the gradient obtained by the finite difference approximation or the analytical gradient, which is about n times the cost of the function evaluation.
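To make the reverse mode concrete, the toy sketch below (our own illustration, not code from the paper; the `Var` class and the test function f(x, y) = xy + x² are invented for this example) records the computational graph during the forward evaluation and then propagates adjoints backwards in a single sweep, which is why the whole gradient costs only a small constant multiple of one function evaluation:

```python
class Var:
    """Minimal reverse-mode automatic-differentiation node (toy sketch)."""
    def __init__(self, value, parents=()):
        self.value = value        # forward value
        self.parents = parents    # pairs (parent Var, local partial derivative)
        self.grad = 0.0           # adjoint, filled in by backward()

    def _wrap(self, other):
        return other if isinstance(other, Var) else Var(other)

    def __add__(self, other):
        other = self._wrap(other)
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        other = self._wrap(other)
        return Var(self.value * other.value,
                   ((self, other.value), (other, self.value)))

    def backward(self):
        # Topologically order the graph, then sweep it once in reverse:
        # a single pass propagates adjoints to every input, which is why
        # the whole gradient costs a small constant multiple of one
        # evaluation of f.
        order, seen = [], set()
        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent, _ in node.parents:
                    visit(parent)
                order.append(node)
        visit(self)
        self.grad = 1.0
        for node in reversed(order):
            for parent, local in node.parents:
                parent.grad += node.grad * local

x, y = Var(3.0), Var(4.0)
f = x * y + x * x          # f(x, y) = x*y + x^2
f.backward()               # now x.grad = y + 2x = 10, y.grad = x = 3
```

By contrast, a forward finite-difference gradient would need n extra evaluations of f, one per coordinate, which is the n-fold cost mentioned above.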
Recently, Luo et al. have intensively investigated continuation methods for linearly constrained optimization problems [52,57], systems of nonlinear equations [53,56], unconstrained optimization problems [55], the linear programming problem [54] and the linear complementarity problem [58]. However, those continuation methods are only designed to find a single solution of a system of nonlinear equations. In this paper, we must handle multiple solutions of nonlinear equations. Namely, in order to obtain the global minimum of the objective function, we need to find as many of its stationary points as possible. Therefore, we revise the deflation technique [13] and combine it with the continuation method [53,56] to find multiple stationary points of the objective function.
The rest of this article is organized as follows. In the next section, we convert the nonlinear system (2) into the continuous Newton flow. Then, we use a continuation Newton method [53,56] with the trust-region updating strategy [15,81] to follow the continuous Newton flow and find a stationary point of the objective function. In Section 3, we revise the deflation technique [13] to construct a new nonlinear function G_k(·) such that the zeros of G_k(x) are stationary points of the objective function but exclude the known stationary points x*_1, ..., x*_k. In Section 4, we give a quasi-genetic algorithm to find a suboptimal point x*_ag by using the L points x*_1, ..., x*_L with the smallest function values among all found stationary points x*_1, ..., x*_nsp as the evolutionary seeds. After we obtain the suboptimal point x*_ag of the quasi-genetic algorithm, we refine it by using it as the initial point of the continuation Newton method [53,56] to solve the system (2) of nonlinear equations and obtain a better approximation of the global minimum point. In Section 5, some promising numerical results of the proposed algorithm are reported, with comparison to other representative global optimization methods such as the multi-start methods (the subroutine GlobalSearch.m of MATLAB R2021b [61], GLODS: Global and Local Optimization using Direct Search [20], and VRBBO: Vienna Randomized Black Box Optimization [43]), the branch-and-bound method (Couenne, a state-of-the-art open-source solver [10,17]), and the derivative-free algorithms (the covariance matrix adaptation evolution strategy (CMA-ES) [32,33] and the multilevel coordinate search (MCS) [37,68]). Finally, some discussions and conclusions are given in Section 6. Throughout the paper, ‖·‖ denotes the Euclidean vector norm or its induced matrix norm.

Finding a stationary point with the continuation Newton method
For convenience, we restate the continuation Newton method with the trust-region updating strategy [53,56], which is used to find a stationary point of f(x), i.e. a root of the nonlinear equations (2). We consider the damped Newton method [41,42,71,81] for finding a root of the nonlinear equations (2) as follows:

x_{k+1} = x_k − α_k H(x_k)^{−1} g(x_k), (3)

where g(x) is the gradient of f(x) and H(x) is its Hessian matrix.
If we regard x_{k+1} = x(t_k + α_k), x_k = x(t_k) and let α_k → 0, we obtain the following continuous Newton flow [7,12,21,53,82]:

dx(t)/dt = −H(x)^{−1} g(x), x(0) = x_0. (4)

Actually, if we apply one iteration of the explicit Euler method [6,75] to the continuous Newton flow (4), we obtain the damped Newton method (3). Since the Hessian matrix H(x) may be singular, we reformulate the continuous Newton flow (4) in the following general form:

H(x) dx(t)/dt = −g(x), x(0) = x_0. (5)

The continuous Newton flow (5) is an old method and can be traced back to Davidenko's work [21] in 1953. After that, it was investigated by Branin [12], Deuflhard et al. [24], Tanabe [82] and Abbott [1] in the 1970s, and applied to nonlinear boundary value problems by Axelsson and Sysala [7] recently. The continuous and even growing interest in this method is partially due to its global convergence, as stated in the following Proposition 2.1.
Proposition 2.1 (Branin [12] and Tanabe [82]) Assume that x(t) is a solution of the continuous Newton flow (5). Then, the energy function E(x(t)) = ‖g(x(t))‖² converges to zero as t → ∞. That is to say, every limit point x* of x(t) is a solution of the nonlinear equations (2). Furthermore, every element g_i(x) (i = 1, 2, ..., n) of g(x) converges linearly to zero with the same rate, and x(t) cannot reach the solution x* of the nonlinear equations (2) on any finite interval when the initial point x_0 is not a solution of the nonlinear equations (2).
Proof. For the proof of the first part, i.e. the convergence lim_{t→∞} ‖g(x(t))‖ = 0, one can refer to references [12,82]. The proof of the second part can be found in references [53,56].
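To make the link between the damped Newton method (3) and the Newton flow (4) concrete, the following sketch takes explicit Euler steps of fixed length α_k along the flow, i.e. it runs iteration (3); the test function f(x) = (x_1² − 1)² + x_2² is a hypothetical choice of ours, not one from the paper:

```python
import numpy as np

# Gradient and Hessian of the hypothetical f(x) = (x1^2 - 1)^2 + x2^2.
def grad(x):
    return np.array([4.0 * x[0] * (x[0] ** 2 - 1.0), 2.0 * x[1]])

def hess(x):
    return np.array([[12.0 * x[0] ** 2 - 4.0, 0.0], [0.0, 2.0]])

def damped_newton(x0, alpha=0.5, tol=1e-10, maxit=200):
    # x_{k+1} = x_k - alpha * H(x_k)^{-1} g(x_k): one explicit Euler step
    # of length alpha along the continuous Newton flow (4) per iteration.
    x = np.asarray(x0, dtype=float)
    for _ in range(maxit):
        g = grad(x)
        if np.linalg.norm(g, np.inf) < tol:
            break
        x = x - alpha * np.linalg.solve(hess(x), g)
    return x

x_star = damped_newton([2.0, 1.0])   # contracts toward the stationary point (1, 0)
```

With a fixed α_k ≡ 0.5 the iterates contract toward the nearby stationary point (1, 0) at a linear rate, in line with the uniform linear decay stated in Proposition 2.1.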
Since the concept of stiffness of an ordinary differential equation (ODE) plays a key role in understanding numerical methods for integrating an ODE trajectory, we give its definition in Definition 2.1 for convenience.

Remark 2.1
The inverse H(x)^{−1} of the Hessian matrix H(x) can be regarded as a preconditioner of g(x) such that every element x_i(t) (i = 1, 2, ..., n) of the solution x(t) of the continuous Newton flow (4) has roughly the same convergence rate, and it mitigates the stiffness of the ODE (4). This property is very useful, since it allows us to adopt an explicit ODE method to follow the trajectory of the Newton flow.
Actually, if we consider g(x) = Ax, where A is an n×n nonsingular and symmetric matrix, from the ODE (5) we have

dx(t)/dt = −A^{−1}(Ax) = −x. (6)

From Definition 2.1, we know that the ODE (6) is a non-stiff ODE. By integrating the linear ODE (6), we obtain

x(t) = e^{−t} x(0). (7)

From equation (7), we know that every element x_i(t) (i = 1, 2, ..., n) of the solution x(t) converges to zero linearly with the same rate as t tends to infinity.
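This preconditioning effect can be checked numerically. In the sketch below (an illustration of ours, not from the paper), explicit Euler steps are applied to the flow H(x) dx/dt = −g(x) with g(x) = Ax for an ill-conditioned symmetric A, and both components of x(t) decay at the same rate e^{−t}:

```python
import numpy as np

A = np.array([[100.0, 1.0], [1.0, 0.02]])  # symmetric, nonsingular, ill-conditioned
x = np.array([2.0, -3.0])
x0 = x.copy()
dt = 1e-3
for _ in range(5000):                      # explicit Euler up to t = 5
    rhs = -np.linalg.solve(A, A @ x)       # A^{-1} preconditions g(x) = A x, giving -x
    x = x + dt * rhs
ratio = x / x0                             # both components shrink by the same factor
```

Without the preconditioner A^{−1}, the flow dx/dt = −Ax would decay at rates spread over the eigenvalues of A, i.e. it would be stiff for this A.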
When the Hessian matrix H(x) is singular or nearly singular, the ODE (5) is a system of differential-algebraic equations (DAEs), and its trajectory cannot be efficiently followed by general ODE methods [6,38,39] such as the backward differentiation formulas (the built-in subroutine ode15s.m of MATLAB R2021b [61,75]). Therefore, we need to construct a special method to handle this problem.
We expect the new method to have the global convergence of the homotopy continuation methods, and the fast convergence rate of the merit-function-based methods near the stationary point x*. In order to achieve these two aims, we construct a special continuation Newton method with the new step size α_k = Δt_k/(1 + Δt_k) [53-58], where the time step Δt_k is adaptively adjusted by the trust-region updating strategy [15,89,90], to follow the trajectory of the ODE (5).
Firstly, we apply the implicit Euler method [75] to the continuous Newton flow (5). Then, we obtain

H(x_{k+1})(x_{k+1} − x_k) = −Δt_k g(x_{k+1}). (8)

The scheme (8) is an implicit formula and requires solving a system of nonlinear equations at every iteration. To avoid this, we replace the Hessian matrix H(x_{k+1}) with H(x_k), and substitute g(x_{k+1}) with its linear approximation g(x_k) + H(x_k)(x_{k+1} − x_k) in equation (8), respectively. Thus, we obtain the following explicit continuation Newton method:

x_{k+1} = x_k + s_k, s_k = −(Δt_k/(1 + Δt_k)) H_k^{−1} g(x_k), (9)

where H_k equals H(x_k) or its approximation.
The continuation Newton method (9) looks similar to the damped Newton method (3) with α_k = Δt_k/(1 + Δt_k) in equation (9). However, from the point of view of ODE methods, they are different. The damped Newton method (3) is obtained by applying the explicit Euler method to the continuous Newton flow (5), and its time step α_k is restricted by numerical stability [38]. That is to say, for the linear test function g(x) = Ax, from equation (6) we know that its time step α_k is restricted by the stability region |1 − α_k| ≤ 1. Therefore, a large time step α_k cannot be adopted in the steady-state phase.
The continuation Newton method (9) is obtained by applying the implicit Euler method, together with the linear approximation above, to the continuous Newton flow (5), and its time step Δt_k is not restricted by numerical stability for the linear test equation (6). Therefore, a large time step can be adopted in the steady-state phase for the continuation Newton method (9), and it mimics the Newton method near the stationary point x*, so that it has a fast local convergence rate.
Most of all, the step size α_k = Δt_k/(Δt_k + 1) in equation (9) makes it convenient to adopt the trust-region updating strategy for adaptively adjusting the time step Δt_k, such that the continuation Newton method (9) accurately follows the trajectory of the continuous Newton flow in the transient-state phase and achieves a fast convergence rate near the stationary point x*.
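The effect of the step size α_k = Δt_k/(1 + Δt_k) can be seen in a one-dimensional sketch (the test residual g(x) = x³ − x, i.e. the stationary-point equation of the hypothetical objective f(x) = x⁴/4 − x²/2, is our choice): a tiny Δt_k makes the iteration creep along the Newton flow, while a very large Δt_k gives α_k ≈ 1 and recovers the plain Newton method with its fast local convergence:

```python
def cont_newton(x, dt, iters):
    # One-dimensional continuation Newton iteration (9) with a fixed time
    # step: g(x) = x^3 - x (roots -1, 0, 1) and H(x) = 3x^2 - 1.
    for _ in range(iters):
        alpha = dt / (1.0 + dt)          # step size alpha_k = dt/(1 + dt)
        x = x - alpha * (x ** 3 - x) / (3.0 * x ** 2 - 1.0)
    return x

x_flow = cont_newton(2.0, dt=1e-3, iters=1)    # tiny dt: creeps along the flow
x_newton = cont_newton(2.0, dt=1e8, iters=10)  # huge dt: essentially full Newton
```

Note that Δt_k = 10⁸ poses no stability problem here, whereas the explicit Euler scheme (3) would be confined to α_k ≤ 2 on the linear test equation (6).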
The main idea of the adaptive step control for Δt_k based on the trust-region updating strategies [22,36,47,49-51] can be described as follows. The time step Δt_{k+1} is enlarged when the linear model g(x_k) + H_k s_k approximates g(x_k + s_k) well, and Δt_{k+1} is reduced when g(x_k) + H_k s_k approximates g(x_k + s_k) badly. Thus, by using the relation (9), we enlarge or reduce the time step Δt_{k+1} at every iteration according to the following ratio:

ρ_k = (‖g(x_k)‖ − ‖g(x_k + s_k)‖) / (‖g(x_k)‖ − ‖g(x_k) + H_k s_k‖). (10)

When the gradient g(x) is very nonlinear, a decrease of the gradient is rarely found, and Δt_k may become too small and even zero before a minimizer is found. In that case, ρ_k defined by equation (10) goes to −∞. To overcome this problem, Δt_k must be bounded from below by a small positive tuning parameter Δt_min < 1. Thus, combining this with equation (10), we give the following particular adjustment strategy:

Δt_{k+1} = c_2 Δt_k, if |1 − ρ_k| ≤ η_1; Δt_{k+1} = Δt_k, if η_1 < |1 − ρ_k| ≤ η_2; Δt_{k+1} = max{c_1 Δt_k, Δt_min}, if |1 − ρ_k| > η_2, (11)

where the constants c_1, c_2, η_1, η_2 satisfy 0 < c_1 < 1 ≤ c_2 and 0 < η_1 < η_2 < 1, and 0 < Δt_min < 1 is a small threshold value. When ρ_k ≥ η_a, we accept the trial step and set

x_{k+1} = x_k + s_k, (12)

where s_k is solved from equation (9) and η_a is a small positive number. Otherwise, we discard the trial step and set

x_{k+1} = x_k. (13)

Remark 2.2 This time-step selection based on the trust-region updating strategy has some advantages over the traditional line search strategy [48]. If we use the line search strategy and the damped Newton method (3) to track the trajectory x(t) of the continuous Newton flow (5), then, in order to achieve a fast convergence rate in the steady-state phase, the time step α_k of the damped Newton method is tried from 1 and halved many times at every iteration. Since the linear model g(x_k) + H_k s_k^N may not approximate g(x_k + s_k^N) well in the transient-state phase, the time step α_k will be small. Consequently, the line search strategy consumes unnecessary trial steps in the transient-state phase.
However, the time-step selection method based on the trust-region updating strategy (10)-(11) can overcome this shortcoming.
For a system of nonlinear equations, the Hessian evaluation is expensive if we update the Hessian matrix H(x_k) at every iteration. In order to reduce the computational cost of the Hessian evaluation, similarly to the adaptive Jacobian updating technique [55,56], we set H_{k+1} = H_k when g(x_k) + H_k s_k approximates g(x_k + s_k) well. Otherwise, we update H_{k+1} = H(x_{k+1}). An effective updating strategy is given by

H_{k+1} = H_k, if |1 − ρ_k| ≤ η_1; H_{k+1} = H(x_{k+1}), otherwise, (14)

where ρ_k is defined by equation (10). Thus, when H_k performs well, i.e. |1 − ρ_k| ≤ η_1, according to the updating formula (14) we set H_{k+1} = H_k in equation (9).
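Putting formulas (9)-(14) together, a simplified sketch of the resulting loop might look as follows; η_a = 10⁻⁶, η_1 = 0.25 and the initial time step 10⁻² follow the defaults stated later in the text, while the values of η_2, c_1, c_2 and the lower bound on Δt are assumptions of this sketch rather than the paper's exact settings:

```python
import numpy as np

def cnmtr(grad, hess, x0, tol=1e-6, maxit=500):
    eta_a, eta1 = 1e-6, 0.25                      # defaults stated in the text
    eta2, c1, c2, dt_min = 0.75, 0.5, 2.0, 1e-8   # assumptions of this sketch
    dt = 1e-2                                     # default initial time step
    x = np.asarray(x0, dtype=float)
    g, H = grad(x), hess(x)
    for _ in range(maxit):
        if np.linalg.norm(g, np.inf) < tol:       # infinity-norm stopping test
            break
        s = -dt / (1.0 + dt) * np.linalg.solve(H, g)          # trial step (9)
        g_new = grad(x + s)
        pred = np.linalg.norm(g) - np.linalg.norm(g + H @ s)  # predicted reduction
        actual = np.linalg.norm(g) - np.linalg.norm(g_new)    # actual reduction
        rho = actual / pred                       # ratio (10); pred > 0 here
        if abs(1.0 - rho) <= eta1:
            dt = c2 * dt                # good linear model: enlarge the time step
        elif abs(1.0 - rho) > eta2:
            dt = max(c1 * dt, dt_min)   # poor model: shrink, but keep dt positive
        if rho >= eta_a:                # accept the trial step, cf. (12)
            if abs(1.0 - rho) > eta1:
                H = hess(x + s)         # refresh the Hessian only when the model
            x, g = x + s, g_new         # performed poorly, cf. (14)
        # otherwise keep x unchanged, cf. (13)
    return x

# Stationary points of the hypothetical f(x) = (x1^2 - 1)^2 + x2^2.
x_star = cnmtr(lambda x: np.array([4.0 * x[0] * (x[0] ** 2 - 1.0), 2.0 * x[1]]),
               lambda x: np.array([[12.0 * x[0] ** 2 - 4.0, 0.0], [0.0, 2.0]]),
               [2.0, 0.5])
```

Since s is computed with the same H_k that enters the predicted reduction, g + H s = (1 − α_k) g exactly, so the denominator of ρ_k equals α_k‖g‖ and is positive whenever g ≠ 0.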

Algorithm 1 The continuation Newton method with the trust-region updating strategy for finding a stationary point of the objective function (CNMTr).
For the analysis of the global convergence and the local superlinear convergence of Algorithm 1, please refer to references [53,56]. In order to understand Algorithm 1 better, we give some explanations of its operating mechanism and some key variables. In line 2, the parameters c_1, c_2, η_1, η_2, Δt_min are used to adaptively adjust the time step Δt_k, and they are defined by equation (11). The ratio ρ_k defined by (10) represents the ratio of the actual reduction ‖g(x_k)‖ − ‖g(x_k + s_k)‖ to the predicted reduction ‖g(x_k)‖ − ‖g(x_k) + H_k s_k‖. The parameters η_1, η_2 (0 < η_1 < η_2 < 1) define the ranges of the ratio ρ_k that indicate whether the linear model g(x_k) + H_k s_k approximates g(x_k + s_k) well or not. If |1 − ρ_k| ≤ η_1, it indicates that g(x_k) + H_k s_k approximates g(x_k + s_k) well; then the time step Δt_{k+1} is enlarged to c_2 Δt_k. If |1 − ρ_k| > η_2, it indicates that g(x_k) + H_k s_k approximates g(x_k + s_k) poorly; then the time step Δt_{k+1} is reduced to c_1 Δt_k. If η_1 < |1 − ρ_k| ≤ η_2, it indicates that g(x_k) + H_k s_k approximates g(x_k + s_k) neither well nor poorly; then the time step Δt_{k+1} is kept equal to Δt_k. The parameter 0 < Δt_min < 1 is a small number used to prevent the time step Δt_k from becoming too small or even zero when the gradient g(x) is very nonlinear and Δt_k is adaptively adjusted by the scheme (11). 0 < Δt_init ≤ 1 represents the initial time step, and its default value is Δt_init = 10^{−2}. The parameter maxit represents the maximum number of iterations.
In line 4, the Boolean variable success indicates whether the continuation Newton method (Algorithm 1) finds a stationary point of f(x) successfully or not. When Algorithm 1 finds a stationary point of f(x) successfully, we let success be true; otherwise, we let success be false. In line 5, the Boolean variable success_ap indicates whether the trial step s_k is accepted or not. When the trial step s_k is accepted, we let success_ap be true; otherwise, we let success_ap be false. In line 6, it evaluates the gradient at the initial point x_0. In line 7, it computes the infinity norm of the initial gradient. In line 9, we use the infinity norm Res_k = ‖g_k‖_∞ < ε as the loop termination condition, since it provides a more accurate assessment of the status of each element |g^i_k| < ε (i = 1, 2, ..., n) than the 2-norm condition ‖g_k‖ < ε. In lines 10-21, they give the procedure to solve the Newton step s_k^N.
In line 22, it computes the trial step s_k and the trial point x_{k+1}. In line 23, it evaluates the new gradient g(x_{k+1}). In line 24, it computes the ratio ρ_k according to equation (10). In line 25, it adaptively adjusts the time step Δt_{k+1} according to the trust-region updating strategy (11). In lines 26-32, they give the criterion of whether the trial step s_k is accepted or not, according to whether ρ_k ≥ η_a holds. In lines 35-39, they give the criterion for assigning the Boolean variable success. When the final residual Res_k = ‖g_k‖_∞ is less than the given tolerance, we regard Algorithm 1 as having found a stationary point of the objective function successfully and let success be true; otherwise, we let success be false. In lines 40-41, they output the found stationary point x* = x_k and the Boolean variable success.

The deflation technique
By using Algorithm 1 (CNMTr) in Section 2, we can find a stationary point x*_1 of f(x). Our strategy is to repeatedly use Algorithm 1 to find as many stationary points of f(x) as possible. Then, we use those found stationary points as the initial evolutionary seeds of the quasi-genetic algorithm in Section 4. If we directly use Algorithm 1 with the multi-start method, a starting point can easily lie in the basin of attraction of a previously found local minimum, and Algorithm 1 may find the same local minimum even if the searches are initiated from points that are far apart [20,43,73,85]. In order to overcome this disadvantage, we revise the deflation technique [13] and combine it with Algorithm 1 to find as many stationary points of f(x) as possible.
Assume that x*_k is a zero point of the function G_{k−1}(x). Then, we construct another function G_k(·) by eliminating the zero point x*_k of G_{k−1}(x) as follows:

G_k(x) = G_{k−1}(x) / ‖x − x*_k‖, (15)

where G_0(x) = g(x) = ∇f(x). Thus, from equation (15), we obtain

G_k(x) = g(x) / ∏_{i=1}^{k} ‖x − x*_i‖, (16)

where x*_i (i = 1, ..., k) are the known zero points of g(x). From equation (16), we know that a zero point x*_{k+1} of G_k(x) is also a zero point of g(x). Namely, the zero set of G_k(x) is a subset of the zero set of g(x).
When g(x*_i) = 0 and the Hessian matrix H(x*_i) is nonsingular, Brown and Gearhart [13] have proved that G_k(x) does not converge to zero as x approaches x*_i, where G_k(·) is defined by equation (16). Actually, we can obtain the following estimate of the lower bound of ‖G_k(x)‖ when x belongs to a neighborhood of x*_i.
Lemma 3.1 Assume that the gradient g(x) of the objective function is continuously differentiable and x*_i is a zero point of g(·). Furthermore, assume that the Hessian matrix H(x*_i) satisfies

‖H(x*_i) d‖ ≥ c_l ‖d‖ for all d ∈ ℜ^n, (17)

where c_l is a positive constant. Then, there exists a neighborhood of x*_i in which ‖G_k(x)‖ is bounded away from zero.

Proof. Since x*_i is a zero point of g(x), according to the first-order Taylor expansion, we have

g(x) = H(x*_i)(x − x*_i) + o(‖x − x*_i‖).

According to the nonsingularity assumption (17) of H(x*_i) and the continuity of H(·), there exists a neighborhood of x*_i such that

‖g(x)‖ ≥ (c_l/2) ‖x − x*_i‖ (21)

holds in this neighborhood. Furthermore, from the triangle inequality ‖x − y‖ ≥ ‖x‖ − ‖y‖ and the assumption (17), the remaining deflation factors ‖x − x*_j‖ (j ≠ i, j = 1, ..., k) are bounded above on this neighborhood, say by a constant c_u. (22) Thus, from (21)-(22), we obtain

‖g(x)‖ / ∏_{j≠i} ‖x − x*_j‖ ≥ (c_l / (2 c_u^{k−1})) ‖x − x*_i‖. (23)

By substituting inequality (23) into equation (16), we obtain ‖G_k(x)‖ ≥ c_l/(2 c_u^{k−1}) > 0 for x ≠ x*_i in this neighborhood. That is to say, x*_i (i = 1, 2, ..., k) is not a zero point of G_k(x).

In practice, we need to avoid overflow or underflow when we compute G_k(x), and we expect the continuation Newton method (Algorithm 1) to find as many stationary points of f(x) as possible. According to our numerical experiments, Algorithm 1 can find more stationary points with ‖x − x*_i‖_1 (i = 1, ..., k) as the deflation items than with ‖x − x*_i‖ (i = 1, ..., k) in equation (16). We also notice that the product of the deflation denominators in (16) can grow rapidly with the dimension n, so that G_k(x) is driven toward zero far away from any true root. Thus, Algorithm 1 is apt to find a spurious root of G_k when the dimension n is large. Therefore, in order to improve the effect of the deflation technique, we modify G_k(x) defined by equation (15) as

G_k(x) = g(x) ∏_{i=1}^{k} α_i / ‖x − x*_i‖_1, (24)

where x*_i (i = 1, 2, ..., k) are the known roots of g(x) and α_i (i = 1, ..., k) are scaling factors defined by equation (25). Since G_k(x) has this special structure and its sub-differential exists except at the finitely many points x*_1, ..., x*_k, from equations (24)-(25) we compute its sub-differential ∂G_k(x) as follows:

∂G_k(x) = (∏_{i=1}^{k} α_i / ‖x − x*_i‖_1) (H(x) − g(x) p_k(x)^T), (26)

where p_k(x) is defined by

p_k(x) = ∑_{i=1}^{k} sgn(x − x*_i) / ‖x − x*_i‖_1. (27)

Here, the sign function sgn(·) is applied elementwise and defined by

sgn(t) = 1 if t > 0, 0 if t = 0, −1 if t < 0. (28)

Now, by using the revised deflation technique (24) for g(x) and the continuation Newton method (Algorithm 1), we can find a new stationary point of f(x) other than the known stationary points. We describe the detailed implementation in Algorithm 2.
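A minimal one-dimensional sketch of the deflation idea (our illustration: it divides by the deflation items directly, omits the α_i scaling factors, and uses a finite-difference derivative in place of the sub-differential (26)):

```python
import numpy as np

def deflate(g, roots):
    # Deflated residual G_k(x) = g(x) / prod_i ||x - x_i*||_1, a simplified
    # version of (15)/(24) without the alpha_i scaling factors.
    def G(x):
        val = g(x)
        for r in roots:
            val = val / np.sum(np.abs(x - r))
        return val
    return G

def newton_fd(F, x0, tol=1e-10, maxit=100, h=1e-7):
    # Scalar Newton iteration with a finite-difference derivative.
    x = float(x0)
    for _ in range(maxit):
        fx = F(np.array([x]))[0]
        if abs(fx) < tol:
            break
        dfx = (F(np.array([x + h]))[0] - fx) / h
        x = x - fx / dfx
    return x

g = lambda x: np.array([x[0] ** 2 - 1.0])      # residual with roots +1 and -1
r1 = newton_fd(g, 2.0)                         # plain Newton from x0 = 2 finds +1
G1 = deflate(g, [np.array([r1])])              # deflate the known root +1
r2 = newton_fd(G1, 2.0)                        # the same start now yields -1
```

After the root x* = 1 is deflated, Newton restarted from the very same initial point x_0 = 2 is pushed to the other root −1 instead of reconverging to the one already found.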
In order to understand Algorithm 2 better, we give some explanations of its operating mechanism and its key variables. K is the number of the known stationary points of f(x). In lines 1-2, they give the default parameters used by the continuation Newton method to find a new stationary point of f(x), and these parameters are explained in Algorithm 1. In line 3, the Boolean variable success_dt indicates whether the continuation Newton method finds a new stationary point successfully or not. When the continuation Newton method finds a new stationary point successfully, we let success_dt be true; otherwise, we let success_dt be false.
In lines 4-30, they give the pseudo-code of the continuation Newton method (CNMTr) for finding a new stationary point of f(x). They are equivalent to Algorithm 1 if we replace g(x) and H(x) with G_K(x) and ∂G_K(x) in Algorithm 1, respectively. For their explanations, please refer to the related explanations of Algorithm 1. In lines 31-38, they output different values according to whether a new stationary point is found successfully or not. When Res_k ≤ ε, it indicates that the continuation Newton method finds a new stationary point successfully, and we let the Boolean variable success_dt be true. Then, this newly found stationary point is saved as x*_{K+1}, and they output the K+1 found stationary points x*_1, ..., x*_{K+1} and the Boolean variable success_dt. Otherwise, we let the Boolean variable success_dt be false, and they output the K known stationary points x*_1, ..., x*_K and the Boolean variable success_dt.

Algorithm 2 Continuation Newton methods with the deflation technique for finding a new stationary point of the objective function (CNMDT). Input: the K known stationary points x*_1, ..., x*_K, an initial point x_0 (optional), the tolerance ε (optional), and the Hessian matrix H(x) of f(x) (optional). Output: K+1 found stationary points x*_1, ..., x*_{K+1}, including the newly found stationary point x*_{K+1} of f(x), and the Boolean variable success_dt. 1: Set the default x_0(i) = 1 (i = 1, 2, ..., n) and ε = 10^{−6} when x_0 or ε is not provided. 2: Initialize the parameters: η_a = 10^{−6}, η_1 = 0.25, ...
Remark 3.1 From equation (24), we know that G_k(x_l) may tend to zero while ‖∇f(x_l)‖ converges to a positive number when lim_{l→∞} ‖x_l − x*_i‖_1 = ∞, where x*_i is a stationary point of f(x). Another reason is that the Newton method may fail to converge to a certain stationary point of f(x) when the initial point x_0 belongs to a special region [1]. Thus, the continuation Newton method with the deflation technique may not find all stationary points of f(x) from the same initial point x_0.
In order to find as many stationary points of f(x) as possible, we use the continuation Newton method with the deflation technique (i.e. Algorithm 2) to find stationary points from several far-apart points sequentially. In practice, we define the n/2 × 1 vector e = (1, ..., 1)^T and select six points x^init_i (i = 1, ..., 6), given by equation (29), as the default initial points, where x^init_j(i) represents the i-th element of the n-dimensional vector x^init_j. Thus, we repeatedly use Algorithm 2 from these multiple starting points to find as many stationary points of f(x) as possible. The detailed implementation is described in Algorithm 3.
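In outline, the multi-start deflation loop can be sketched as follows; the scalar test residual and the repeated starting points are illustrative choices of ours, not the paper's six default points from equation (29):

```python
def newton_fd(F, x0, tol=1e-10, maxit=100, h=1e-7):
    # Scalar Newton with a finite-difference derivative; gives up when the
    # deflated residual becomes flat (a spurious region, cf. Remark 3.1).
    x = float(x0)
    for _ in range(maxit):
        fx = F(x)
        if abs(fx) < tol:
            return x
        d = (F(x + h) - fx) / h
        if abs(d) < 1e-12:
            return x               # flat residual: this start fails
        x -= fx / d
    return x

def multistart_deflation(g, starts):
    roots = []
    for x0 in starts:
        def G(x):                  # deflate every previously found root
            val = g(x)
            for r in roots:
                val /= abs(x - r)
            return val
        x = newton_fd(G, x0)
        # keep x only if it is a genuine, previously unseen root of g
        if abs(g(x)) < 1e-8 and all(abs(x - r) > 1e-6 for r in roots):
            roots.append(x)
    return sorted(roots)

# g(x) = x^3 - x vanishes at the three stationary points -1, 0, 1 of the
# hypothetical objective f(x) = x^4/4 - x^2/2.
roots = multistart_deflation(lambda x: x ** 3 - x, [2.0, 2.0, 2.0])
```

Note that all three stationary points are recovered even though every search starts from the same point x_0 = 2, which plain multi-start Newton could never achieve.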
In order to understand Algorithm 3 better, we give some explanations of its operating mechanism and its key variables. In line 2, x^init_1, ..., x^init_L are the L default initial points defined by equation (29), where L = 6. In line 3, the Boolean variable success indicates whether a stationary point of f(x) is found by CNMTr (Algorithm 1) successfully or not. When Algorithm 1 (CNMTr) finds a stationary point successfully, we let success be true; otherwise, we let success be false. In line 4, nsp represents the number of found stationary points, and its default value equals 0.
In lines 5-15, they repeatedly call Algorithm 1 (CNMTr) to find a stationary point of f (x) from L default initial points x init 1 , . . . , x init L . Once Algorithm 1 finds a stationary point, we let the Boolean variable success be true, save this found stationary point as x * 1 , set nsp = 1, and exit the loop. Otherwise, we let the Boolean variable success be false.
In lines 16-30, if Algorithm 1 finds a stationary point of f(x) successfully, they repeatedly call Algorithm 2 (CNMDT) to find new stationary points of f(x) from the L default initial points x^init_1, ..., x^init_L, as many as possible. In line 17, the variable i represents the index of the current initial point. In line 18, it means that we execute lines 19-28 if the index variable i is less than the number of given initial points (i.e. L). In line 19, it selects the i-th initial point x^init_i and calls Algorithm 2 (CNMDT) to find a new stationary point of f(x). In lines 20-28, they execute different procedures according to whether a new stationary point is found successfully by Algorithm 2 (CNMDT) or not. If a new stationary point of f(x) is found by Algorithm 2 (CNMDT) successfully, we save this newly found stationary point and let nsp (the number of found stationary points) increase by 1. Otherwise, we let the index variable i increase by 1 and select the next initial point. In lines 22-23, they mean that we need to select the next initial point if the newly found stationary point almost equals the i-th initial point x^init_i. In lines 31-45, if nsp (the number of found stationary points) is greater than 0, we let the Boolean variable success be true and output all found stationary points x*_1, ..., x*_nsp, the optimal approximation point x*_ap = arg min{f(x*_1), ..., f(x*_nsp)}, and the Boolean variable success. Otherwise, we let the Boolean variable success be false and output the optimal approximation point x*_ap = x_0 and the Boolean variable success.

Algorithm 3 Continuation Newton methods with the deflation technique and multi-start points for finding multiple stationary points (CNMDTM). Output: all found stationary points x*_1, ..., x*_nsp of f(x), and its global optimal approximation point x*_ap. 1: Set the default tolerance ε = 10^{−6}. 2: Set the default initial points x^init_i (i = 1, ..., L) by equation (29), where L = 6. 3: Let the Boolean variable success be false. 4: Set nsp = 0 (nsp represents the number of found stationary points of f(x)). 5: for i = 1, ..., L do 6: Select the i-th initial point as x_0 = x^init_i and call Algorithm 1 to find a stationary point of f(x). 7: if a stationary point of f(x) is found by Algorithm 1 successfully then 8: Let the Boolean variable success be true. 9: Save this found stationary point of f(x) as x*_1. 10: Set nsp = 1. Set f*_ap = f(x*_1) and x*_ap = x*_1. ... 40: Output all found stationary points x*_1, ..., x*_nsp of f(x), its global optimal approximation point x*_ap, its optimal approximation value f*_ap, and the Boolean variable success. 41: else ... 43: Let the Boolean variable success be false. ... 44: Output x*_ap, f*_ap and the Boolean variable success. 45: end if

The quasi-genetic evolution
By using the continuation Newton method with the deflation technique and multi-start points (Algorithm 3) in Section 3, we find most of the stationary points x*_1, ..., x*_nsp and obtain the suboptimal value f(x*_ap). Algorithm 3 is essentially a local search method, and it cannot ensure that f(x*_ap) is the global minimum of f(x) unless all stationary points of f(x) have been found by Algorithm 3. Therefore, in order to find the global minimum if at all possible, we use the idea of memetic algorithms [65,78] to approach the global minimum further. Namely, we exploit the ability of evolutionary algorithms to escape from local minima, and we revise the heuristic crossover of the genetic algorithm such that it evolves from the found stationary points x*_1, ..., x*_nsp of f(x). After it evolves for several generations, such as twenty generations, we obtain an evolutionary suboptimal solution x*_ag of f(x). Then, we use it as the initial point of the continuation Newton method (i.e. CNMTr, Algorithm 1) to refine it and obtain a better approximate solution x*_min of the global minimum point. In order to ensure the effectiveness of the evolutionary algorithm, the population size (we denote it as L) cannot be too small; in general, L is greater than 20. Since the number of found stationary points may be less than L, we supplement additional points to obtain L candidate seeds of the evolutionary algorithm.
In practice, we select the first L minimum points of the objective function from the set S_0^seed defined by equation (32), denote these L points as x_i^0 (i = 1, . . . , L), and adopt them as the initial seed set S_0 of the quasi-genetic algorithm.
Secondly, we select any two elements x_i^0 and x_j^0 from the seed set S_0 and use the convex crossover to generate the next offspring. Similarly, we select the first L minimum points of f(x) from the union of S_0 and the generated offspring, denote these L points as x_i^1 (i = 1, . . . , L), set S_1 = {x_1^1, . . . , x_L^1}, and regard S_1 as the dominant population.
Repeatedly, after the l-th evolution, we generate the offspring x_1^{l+1}, . . . , x_{L(L-1)/2}^{l+1} and select the first L minimum points of f(x) from the union of S_l and these offspring. We denote these L points as x_i^{l+1} (i = 1, . . . , L) and set S_{l+1} = {x_1^{l+1}, . . . , x_L^{l+1}}. After N generations of evolution, we stop and select the minimum point of f(x) from the set S_N, which we denote as x*_ag. Finally, in order to improve the convergence of the quasi-genetic evolution, we call Algorithm 1 (CNMTr) from x*_ag to find a stationary point of f(x), which we denote as x*_cn. Then, we select the minimum point x*_min of f(x) between x*_cn and x*_ag, i.e. x*_min = arg min{f(x*_cn), f(x*_ag)},
and output x*_min as the global optimal approximation point of f(x). According to the above discussion, we give a detailed description of the quasi-genetic algorithm in Algorithm 4.
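The selection-and-crossover loop described above can be sketched in Python as follows. The toy objective, the uniform crossover weight t, and the seed generation are our own assumptions (the paper's exact crossover formula is given in an equation not reproduced here, and its Algorithm 4 additionally refines the result with CNMTr):

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    # toy multimodal objective standing in for the paper's f(x) (assumption)
    return float(np.sum(x**2) + 3.0 * np.sum(np.cos(2.0 * x)))

def quasi_genetic(f, seeds, L=21, N=20):
    """Selection and convex crossover as described in the text: keep the
    L points with smallest f-value, cross every pair (i, j) by
    x_new = t*x_i + (1-t)*x_j, and select the best L survivors again."""
    pop = sorted(seeds, key=f)[:L]           # initial dominant population S_0
    for _ in range(N):
        offspring = []
        for i in range(len(pop)):
            for j in range(i + 1, len(pop)):
                t = rng.uniform()            # crossover weight (assumption)
                offspring.append(t * pop[i] + (1.0 - t) * pop[j])
        pop = sorted(pop + offspring, key=f)[:L]   # next population S_{l+1}
    return min(pop, key=f)                   # evolutionary suboptimal x*_ag

seeds = [rng.uniform(-4.0, 4.0, size=2) for _ in range(30)]
best = quasi_genetic(f, seeds)
print(best, f(best))
```

Because the survivors of each generation are chosen from the union of parents and offspring, the best objective value is monotonically non-increasing over the generations.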
In order to understand Algorithm 4 better, we describe how it works step by step and give some explanations of the key variables. nsp represents the number of the known stationary points of f(x). In line 2, L represents the population size and its default value is 21. The algorithm supplements L candidate seeds x_1^init, . . . , x_L^init, sets k = 1, and then loops over all pairs (i, j) with i = 1, . . . , L and j = i+1, . . . , L (lines 8-10) to generate the offspring.

The implementation of CNMGE
In this section, according to the discussions of the previous sections, we give the implementation description of the continuation Newton method with the deflation technique and the quasi-genetic evolution for global optimization problems. We denote this method as CNMGE. Its implementation diagram is illustrated in Figure 1. Firstly, CNMGE calls the subroutine CNMDTM (Algorithm 3) to find as many stationary points of f(x) as possible from multi-start points. Then, it calls QGE (Algorithm 4) and uses those found stationary points as the initial seeds of QGE to approach the global optimal approximation x*_ag of f(x). Finally, it calls CNMTr (Algorithm 1) to refine x*_ag and obtain the refined point x*_cn. It outputs x* = arg min{f(x*_ag), f(x*_cn)} as the global optimal approximation point of f(x). From Figure 1 and Algorithm 3, we know that the subroutine CNMDTM first calls the subroutine CNMTr (Algorithm 1) to find a stationary point. Then, it repeatedly calls the subroutine CNMDT (Algorithm 2) to find new stationary points of f(x) from multi-start points, and saves all found stationary points x*_1, . . . , x*_nsp as the input of the subroutine QGE. In order to retain the usability of derivative-free methods, we use the automatic differentiation technique [9,31,67,86,87] with the reverse mode to compute the gradient ∇f(x) of f(x) in Algorithm 1 (CNMTr) and Algorithm 2 (CNMDT). In general, the time it takes to calculate the gradient of the objective function f(x) by automatic differentiation with the reverse mode is a small constant multiple (less than four) of the cost of evaluating the function [9,31,67,87]. We implement it by calling the built-in subroutine prob2struct.m of MATLAB R2021b [61]. At the end of this section, we give an example to show how it works.
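To illustrate why reverse-mode differentiation costs only a small constant multiple of a function evaluation, here is a minimal reverse-mode sketch in Python. This is an illustration of the technique only, not MATLAB's prob2struct; the Var class and its operators are our own construction. One forward pass builds the expression graph and one backward sweep propagates adjoints:

```python
class Var:
    """Minimal reverse-mode AD node (an illustration of the technique,
    not MATLAB's prob2struct). 'inputs' holds (input_node, local_grad)
    pairs; the expression below forms a tree with shared leaves, so the
    simple stack-based backward sweep accumulates adjoints correctly."""
    def __init__(self, value, inputs=()):
        self.value = value
        self.inputs = list(inputs)
        self.grad = 0.0

    def _wrap(self, other):
        return other if isinstance(other, Var) else Var(float(other))

    def __add__(self, other):
        other = self._wrap(other)
        return Var(self.value + other.value, [(self, 1.0), (other, 1.0)])
    __radd__ = __add__

    def __mul__(self, other):
        other = self._wrap(other)
        return Var(self.value * other.value,
                   [(self, other.value), (other, self.value)])
    __rmul__ = __mul__

    def __sub__(self, other):
        return self + (-1.0) * self._wrap(other)

    def backward(self):
        # one reverse sweep, with cost proportional to the forward pass
        self.grad = 1.0
        stack = [self]
        while stack:
            node = stack.pop()
            for inp, local in node.inputs:
                inp.grad += local * node.grad
                stack.append(inp)

# Rosenbrock f(x) = 100*(x2 - x1^2)^2 + (x1 - 1)^2 at the point (2, 1)
x1, x2 = Var(2.0), Var(1.0)
y = 100.0 * (x2 - x1 * x1) * (x2 - x1 * x1) + (x1 - 1.0) * (x1 - 1.0)
y.backward()
print(y.value, x1.grad, x2.grad)   # 901.0  2402.0  -600.0
```

The single backward sweep visits each recorded operation once, which is the source of the "small constant multiple of the function cost" bound quoted above.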

CNMGE
In Algorithm 1 and Algorithm 2, finding a stationary point of f(x) requires the Hessian matrix H(x) of f(x). In practice, we replace the Hessian matrix H(x) with its finite difference approximation (40), i.e. the forward difference H(x) e_i ≈ (∇f(x + ε e_i) − ∇f(x)) / ε, where e_i represents the unit vector whose elements equal zero except for the i-th element, which equals 1, and ε = 2 × 10^(−8).
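A sketch of this finite difference approximation in Python, using the analytic Rosenbrock gradient as a stand-in for the automatically differentiated gradient (our own test setup, not the paper's code):

```python
import numpy as np

def rosen_grad(x):
    # analytic gradient of f(x) = 100*(x2 - x1^2)^2 + (x1 - 1)^2
    return np.array([-400.0 * x[0] * (x[1] - x[0]**2) + 2.0 * (x[0] - 1.0),
                     200.0 * (x[1] - x[0]**2)])

def fd_hessian(grad, x, eps=2e-8):
    """Column-wise forward-difference Hessian approximation:
    H(x) e_i ~= (grad(x + eps*e_i) - grad(x)) / eps."""
    n = x.size
    g0 = grad(x)
    H = np.empty((n, n))
    for i in range(n):
        e = np.zeros(n)
        e[i] = 1.0
        H[:, i] = (grad(x + eps * e) - g0) / eps
    # symmetrize, since the exact Hessian is symmetric
    return 0.5 * (H + H.T)

x = np.array([2.0, 1.0])
H = fd_hessian(rosen_grad, x)
print(H)
```

At (2, 1) the exact Rosenbrock Hessian is [[4402, -800], [-800, 200]], and the forward difference reproduces it to within the usual O(ε) truncation and roundoff error.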
It is worthwhile to discuss the selection of parameters in Algorithm 1 and Algorithm 2. In practice, we select the parameters η_1 = 1/4, η_2 = 3/4, c_1 = 1/2, c_2 = 2. According to our numerical experiments, these parameters work well. Since the continuation Newton flow defined by equation (5) varies dramatically in the transient phase, in order to follow its trajectory accurately, we choose a small initial time step Δt_init, such as Δt_init = 10^(−2). This selection strategy of the initial time step differs from that of line search methods, whose initial step length is tried from α_0 = 1 so that they mimic the Newton method and achieve a fast convergence rate near the stationary point [70,81].
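The role of the small initial time step can be illustrated with a simplified explicit Euler discretization of the continuation Newton flow dx/dt = -H(x)^{-1} ∇f(x), with Δt adjusted by a trust-region-style ratio test using the quoted parameters. This is only a sketch under our own simplifications (the Euler step, the ratio test, and the cap Δt ≤ 1 are our assumptions); the paper's Algorithm 1 (CNMTr) is more elaborate:

```python
import numpy as np

def f(x):
    return 100.0 * (x[1] - x[0]**2)**2 + (x[0] - 1.0)**2

def grad(x):
    return np.array([-400.0 * x[0] * (x[1] - x[0]**2) + 2.0 * (x[0] - 1.0),
                     200.0 * (x[1] - x[0]**2)])

def hess(x):
    return np.array([[1200.0 * x[0]**2 - 400.0 * x[1] + 2.0, -400.0 * x[0]],
                     [-400.0 * x[0], 200.0]])

def continuation_newton(x, dt=1e-2, eta1=0.25, eta2=0.75, c1=0.5, c2=2.0,
                        tol=1e-8, max_iter=1000):
    """Explicit Euler steps s = -dt * H(x)^{-1} grad f(x) along the flow
    dx/dt = -H(x)^{-1} grad f(x), with dt adjusted by a trust-region-style
    ratio test (a simplified illustration, not the paper's Algorithm 1)."""
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            break
        s = -dt * np.linalg.solve(hess(x), g)
        pred = -(g @ s) - 0.5 * (s @ hess(x) @ s)   # quadratic-model decrease
        ared = f(x) - f(x + s)                      # actual decrease
        rho = ared / pred if pred > 0.0 else -1.0
        if rho >= eta1:
            x = x + s                               # accept the trial step
        if rho >= eta2:
            dt = min(c2 * dt, 1.0)                  # very successful: enlarge dt
        elif rho < eta1:
            dt = c1 * dt                            # unsuccessful: shrink dt
    return x

x_star = continuation_newton(np.array([-1.2, 1.0]))
print(x_star)
```

Starting from the small Δt_init = 10^{-2}, the step accurately tracks the flow in the transient phase; as Δt grows toward 1, the iteration approaches the full Newton step and converges quickly to the minimizer (1, 1) of the Rosenbrock function.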
We implement CNMGE as a software package in the MATLAB language; it can be executed in the MATLAB R2020b environment or any later version. For ease of use, we give an example to illustrate how to use it.

Rosenbrock function [74]
It is not difficult to verify that the global minimum of the Rosenbrock function (41) is located at x* = (1, 1) with f(x*) = 0. Its user guide is described in Example 5.1 and we give some explanations as follows. In line 4, the subroutine optimvar defines an n × 1 vector of variables. In line 5, it defines the objective function f(x) = 100(x_2 − x_1^2)^2 + (x_1 − 1)^2. In line 6, it defines the lower bound as −∞. In line 7, it defines the upper bound as ∞. In line 8, it defines a constraint x_1^2 ≤ 1 (if this constraint is not defined, the automatic differentiation function prob2struct will generate an incorrect gradient of the objective function for the nonlinear least-squares problem). In line 9, the subroutine optimproblem assigns the objective function and the constraints of an optimization problem. In line 10, the subroutine prob2struct generates a standard MATLAB function generatedObjective that evaluates the objective function of an optimization problem and its gradient, where the gradient is computed by automatic differentiation with the reverse mode. In line 11, it calls the CNMGE solver to find an optimal approximation solution of the global optimization problem.

Numerical experiments
In this section, some numerical experiments are conducted to test the performance of the continuation Newton method with the deflation technique and the quasi-genetic evolution (CNMGE) for global optimization problems. In Subsection 6.1, we verify the performance improvement brought by the algorithm combination. Namely, we compare the continuation Newton method with the multi-start method (CNMTrM, i.e. Algorithm 1 with multi-start points), the continuation Newton method with the revised deflation technique (CNMDTM, Algorithm 3), and the continuation Newton method with the revised deflation technique and the evolutionary algorithm (CNMGE, the combination of Algorithm 3 and Algorithm 4) on 68 open test problems.
In Subsection 6.2, in order to test the overall performance of CNMGE, we compare it with other representative global optimization methods such as the multi-start methods (the built-in subroutine GlobalSearch.m of MATLAB R2021b [61,85], GLODS: Global and Local Optimization using Direct Search [20], and VRBBO: Vienna Randomized Black Box Optimization [43]), the branch-and-bound method (Couenne: Convex Over- and Under-ENvelopes for Nonlinear Estimation, one of the state-of-the-art open-source solvers [10,17]), and the derivative-free algorithms (CMA-ES [32,33] and MCS [37,68]).

Verifying the effect of the algorithm combination
In this subsection, in order to verify the performance improvement brought by the algorithm combination, we compare the continuation method [53,56] with the multi-start method (CNMTrM, Algorithm 1 with the multi-start method), the continuation method with the revised deflation technique (CNMDTM, Algorithm 3), and the continuation method with the revised deflation technique and the evolutionary algorithm (CNMGE, the combination of Algorithm 3 and Algorithm 4) on 68 open test problems. The test problems can be found in [2,3,27,45,64,77] and their dimensions vary from 1 to 1000. In order to evaluate the effect of the algorithmic gradient computed by automatic differentiation, we also compare CNMGE using this gradient (still denoted as CNMGE) with CNMGE using the analytical gradient (denoted as CNMGE_AG). The termination condition of Algorithm 1 for finding a stationary point of the objective function f(x) is
The codes are executed on an HP notebook with an Intel quad-core CPU and 8 GB memory in the MATLAB R2021b environment [61]. The numerical results are arranged in Tables 1-4 and Figures 2-3. Since the global minima of some test problems are unknown, we mark the global minima of those problems with the notation "\" in Tables 1-4. For those problems, we regard a method as failing to find the global minimum if the minimum it finds is greater than those found by the other methods.
From Tables 1-5, we find that CNMGE and CNMGE_AG work well and obtain the global minima of all 68 test problems. CNMTrM fails to find the global minima of 27 test problems (about 39.71%), and CNMDTM fails to find the global minima of 6 test problems (about 8.82%). Therefore, the performance of the algorithm combination improves progressively from the continuation Newton method with the multi-start method (CNMTrM, Algorithm 1), to CNMTr with the deflation technique (CNMDTM, Algorithm 3), and to CNMDTM with the evolutionary algorithm (CNMGE, the combination of Algorithm 3 and Algorithm 4).
We also find that CNMGE does not sacrifice performance when it replaces the analytical gradient with the algorithmic gradient computed by the automatic differentiation technique with the reverse mode, and replaces the Hessian matrix ∇²f(x) with its finite difference approximation (40). From Figures 2-3, we find that CNMGE and CNMGE_AG take almost the same time to solve a problem. Consequently, CNMGE with the automatic differentiation technique retains the usability of derivative-free methods and the fast convergence of gradient-based methods.

Testing the total performance of CNMGE
In this subsection, in order to evaluate the overall performance of CNMGE, we compare it with other representative global optimization methods such as the multi-start algorithms (the built-in subroutine GlobalSearch.m of MATLAB R2021b [61,85], GLODS [20], and VRBBO [43]), the branch-and-bound method (Couenne, one of the state-of-the-art open-source solvers [10,17]), and the derivative-free algorithms (CMA-ES [32,33] and MCS [37,68]). The codes are executed on an HP notebook with an Intel quad-core CPU and 8 GB memory, except for Couenne, which is executed on the NEOS server [19,23,29,69].
GlobalSearch [61,85] is an efficient multi-start method for global optimization problems and it is widely used in engineering fields [61]. GLODS (Global and Local Optimization using Direct Search [20]) and VRBBO (Vienna Randomized Black Box Optimization, whose MATLAB code can be downloaded from [43]) are two other representative multi-start methods. Couenne (Convex Over- and Under-ENvelopes for Nonlinear Estimation) is a state-of-the-art open-source solver for global optimization problems. The covariance matrix adaptation evolution strategy (CMA-ES, whose MATLAB code can be downloaded from [18]) and the multilevel coordinate search (MCS, whose MATLAB code can be downloaded from [62]) are two representative derivative-free algorithms. Therefore, we select these six methods as the basis for comparison.
The numerical results are arranged in Tables 6-14. Since the global minima of some test problems are unknown, we mark the global minima of those problems with the notation "\" in Tables 6-13. For those problems, we regard a method as failing to find the global minimum if the minimum it finds is greater than those found by the other methods. The number of CMA-ES's evolution generations is set to 10^5. Due to the randomness of CMA-ES and VRBBO, we solve every problem about ten times with each of them and report the best computed minima as their optimal values in Tables 6-13. When the unconstrained optimization problem (1) has many local minima, according to our numerical experiments, Algorithm 3 can find most of the stationary points of f(x). Thus, it makes it possible for CNMGE to find the global minimum of f(x) even for difficult problems. We illustrate this with Problem 1. Problem 1 [45] is a molecular potential energy problem and its objective function has the form given in equation (43), where ω_i is the torsion angle and n + 3 is the number of atoms or beads in the given system. The number of its local minima increases exponentially with the size of the problem, which characterizes the principal difficulty in minimizing molecular potential energy functions with traditional global optimization methods. CNMGE can find 17 stationary points of problem (43) with n = 1000 (if CNMGE uses the Hessian matrix H(x), it can find 90 stationary points) and its global minimum f(ω*) = −41.118303, located at ω* = (1.039195, 3.141593, . . . , 1.039195, 3.141593). However, even for problem 1 with 50 atoms, traditional global optimization methods such as the interval analysis method [40,45] or the multi-start methods (GlobalSearch [61,85], GLODS, VRBBO) have difficulty finding its global minimum. Couenne, CMA-ES and MCS also fail to solve this problem.
For problems 37, 60, 62 and 67, the original test problems have default box constraints. GlobalSearch, Couenne, CMA-ES, MCS and GLODS can handle problems with box constraints, so we use them to compute these four problems with the constraints. VRBBO and CNMGE cannot handle constraints, so we use them to compute these four problems without the default box constraints. As long as they find minima that are no greater than the known minima, we regard them as having solved those problems successfully.
From Tables 6-14 and Figures 4-5, we find that CNMGE works well for unconstrained optimization problems and finds their global minima efficiently. From Tables 8-9 and Tables 12-13, we find that GlobalSearch, Couenne, CMA-ES and MCS all work well for small-scale problems. However, from Tables 6-7 and Tables 10-11, we find that GlobalSearch, CMA-ES and MCS fail to find the global minima of about 17 of the 34 large-scale problems. Couenne solves most problems efficiently, but it cannot handle 12 of the 34 large-scale problems efficiently (it fails to solve 3 of them, and the other 9 take it about 8 hours each). GLODS fails on 23 of the 68 problems (33.82%) and VRBBO fails on 16 of the 68 problems (23.53%). Therefore, CNMGE is more robust than the other representative global optimization methods GlobalSearch, Couenne, CMA-ES, MCS, GLODS and VRBBO. Consequently, CNMGE can be regarded as an alternative workhorse for global optimization problems.

Conclusions
Based on the deflation technique and the genetic evolution, we give an efficient continuation Newton method for finding the global minimum of an unconstrained optimization problem. We implement it in the MATLAB R2021b environment and denote this solver as CNMGE. In order to retain the usability of derivative-free methods and the fast convergence of gradient-based methods, we use the automatic differentiation technique with the reverse mode to compute the gradient of the objective function and replace the Hessian matrix with its finite difference approximation in CNMGE. The numerical results show that CNMGE works well for global optimization problems, especially for some large-scale problems that are difficult for known global optimization methods. Therefore, CNMGE can be regarded as an alternative workhorse for unconstrained optimization problems.