A hybrid parallel Harris hawks optimization algorithm for reusable launch vehicle reentry trajectory optimization with no-fly zones

Reentry trajectory optimization is a critical optimal control problem for reusable launch vehicle (RLV) with highly nonlinear dynamic characteristics and complex constraints. In this paper, a hybrid parallel Harris hawks optimization (HPHHO) algorithm is proposed to address the problem. HPHHO aims to enhance the performance of existing Harris hawks optimization (HHO) algorithm by three strategies including oppositional learning, smoothing technique and parallel optimization mechanism. At the beginning of each iteration, the opposite population is calculated from the current population by the oppositional learning strategy. Following that, the individuals in the two populations are arranged in ascending order on the basis of the fitness function values, and the top half of the resulting population is selected as the initial population. The selected initial population is divided into two equal subpopulations which are assigned to the differential evolution and the HHO algorithm, respectively. The both algorithms operate in parallel to search and update the solutions of each subpopulation simultaneously. Then the solutions are smoothed for each iteration by the smoothing technique to reduce fluctuations. As a result, the optimal solution obtained by the parallel optimization mechanism avoids falling into local optima. The performance of HPHHO is evaluated by 4 CEC 2005 benchmark functions and 3 constrained continuous optimal control problems, showing better efficiency and robustness in terms of performance metrics, convergence rate and stability. Finally, the simulation results show that the proposed algorithm is very effective, practical and feasible in solving the RLV reentry trajectory optimization problem.


Introduction
Reusable launch vehicle (RLV) is a special type of vehicle. After each mission into space, it is used repeatedly by recycling and reuse. It has good prospects in civil and defense applications and is also a goal pursued by different aerospace agencies around the world (Jee et al. 2014). Reentry trajectory optimization is a key part of the conceptual design of RLV and has a significant effect on the design of aerodynamics, structural strength and thermal protection systems, which plays a critical role in the safe and effective reentry flight (Zhang et al. 2020).
The philosophy of repeated launching with the same vehicle sounds interesting and fascinating, whereas a major challenging problem posed in flight missions is that of atmospheric reentry (Mao et al. 2018). The RLV reentry dynamics are highly nonlinear with limited control authority. In practice, path constraints such as heating rate, dynamic pressure and aerodynamic load must be strictly restricted in order to guarantee the structural and thermal safety and reliability of the vehicle. Especially with the diversification of reentry missions, the trajectories of the current RLV must sometimes meet the constraints of no-fly zones. Consequently, the RLV reentry trajectory optimization problem becomes highly complicated, and the gradient calculation becomes onerous and tricky. In recent years, nature-inspired algorithms (NIAs) have received widespread attention because they utilize the stochastic The  operators to escape from local optima and converge to approximate global optima in the iterative process. Another popular reason is due to their generality and ease of use and tune in practice. More commendably, they are not affected by the increased complexity (Gharehchopogh and Gholizadeh 2019). Perhaps in view of these advantages, NIAs based on global optimization techniques have, for example, been applied to find orbital transfer trajectories (Pontani and Conway 2010) or path planning for solar-powered UAV in urban environment (Wu et al. 2018). NIAs can be broadly divided into three categories: evolutionary algorithms (EAs), swarm intelligence (SI) and physics-based (PB) algorithms (Jain et al. 2019). EAs mimic the evolutionary behavior of creatures found in nature. Genetic algorithm (GA) and differential evolution (DE) (Storn and Price 1997) algorithm can be considered as the common standard forms of EAs. The second category is SI-based techniques, which imitate the intelligent social behavior of animal groups. Besides they collect and utilize the full information about search space with the progress of algorithm. Particle swarm optimization (PSO) (Eberhart and Kennedy 1995) and artificial bee colony (ABC) (Karaboga and Basturk 2007) can be regarded as the representative algorithms of SI. Several recent SI algorithms have also been proposed, such as grey wolf optimizer (GWO) (Mirjalili et al. 2014), moth-flame optimization (MFO) algorithm (Mirjalili 2015), whale optimization algorithm (WOA) (Mirjalili and Lewis 2016) and squirrel search algorithm (SSA) (Jain et al. 2019). The PB algorithms are inspired from basic physical laws that exist in the universe, and the prevailing methods of this category are simulated annealing (SA) (Kirkpatrick et al. 1983), multi-verse optimizer (MVO) ) and gravitational search algorithm (GSA) (Rashedi et al. 2009). Despite many prominent NIAs, there is no single optimization technique that can optimally solve all optimization problems, and be superior to all other optimization algorithms in solving the same optimization problem according to the ''No Free Lunch'' (NFL) theorem (Wolpert and Macready 1997). Therefore, researchers are still constantly developing more efficient optimization methods to achieve better designs. Consequently, various variants of the existing standard NIAs and hybrid algorithms with specific global and local search strategies have emerged in recent years. These modified versions can tackle the limitation of the basic NIAs such as big dimensional problems, avoiding trapping in local optima, and some of them can reduce execution time (Ridha et al. 2020). Thus, they have been widely carried out to realize different real-world cases, for instance, the hybridization of grasshopper and new cat swarm optimization algorithm for feature selection (Bansal et al. 2020), a hybridized version of the existing DE and symbiotic organism search algorithms for OPF problem (Saha et al. 2020), the incorporation of whale and sine-cosine optimizers with cellular topology for different parameter optimization cases (Turgut et al. 2021), etc.
Based on literature investigation of the mentioned studies, the modified versions of NIAs provide us a wider choice of optimization techniques, and their applications in the field of spacecraft reentry trajectory optimization are also dazzling. In the case of multi-objective trajectory optimization, there are many promising achievements. For example, Kang et al. (2015) developed an improved ABC for multi-objective reentry trajectory optimization of the RLV that could effectively and flexibly reflect the preference of the designers. Chai et al. (2017b) designed an adaptive DE algorithm that could generate high quality Pareto front of the multi-objective trajectory optimization problem of space maneuver vehicles (SMV). In their follow-up work (Chai et al. 2018(Chai et al. , 2020, they successively developed gradient-based hybrid genetic algorithm (GHGA) and novel NSGA-III-based optimal control solver to address multi-objective spacecraft trajectory optimization problems. The efficiency and feasibility of these algorithms are particularly attractive and can offer an alternative for multi-objective trajectory optimization issues. For objective reentry trajectory optimization, which is the focus of this paper, there are likewise many notable achievements. For instance, Zhang and Liu (2011) put forward an improved GA with sequential quadratic programming algorithm to achieve a fast and accurate optimum design of RLV reentry trajectory. Su and Wang (2015) proposed an improved GSA with gauss pseudospectral method (GPM) to address the RLV approach and landing trajectory optimization issue, which was proven practicable. Chai et al. (2017a) presented an differential evolution algorithm incorporated with a violation learning constraint handling strategy based on hp-adaptive GPM to solve the space maneuver vehicle trajectory optimization issue, and the method was demonstrated feasible and effective. Zhang et al. (2020) hybridized an improved WOA with GPM to solve the hypersonic vehicle entry trajectory optimization problem with no-fly zones, which exhibited faster convergence, higher accuracy and strong robustness. A common feature of these reported algorithms is that they combine the robustness of NIAs with the high accuracy of gradient-based algorithms to form a hybrid two-stage optimization algorithm, thereby generating optimal trajectories with faster convergence speed and efficiency. Meanwhile this combination is a frequently used optimization technique in trajectory optimization, especially when a priori knowledge about the optimal trajectory cannot be predicted in advance. However, there still exists some weakness, such as which degree of the accuracy of the first stage optimization results can provide the reliable guesses for the second stage optimization, thus ensuring the best efficiency of the entire optimization process. As a matter of fact, the two-stage optimization algorithm is a complex optimization problem in itself. The first stage of the optimization process is usually set to stop after reaching lower iterations, or when the tolerance set reaches the low-accuracy values in the order of 10 -2 to 10 -1 for normalized problems. Of course, from the available literature, it is also a trend to use a single NIA or its variants to address the trajectory optimization problem. The development of a novel stochastic gradient PSO for fast generation of feasible and smooth entry trajectories for hypersonic glide vehicle is a good example ), but the algorithm does not address the complex constraints of no-fly zones. Therefore, the point of departure is to develop a modified version of NIA with a few adjustable parameters that not only improves the optimization accuracy but is also quite robust to obtain a solution for RLV single objective reentry trajectory optimization with no-fly zones.
The RLV reentry trajectory optimization with complex constraints on path and no-fly zones is a highly nonlinear optimal control problem, and a priori knowledge of the optimal trajectory is usually unknown beforehand. As a result, the NIA based on global optimization techniques, which offer distinctive benefits, from the perspective of robustness, performance in the presence of uncertainty and unknown search spaces (Jain et al. 2019), is highly preferred. Beautifully yet imperfectly, stagnation at the early phases of the iterations, which leads to entrapment in local optimum points in the search space, is one of the inherent drawbacks of the NIAs (Turgut et al. 2021). Coupled with the fact that the RLV reentry trajectory optimization is a high-dimensional optimization problem that aims to search in a fixed space to obtain the optimal sequences of control variables, the shortcomings of premature convergence and stagnation in addressing this issue are more apparent in the standard version of NIAs. Motivated by these circumstances, this paper aims to develop a hybrid optimization algorithm based on a new NIA Harris hawks optimization (HHO), namely, hybrid parallel Harris hawks optimization (HPHHO) algorithm, thereby setting up a more stable trade-off between exploration and exploitation trends to improve the search efficiency for the best solution. In view of an application, the proposed algorithm does not require a priori knowledge of the optimal solution and can automatically detect the regions in the solution. In addition, it can guarantee the satisfactory accuracy without regard to the quadratic optimization problem. All these reasons make it suitable for addressing the RLV reentry trajectory optimization problem.
The proposed HPHHO algorithm in this paper mainly uses three strategies: oppositional learning, smoothing technique, and parallel optimization mechanism. The oppositional learning in each iteration is used to intensify the diversity of the initial population. On the other hand, the smoothing technique is employed to smooth the solutions to reduce or avoid the fluctuations. In terms of the parallel optimization mechanism, it consists of HHO and DE two algorithms, whereas the core search procedure of the proposed algorithm is the HHO. The HHO is one of the latest SI algorithms mimicking the surprise pounce behavior of Harris hawks' chasing style (Heidari et al. 2019). The use of adaptive and time-varying escape energy parameters enables it to perform a smooth transition between exploration and exploitation, which can well address the difficulties of the search space, including local optimal solutions, multi-modality, and deceptive optima. Besides, this technique has been applied to tackle different real-world cases, such as manufacturing industry, environmental quality, image segmentation, and power systems (Alabool et al. 2021). Furthermore, in order to make the global exploration capability of HHO significantly strengthened, the logistic chaos map sequence, DE/currentto-best/1 operator and Levy mutation operator have been embedded in the standard version of HHO. When referring to the DE algorithm, it is regarded as a remarkable EA for global optimization over continuous search space, whose performance of simplicity, straightforwardness in implementation and good convergence are outstanding. Moreover, the DE/best/1 operator exhibits relatively faster convergence, which may provide an excellent exploitation ability (Sun et al. 2019). In practice, DE often works as a local search mechanism to improve the exploitation of hybrid algorithms, such as the combination of DE with GA (Mustafi and Sahoo 2019), sine-cosine algorithm (Nenavath and Jatoth 2018), and ABC (Jadon et al. 2017). Inspired by these observations, this paper regards DE as an exploiter of parallel optimization mechanism for promising regions on the search space to fine-tune the current optimal solution. Meanwhile, HHO acts as a global search mechanism to reduce the possibility of premature convergence and to better guide the search toward promising regions. In this way, the DE and HHO operate in parallel to search the global optimal solution more efficiently.
The contributions of this paper have three aspects: • Propose an algorithm HPHHO by using oppositional learning, smoothing technique and parallel optimization mechanism strategies to improve the search efficiency for the best solution. • Evaluate the performance of HPHHO on 4 benchmark problems in comparison with other NIAs, including PSO, GWO, MFO, ABC imp2 , MVO, WOA, WOASA, HHO and HHODE. • Apply HPHHO to solve the RLV maximum cross-range problem with no-fly zones constraints.
The paper is outlined as follows: Sect. 2 presents and formulates the problem of RLV reentry trajectory optimization. The fundamentals of HHO and DE algorithms are described in Sect. 3. Section 4 provides the detailed discussion of the proposed HPHHO algorithm. Section 5 investigates the performance of HPHHO on four examples and applies it to solve the RLV reentry optimization problem with no-fly zones constraints. Finally, the conclusions and future work are given in Sect. 6.

Reentry dynamics
The motion of RLV reentry problem is defined by the following set of equations (Betts 2009): and the aerodynamic forces are computed according to: where the states are geocentric radius r, the longitude h, the latitude /, the Earth-relative velocity V, the flight path angle c, and the heading angle v, while the controls are the bank angle r and the angle of attack a. m is the mass of RLV; q is the atmospheric density; S is the reference area; C L and C D are aerodynamic lift and drag coefficients, respectively.

Path constraints
The dynamic pressure q, heating rate _ Q and overload n z should be taken into account during the entire descent phase, and they can be written as follows: where c 0 , c 1 , c 2 and c 3 are fixed values (Graichen and Petit 2008). g 0 is the gravitational acceleration at sea level.

State and control constraints
To ensure the stability and controllability of RLV, it is necessary to limit the magnitude of the state x = [r, h, /, V, c, v] T and the control u = [a, r] T . The entire flight process can be expressed as:

Terminal constraints
According to different flight missions of RLV, terminal conditions are required. Without loss of generality, terminal constraints can be expressed as (subscript f represents terminal state):

No-fly zones constraints
No-fly zones are areas that RLV cannot pass through during flight, like areas that are not violated by geopolitical factors, and air defense areas of defenders. In this modeling, no-fly zones are specified as cylindrical zones with infinite height, which is shown as (Chai et al. 2020): where S n is the total number of no-fly zones. The jth no-fly zone is described by the center (h j , / j ) and the radius R Zj .
3 Harris hawks optimization and differential evolution algorithm

Harris hawks optimization algorithm
HHO is inspired by exploring a prey, surprise pounce, and different attacking strategies of Harris hawks in nature (Heidari et al. 2019). Just like other SI algorithms, HHO also includes two optimization phases, namely global exploration and local exploitation as shown in Fig. 1. In HHO, the Harris hawks are the candidate solutions and the best candidate solution in each step is considered as the intended prey or nearly the optimum.

Exploration stage
In HHO, the Harris hawks perch randomly on some certain locations based on the positions of other family members or random tall trees, and wait to detect a prey. Considering an equal chance q of the conditions, which can be modeled as follows: Xðt þ 1Þ¼ X rand ðtÞ À r 1 X rand ðtÞ À 2r 2 XðtÞ j j q ! 0.5 where X (t ? 1) is the position of hawks in (t ? 1) th iteration, X rabbit represents the position of rabbit or prey, r 1 , r 2 , r 3 , and r 4 are also random numbers inside (0, 1) updated in each iteration. LB and UB are the lower and upper bounds of decision variables with the specific problem, respectively. The average position of hawks X m can be calculated using Eq. (8). N denotes the population size.

Transition from exploration to exploitation
An important control parameter to measure the transition between exploration and exploitation of HHO is the escaping energy of the prey E, which decreases with the increasing iteration, Iter, defined as follows: where E 0 denotes the initial energy of prey changing in the interval [-1, 1], and T is the maximum number of iterations. If |E|C 1, the HHO performs the exploration phase to search the prey, while if |E| \ 1, the exploitation phase is performed to exploit the promising areas.

Exploitation stage
Once the prey (rabbit) is detected, the Harris hawks prefer to attack it with surprise pounce. In reality, different chasing styles occur with the various escaping behaviors of the prey. Four possible strategies are simplified to model the attacking situations (Heidari et al. 2019). A random parameter r is utilized to measure the chance of a prey in successfully escaping. The situation r \ 0.5 performs successful escape, while r C 0.5 indicates unsuccessful escape. Whatever the prey does, the Harris hawks will perform a hard or soft besiege to catch it depending on its retained energy E. If |E|C 0.5, the soft besiege occurs, while if |E| \ 0.5, the hard besiege happens.
• Soft besiege Assuming that r C 0.5 and |E|C 0.5, the rabbit still has enough energy to struggle for escaping. Finally, the rabbit is exhausted and fails to escape by the continuous softly encircling of the Harris hawks. This behavior is formulated as: DXðtÞ¼X rabbit ðtÞ À XðtÞ ð 11Þ where DX(t) is the difference between the position vector of the rabbit and the current individual. J = 2(1-r 5 ) represents the random jump strength of the rabbit during the escaping process, and r 5 is a random number inside (0, 1).

• Hard besiege
Assuming that r C 0.5 and |E| \ 0.5, the rabbit is too exhausted to have enough energy for escaping. At this moment, the Harris hawks hardly encircle the rabbit to perform the final surprise pounce. The procedure can be expressed as: • Soft besiege with progressive rapid dives Assuming that r \ 0.5 and |E|C 0.5, the rabbit still has enough energy to escape successfully. Meanwhile, a more intelligent soft besiege is constructed before the surprise pounce. It must be noted that the updating position of the Harris hawks is a two-step procedure with greedy selection strategy. The first step process can be mathematically written as: and the second-step, Levy flight is used to perform the rapid, irregular, and abrupt movement of Harris hawks when chasing the rabbit. The position is updated as follows: where D is the dimension of problem and S is a random vector by size 1 9 D and Levy represents the Levy distribution which is calculated as follows: where u, v are random values inside (0, 1), b is a constant value set to 1.5. The final strategy for updating the positions in this procedure is given as follows: • Hard besiege with progressive rapid dives Assuming that r \ 0.5 and |E| \ 0.5, the rabbit has not enough energy but succeeds in escaping. At this moment, the hawks construct a hard besiege before the surprise pounce, and they try to bring themselves closer to the prey. Hence, the final strategy for updating the positions is performed as: Y ¼ X rabbit ðtÞ À E JX rabbit ðtÞ À X m ðtÞ j j ð18Þ where Y and Z are obtained using the new rules in Eqs. (18) and (19).

Differential evolution algorithm
DE is a stochastic search algorithm based on population, which simulates the natural evolutionary process via mutation, crossover and selection to move its population toward the global optimum (Sun et al. 2019). The DE algorithm mainly includes the following three operations.

Mutation operation
Mutation operation in DE is performed to create three indices randomly in range over [1, N], where N is the population size, and then three solution vectors X r1 , X r2 , and X r3 with a given index will be selected to generate a new solution V i from the current solutions in the search space. The two most frequently used mutation operators implemented in DE variants are as follows: where X best is the best current solution vector, F is the mutation scaling factor inside [0, 1]. In the present work, DE/best/1 is adopted as the mutation operator to accelerate the convergence speed and maintain population diversity.

Crossover operation
Crossover operation in DE is performed to produce a trial vector U i from the corresponding mutant solution vector V i and the target vector X i according to: where CR is the crossover rate, and j rand is random value over [1, D] which represents the randomly chosen index in the solution vector. D is the dimensionality of the problem..

Selection operation
Selection operation in DE is performed to make a choice between X i or U i based on their fitness value f (Á), by selecting the fitter one for next iteration according to:

Oppositional learning
The opposition-based learning (OBL) is intended to generate the 'opposite' local of solutions in the initialization phase to cover a large area in the feasible domain, thereby improving the diversity of the initial solutions. Even without a priori knowledge, using OBL can still obtain fitter starting candidates and enhance the probability of detecting better regions (Liang et al. 2019). Many NIAs (such as GA, PSO, GSA, SA and krill herd algorithm) use OBL to improve their performance (Mahdavi et al. 2018). Inspired by these observations, the OBL idea is incorporated in the HPHHO initialization to improve diversity and search capabilities. For individual solution X i , its opposition solution X oi can be determined as follows: where j = 1, 2, …, D (the dimension of specific problems) and i = 1, 2, …, N (the total number of hawks). Besides, the upper value ub j and lower value lb j are used in opposite solutions calculation.

Chaotic sequence
Another popular strategy to improve the convergence of HHO is to replace random parameters with chaotic map sequences (Abd and Mirjalili 2019). With the characteristics of ergodicity and non-repetitiveness, chaos can perform overall search at a higher speed than stochastic ergodic search that depends on probabilities (Coelho and Mariani 2008). Hence, the logistic chaos map sequence is used to tune the random parameter r of HHO, which helps to control the exploitation mechanism. In addition, it can be defined as a discrete-time dynamic system as shown below: where t indicates the current iteration. Besides, the control parameters l = 4, lr(1) [ (0,1) and lr(1) = 0.25,0.5, 0.75, 1. Unlike HHO, the r is taken from the chaotic sequence lr, and in this way r (t) = lr (t).

Smoothing technique
In order to reduce or avoid the control function fluctuations and corners in the optimal solution, smoothing technique (Hashemi and Mirjalili 2018) is embedded in the processing phase of HPHHO to smooth the solution. As depicted in Fig. 2, the white points are control values before smoothing while the black points are corresponding smoothed ones. It is worth noting that smoothing technique is only performed when zigzag shapes in raw data and occur such as a zigzag between u k-1 , u k and u k?1 . The average value based on each of three successive points is employed to reduce the magnitude of jumps. k is a weight coefficient and the formula for smoothing at point k can be expressed as: where u k is the smoothed value of u k and replaced it during the next steps of iterations.

Mutation operators
To utilize the best current individual population information, accelerate the search speed, and reduce the risk of trapping in local optima. The famous DE/current-to-best/1 mutation operator is adopted to replace the original position update strategy in Eq. (7). The formulation can be rewritten as: where r 4 and r 5 are also random numbers inside (0, 1).
Once the global search stage has passed, the intensity of exploitation should be strengthened in a promising search space, which benefits to convergence to the global optimum. Levy mutation operator is likely to generate a different offspring due to its heavy-tailed distribution and can help the individuals escape from local optima easily. Therefore, local Levy mutation is utilized instead of global Levy flight. Equations (14) and (19) can be rewritten as: For a clear understanding, the specific pseudo-code of the HPHHO algorithm is depicted in Fig. 3, and the flowchart in Fig. 4 shows the procedures of HPHHO, including OBL strategy, logistic chaos map sequence, smoothing technique and mutation operators. As can be seen, the HHO works as an exploration-based algorithm aims to search in far reached neighborhoods for finding global optimum, while the DE works as an exploitation- based algorithm focuses the search on already identified potential neighborhood for converging to global optimum. Therefore, the HPHHO can maintain a more stable balance between exploration and exploitation trends.

Computational complexity
Note that the computational complexity of HHO depends on the number of Harris hawks (N), the dimensions (D), and the maximum number of iterations (T). The time complexity of the basic HHO is O (N 9 (1 ? T ? T 9 D)). Correspondingly, there are seven aspects being mainly responsible for the computational complexity of the HPHHO algorithm, which are initialization, OBL strategy, chaotic sequences, fitness evaluation, smoothing technique, parallel optimization mechanism, and mutation operators.
Since the fitness evaluation is related to specific problems to be solved and the mutation operators in HPHHO are only a replacement of the original operators, the calculation complexity of them can be neglected. The time complexity of initialization, OBL and chaotic sequences require O(N), O(N 9 T) and O(T), respectively. The smoothing technique requires O(N 9 T 9 (D-2)). The time complexity of searching the best location and updating of individuals of parallel optimization mechanism are O(T 9 2Nlog2N) and O(N 9 T 9 D/2 ? N 9 T 9 D/2). Therefore, the total time complexity of HPHHO is O(N 9 T 9 (2log2N ? 2D-1) ? N ? T), which is generally a little higher than the basic HHO.

Constraint handling
The HPHHO was originally proposed to solve the static optimization problem without constraints. Thus, the penalty function method working as a kind of constraint handling techniques was utilized to address the constrained continuous optimal control problem. The equidistant discrete time nodes within user-defined upper (ub) and lower (lb) bounds are used to parameterize the control variables, which means X [ [lb, ub]. If the lower or upper limit is violated, the repair rules will serve as: The time history of state variables can be calculated by integrating differential equations with the fourth-order Runge-Kutta method. Decision variables are considered to be inequalities (g i (X) B 0). The dimensions and  Fig. 4 The flowchart of HPHHO magnitudes of the objective function f (X) and constraint functions g i (X) may be very different, which will make the optimization process complicated. Thus, the modified fitness is normalized using positive parameter m i , which can be written as: where n is the number of constraints g i (X) that are not satisfied, and q is a positive constant.

Numerical examples
In order to evaluate and investigate the performance and effectiveness of the proposed algorithm, the HPHHO is compared with some well-known algorithms including PSO ( process of NIAs is set to stop when the change in fitness value between 10 iterations e stop is less than 1e-6. There are two definitions as follows: (1) Number of successful runs (SR): The number of successful runs for which the algorithm reaches the global minimum (the relative error with the optimal solution is less than 5%). (2) Successful runs rates (SRR): SRR is equal to SR/20. What needs special attention is that the parameters of example 5.2 on four benchmark functions are different, while N pop = 30, MaxIter = 500 and all algorithms are executed 30 independent runs.

Multiple solution example
This is an optimal control problem with one state and one control, xðtÞ; uðtÞ 2 R; as well as two distinct local minima (Floudas et al. 1999). The objective function and constraints can be expressed as follows: The exact objective function solution for the global minimum of the problem is -8.23623 with control variable u = -5. Note that there also exists another local minimum with control variable u = 5.
As shown in Fig. 5, the PSO, GWO, MFO, and MVO fail to find the global optimal solution in this optimal control problem, whereas the ABC imp2 is not available on all successful runs. Thus, they have worse local optima avoidance. Apart from these algorithms, the WOA, WOASA, HHO, HHODE and HPHHO have managed to find the global optimum with higher accuracy. Therefore,

Performance on benchmark functions
The performance of HPHHO is evaluated and investigated on 4 mathematical test functions with different characteristics, such as unimodal, multi-modal and composite. The standard benchmark functions are taken from the IEEE CEC 2005 set, which are tabulated in Table 2. In the average value (AVG) and standard deviation (STD), the comparative results of the HHO variants as well as the 4 optimizers are obtained to evaluate the performance, of which the results shown in bold are the best. In order to further investigate the statistically significant differences between the HPHHO and its competitors, a nonparametric Wilcoxon sign rank test at 5% significance level (Derrac et al. 2011) is used herein. The symbols of '' ? '', '' = '' and ''-'' indicate that the HPHHO is superior to, equal to and inferior to its counterparts, respectively. Moreover, the Friedman test is also employed to rank the average performance of all competitors, and the average ranking value (ARV) is listed in comparison results. In order to deeply investigate the influences of OBL, logistic chaos map sequence, two mutation operators and population distribution order on the performance of the algorithm, this section tests the 4 HHO variants based on 4 benchmark functions, namely OPHHODE, PHHCDE, PHHLDE and OPHHCLDE. The first three variants imply combining only with OBL, logistic chaos map sequence and two mutation operators, respectively. For OPHHCLDE, it incorporates all the three strategies. Notice that OPHHCLDE means that the top half of the initial population is allocated to HHO and the latter half of the initial population is allocated to DE, while the distribution order of population is opposite to the proposed HPHHO algorithm.

Performance metrics
The comparative results of HHO variants as well as 4 optimizers on 4 functions are presented in Tables 3 and 4. It can be seen that the HPHHO outperforms other competitors in almost all cases in terms of AVG and STD values, indicating its higher accuracy and robustness. In addition, when it refers to the average rankings obtained by Friedman test, the proposed algorithm HPHHO ranks first in all cases with the lowest ARV. From a statistical point of view, it has achieved the best performance among all these competitors. According to the Wilcoxon's sign rank test and the meaning of the symbol of '' ? / = /-'', the HPHHO is superior to PHHCDE, PHHLDE, WOASA, HHO and HHODE on 3 out of 4 cases, and equal to them on 1 out of 4 ones. Moreover, the results of HPHHO performs significantly better than those obtained by WOA in solving all the functions. Furthermore, the OPHHODE and  OPHHCLDE have shown relatively better performance due to the inclusion of the OBL strategy. Both are equal and inferior to HPHHO on 2 out of 4 cases, respectively. Given that the program responsible for exploration and exploitation in OPHHCLDE is exactly the opposite of HPHHO, and the ARVs of the two algorithms are 2.38 and 2.25, respectively, it is clear to see that the performance difference between them is not particularly obvious. The evaluation of these two algorithms will be discussed further in Sect. 5.3.

Convergence rate analysis
This section mainly evaluates the convergence property of HPHHO versus the 4 well-developed optimizers. The convergence curves of the WOA, WOASA, HHO, HHODE and HPHHO on all the functions are plotted in Fig. 6 to observe the convergence rate of the algorithms. The WOA and WOASA are prone to premature convergence on F1, whereas the WOA, HHO and HHODE have trapped into local optima on F4. None of them can obtain a good solution on all cases. As expected, the HPHHO shows a fast convergence speed from the initial steps of iterations and a strong capability to obtain a high-quality solution and this behavior is evident on all functions. As a summary, the comparative results reveal the different features of the proposed HPHHO algorithm. The OBL strategy can enhance the diversity of the population, thereby ensuring a better solution in the initial stage of the iteration. Besides HHO, with mutation operators, can jump out of the local optima and search in far reached neighborhoods for finding global optimum, while greatly reducing the likelihood of stagnation. Meanwhile, the DE is committed to enhancing the exploitation by searching the most promising regions identified by HHO. Both DE and HHO in the parallel optimization mechanism operate in coordination with each other and can quickly converge to the optimal solution. However, WOA and HHO do not have additional special operators to devote iterations to searches in the promising regions. As a result, they are not completely free from the problems of premature convergence and stagnation. If most individuals fall into local  optima, especially if the optimal solution also falls into it, the whole algorithm will stagnate, and these conditions are where the algorithm will diverge.

Stability analysis
This section analyzes and discusses the stability, including two performance indicators, namely STD value and boxplot. STD represents the degree of dispersion of a dataset. The smaller the STD, the more stable the algorithm is. The competitive results of HPHHO and other four algorithms run independently for 30 times are shown in Table 4, from which it can be found that HPHHO has the lowest STD value among all functions compared with its counterparts, showing remarkable stability. Since the 4 tested functions present unimodal, multi-modal and composite characteristics, respectively, the boxplots of each algorithm and function are shown in Fig. 7

Minimum energy
The influence of smoothing technique is studied in this section. Consider the minimum energy problem of linear dynamical systems with analytical solutions (Hashemi and Mirjalili 2018). The objective function and constraints can be expressed as follows: while the analytical solution can be defined as: where c 1 ¼ 3À2e e 2 À4eþ3 ;c 2 ¼ e 2 e 2 À4eþ3 ; c 3 ¼ À2e e 2 À4eþ3 ; c 4 ¼ 2e eÀ3 . The results of the system responses are described in Figs. 8(a-c). Obviously, compared with the non-smoothed solutions without the smoothing technique, the HPHHO (Smoothed) time history curves are more consistent with the exact solution. Through the depiction of the control variable time history curve in Fig. 8(c), it can be found that the non-smoothed solution fluctuates greatly and is jagged, while the smoothed solution is smoother and closer to the actual control variable curve. In addition, the best fitness function value with respect to iterations is depicted in Fig. 8(d), which represents the convergence rate. As expected, HPHHO requires fewer iterations than nonsmoothed, reflecting the competitiveness of the smoothing technique for the algorithm in terms of time and computational resources. Obviously, the smoothing technique can significantly improve the performance of the proposed algorithm in solving continuous optimal control issues.

RLV maximum cross-range with heating rate constraint
In this section, the HPHHO and the variant OPHHCLDE are used to solve the RLV maximum cross-range with heating rate constraint problem. The influence of the population distribution order is investigated. The objective of the reentry problem is to maximize the cross-range, which is equivalent to maximizing the terminal latitude /(T end ). The specific values of initial and terminal conditions as well as the state constraints follow (Graichen and Petit 2008). In addition, the control constraints can be formulated as follows: There is another single path constraint satisfying _ Q 70 Btu/ft 2 /s, and the exact solution of /(T) is 30.6255°. A comparison between the results obtained from OPHHCLDE and HPHHO is listed in Table 5. In addition, the best results of the two methods and a constructive method (Graichen and Petit 2008) are depicted in Fig. 9.
As shown in Table 5, the SRR of OPHHCLDE is 55%, which indicates that it has poor robustness and tends to fall into local optima. In addition, the quality of its solution is not good, with relative error 9.02%. Compared with OPHHCLDE, HPHHO has a relative error of 2.15% and an SRR of 100%. It has a higher accuracy and strong robustness. Figure 9 shows the best results obtained from OPHHCLDE and HPHHO, both of which are close to the global optimal solution obtained by the constructive method. Besides, it can be observed that the curves of states, controls and path constraints of HPHHO fit better. Obviously, in terms of the quality of the optimal results, HPHHO is superior to OPHHCLDE. In solving the same type of problems, HPHHO can find the best solution at one time regardless of the increasing scale and complexity of the problem, thus showing good effectiveness.

RLV maximum cross-range with different constraints
Based on the study of the above simulations, it can be found that HPHHO can solve RLV maximum cross-range with heating rate constraint problem with better robustness and effectiveness. To further evaluate the applicability and Fig. 7 The boxplot of each method on four benchmark functions feasibility of HPHHO in dealing with the RLV maximum cross-range optimization problem, this section tests it by adding different path constraints, involving uncertainty analysis as well as no-fly zones constraints.

Performance of HPHHO in addressing different constraints
A comparative study with respect to the performance of HPHHO in solving the RLV maximum cross-range with different constraints is carried out in this section. Three mission cases are tested to demonstrate the capability and performance of HPHHO in handling different constraints, and the three mission cases are summarized as follows.
The obtained fitness values versus iterations and the heating rate with respect to time curves are plotted in Fig. 10, from which it can be observed that the HPHHO can generally obtain satisfactory solutions for different mission cases. The peak heating rates and the number of iterations stops amount to 184.1, 104.3, 71.7 Btu/ft 2 /s, and 200, 84, 84 for the three cases, respectively. It is worth noting that the heating rate constraint is active at only one collocation point in Case 2, while there are multiple heating rate constraints active in Case 3. In other words, when the constraint conditions become tighter, it is evident that the HPHHO might fail to drive all candidate solutions into  the feasible region and the optimization convergence accuracy is somewhat limited. However, the reduced feasible domain space seems to have a positive influence on the population evolution and convergence process. Overall, the HPHHO tends to have more potential in dealing with the investigated problem with various mission constraints. This section analyzes the impact of different variables on the terminal latitude/(T end ). As stated in (Chai et al. 2020), the mass m of RLV may experience a fraction, especially when the RLV enters the atmosphere. Thus, an uncertain evaluation is performed in terms of the uncertain mass value. Take mission case 2 as an example and the mass value uncertainties are set as dm = ? 5% and d m = ? 10%. The obtained uncertainty-perturbed fitness values are plotted in Fig. 11(a), from which it can be observed that as dm increases, the fitness values become -0.58 and -0.57, which is deviated from the original solution -0.59. As an increase in RLV mass results in a decrease in velocity, the lower speed of the vehicle also has negative influences on the latitudinal acceleration. The variation of latitude is lower than the one without considering the mass uncertainty, and this phenomenon can be reflected by the curve in Fig. 11(a). Therefore, it is necessary to consider the mass uncertainty in the actual design of the spacecraft reentry trajectory. Another uncertain evaluation is also performed to study the impact of the uncertain variable c. Note that the flight path angle is one of the most critical parameters with respect to RLV trajectory optimization reentry conditions. The flight path angle value uncertainties are set as c = 0 deg and c = -0.5 deg. Figure 11(b) displays the obtained uncertaintyperturbed fitness values versus iterations, and it can be seen that as c increases, the fitness values become -0.58 and -0.53, moving away from the original solution -0.59. The terminal latitude exhibits a high sensitivity to the flight path angle, whose uncertainty should be considered in the actual design of the spacecraft reentry trajectory.
It is clear from Fig. 11 that the results obtained by HPHHO for the mass uncertainty perturbation can match well with the theoretical analysis compared with the normal results, which indirectly reflects the effectiveness of the proposed algorithm. In addition, the RLV flight reentry trajectory is highly sensitive to the flight path angle and the performance of the algorithm depends on its specific initial value. In general, HPHHO can handle these cases effectively.

Reentry trajectory optimization problem with no-fly constraints
In this simulation, the no-fly zones constraints are considered during the reentry phase (case 2). The test is to determine how well the numerical solution obtained by the HPHHO avoids different kinds of constraints and obtains the objective solution, thus reflecting the applicability and feasibility of the proposed algorithm. No-fly zones parameter constraints can be found in Table 6. The opensource software GPOPS (Rao et al. 2010) based on GPM is used for comparative analysis.
As depicted in Fig. 12, the trajectories of GPOPS and HPHHO are very consistent, and all the path constraints are not violated, which prove the practicality and feasibility of HPHHO in solving this case. In particular, the trajectories with and without no-fly zones constraints shown in Fig. 12(d) indicate that the proposed algorithm is effective for this mission. Moreover, the heating rate plays a major role in the early stages of the reentry phase, while the dynamic pressure and overload gradually dominate in the subsequent gliding stage as the altitude decreases and the air density increases. The two optimal trajectories and the location of 3D no-fly zones are displayed in Fig. 13, from which it can be seen that both optimal trajectories smoothly avoid the no-fly zones.
Given the lack of a priori knowledge of the optimal trajectory, this case uses linear interpolation of the initial and terminal conditions as the initial guesses for GPOPS. Trajectory optimization is achieved with the number of Gauss collocation points N = 40, which also requires iterative training. Unsuitable guesses and parameter choices Fig. 10 Results of HPHHO for three different cases may lead to the use of a large amount of computational resources, and even some optimization results are difficult to satisfy the dynamic model and path constraints, such as the results of GPM optimization in (Zhang et al. 2020). The efficiency of GPOPS largely depends on the specific optimization problem. In contrast, the proposed algorithm HPHHO does not require a priori knowledge of the optimal solution and can automatically detect the region in the solution within the interval of user-defined control variables. In addition, compared with the hybrid NIAs based and gradient-based algorithms, it can guarantee the satisfactory accuracy without regard to the quadratic optimization problem.
While admiring its potential and superiority, we have to acknowledge the fact that the algorithm proposed in this paper is somehow based on a continuous iterative process of stochastic operators, and the processing time of the Fig. 11 Results of fitness value with uncertainty optimization procedure does not currently support the use of the algorithm for online optimal trajectory design. It is important to remark that for practical spacecraft guidance and control systems, the design of optimal flight trajectories is usually carried out offline (Chai et al. 2020). In fact, if parallel computing or high-performance computers can be implemented to optimize the flight trajectory, the optimization processing can be further reduced. In addition, the solution obtained using HPHHO can not only provide highquality reference trajectories for online tracking algorithms, but also provide an alternative for constructing trajectory databases.

Conclusions
This work proposes a HPHHO algorithm for RLV reentry trajectory optimization problems. The main purpose of HPHHO is to set up a more stable trade-off between exploration and exploitation trends. To obtain a global optimization solution, three strategies are adopted including oppositional learning, smoothing technique and parallel optimization mechanism. Oppositional learning is used to intensify the diversity of the initial population, whereas the smoothing technique is employed to smooth the solutions of the redistributed initial population. In the strategy of parallel optimization mechanism, the DE algorithm enabled by DE/best/1 operator exhibits relatively faster convergence. Simultaneously, incorporating with the logistic chaos map sequence, DE/current-to-best/1 operator and Levy mutation operator, the HHO exhibits a super exploration capability. DE and HHO operate in parallel to update the location of their respective population, so they can actually better search for the global optimum more efficiently. By testing on 4 standard benchmark functions, the HPHHO gets superior results in terms of average value, standard deviation, and convergence rate compared with 4 well-established optimizers and 4 HHO variants. By testing 3 constrained continuous optimal control problems, the HPHHO have shown significant advantages in convergence and robustness. Finally, the proposed algorithm is successfully applied to solve the critical problem for RLV trajectory reentry optimization with no-fly zones. The effectiveness, applicability and feasibility of the proposed algorithm are further demonstrated by simulation results. In addition, this research can be extended in many research areas, including multi-objective optimization problems, initial guess generator for gradient-based methods, and optimal control problems.

Declarations
Conflict of interest The authors declare that they have no conflict of interest.
Ethical approval This article does not contain any studies with human participants or animals performed by any of the authors.