A Scalable, Black-box, Hybrid Genetic Algorithm for Continuous Multimodal Optimization in Moderate Dimensions


 Optimization problems arise in many areas of science and technology. Often, not only the global optimum but also a (larger) number of near-optima are of interest. This gives rise to so-called multimodal optimization problems. In most cases, the number and quality of the optima are unknown, and assumptions on the objective function cannot be made. In this paper, we focus on continuous, unconstrained optimization in moderately high dimensional continuous spaces (d <= 10). We present a scalable algorithm with virtually no parameters, which performs well for general objective functions (non-convex, discontinuous). It is based on two well-established algorithms (CMA-ES, deterministic crowding). Novel elements of the algorithm are the detection of seed points for local searches and collision avoidance, both based on nearest neighbors, and a strategy for semi-sequential optimization to realize scalability. The performance of the proposed algorithm is numerically evaluated on the CEC2013 niching benchmark suite for 1-20 dimensional functions and on a 9-dimensional real-world problem from constraint optimization in climate research. The algorithm shows good performance on the CEC2013 benchmarks and falls short only on higher dimensional and strongly anisotropic problems. For the climate-related problem, the algorithm is able to find a large number (150) of optima that are of relevance to climate research. The proposed algorithm does not require special configuration for the optimization problems considered in this paper, i.e. it shows good black-box behavior.


Introduction
Most real-world optimization problems have more than one optimal or near-optimal solution. Such problems are called multimodal optimization (MMO) problems. They are of great importance in practice, helping engineers understand problems better and decision makers make appropriate decisions. In most practical cases, the number and quality of the optima or near-optima are not known. There is therefore a need for black-box algorithms, which do not require any a priori information about the optimization problem (Audet and Hare, 2017). Furthermore, in order to avoid repeated numerical investigations, it is also highly desirable to have algorithms which do not require (complicated) configuration, ideally none at all.
Many different strategies have been designed to solve multimodal problems. We will not give an overview; the field of possible strategies is simply too wide. For an overview, we refer to (Li et al., 2016; Preuss, 2015). In this work, we focus on evolutionary algorithms (EA), which are among the well-suited strategies for MMO (Preuss, 2015). Within the EAs, our choice is genetic algorithms (GA). In the context of MMO, GAs usually employ diversity preserving methods, so-called niching algorithms. Among them, the most famous are crowding (De Jong, 1975) and fitness sharing (Goldberg and Richardson, 1987). Later, these strategies were modified and refined and new ones were invented, see (Mahfoud, 1995; Mengshoel and Goldberg, 2008; Sareni and Krahenbuhl, 1998; Smith et al., 1993). Similar strategies have been developed for other EAs, such as particle swarm optimization (PSO) and differential evolution (DE).
In order to take advantage of the superior convergence of specially designed core algorithms (local solvers), a hybrid strategy is often adopted (Goldberg and Vössner, 1999). In a first stage, a niching algorithm detects the niches, and in a second stage, a core algorithm approximates the local or global optimum. We adopt this strategy. In order to scale, in principle, to any number of optima (Kronfeld and Zell, 2010), we use the niching algorithm in a semi-sequential mode with a linearly increasing population.
The aim of our development is the design of a virtually parameter-free algorithm. We therefore choose the components of our algorithm such that good generic choices are possible for them.
The remainder of this paper is organized as follows. In Section 2 we present the overall design of the algorithm as well as its parts. For well-known algorithms we discuss their configuration; for new developments, more details are given. In Section 3 we report on numerical experiments. We present results on the CEC2013 niching benchmark problems (Li et al., 2013) and on an optimization problem from climate research (constraint optimization) (Goris et al., 2021). A discussion of the results is given in Section 4 and conclusions are presented in Section 5.
For the remainder of the paper, we will assume that the optimization problem is posed as a maximization problem, and will consequently talk about maxima instead of optima and the maximization problem instead of the optimization problem.

The proposed algorithm
In this work we present an optimization algorithm for real-valued multimodal optimization problems in d ≥ 1 dimensions, which is essentially parameter-free. Only the maximum number of function evaluations needs to be specified.
Let X ⊂ R^d, d ≥ 1, be the real-valued search space and f : X → R a real-valued objective function to be maximized. The solution to the multimodal maximization problem is to find the set of all local maximizers

S = { x* ∈ X | there exists ε > 0 such that f(x*) ≥ f(x) for all x ∈ X with ||x − x*|| < ε }.    (1)

In many practical cases the algorithm will find only a subset S̃ ⊂ S of all maxima. The algorithm presented here is invariant under strictly increasing linear transformations of the objective function. It can be expected that maxima are detected in order of decreasing size of the corresponding domains of attraction, see Section 2.2.

Overall strategy
The proposed algorithm has a hybrid structure. To approximate niches, a deterministic crowding genetic algorithm (Mahfoud, 1995) is used as niching algorithm. After a fixed number of generations of this genetic algorithm, a nearest neighbor algorithm is employed to determine starting points and covariance matrices as input for the local solver, a covariance matrix adaptation evolution strategy (CMA-ES) (Hansen et al., 2001, 2019). An outer loop, repeatedly calling the niching algorithm and the local solver, successively increases the population of the niching algorithm. In this way we realize a semi-sequential behavior. We call the overall algorithm the Scaling Hybrid Genetic Algorithm (SHGA), see Algorithm 1 below. In Algorithm 1, the outer loop (line 2) increases the size of the population at the beginning of each iteration by n_p randomly generated individuals (line 4). These are added to the previously existing individuals, which themselves remain unchanged. Thereafter the niching algorithm is executed for the constant number of n_g generations on the increased population (line 6). After that, potential new niches are identified by a nearest neighbor based algorithm as seed points {s_i} with associated radii {δ_i} of the niches (line 9). This algorithm is described below (Algorithm 2). Already determined maxima, stored in X, are used by the algorithm to exclude non-promising seed points. We call the algorithm that detects new seed points (and the associated radii) the seeds algorithm. For each seed point the local solver is called (line 12) and the (local) maximum is added to the set of maximizers (line 14). The while loop ends and the result is returned when the maximum number of function evaluations is exceeded (lines 7 and 13).
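The outer loop described above can be sketched as follows. This is a minimal Python sketch, not the authors' implementation; the helper functions niching_step, seeds and local_solve are hypothetical stand-ins for one generation of the crowding GA, Algorithm 2 and the CMA-ES local solve, respectively.

```python
import random

def shga(f, bounds, N, n_p, n_g, n_nb, niching_step, seeds, local_solve):
    """Sketch of the SHGA outer loop (Algorithm 1).

    niching_step, seeds and local_solve are stand-ins for the components
    described in the text (crowding GA generation, Algorithm 2, CMA-ES).
    """
    evals = [0]                       # shared function-evaluation counter

    def fc(x):                        # counting wrapper around f
        evals[0] += 1
        return f(x)

    population, maxima = [], []
    while evals[0] < N:
        # grow the population by n_p random individuals; old ones are kept
        population += [[random.uniform(lo, hi) for lo, hi in bounds]
                       for _ in range(n_p)]
        # run the niching GA for a fixed number of generations
        for _ in range(n_g):
            population = niching_step(fc, population)
        # detect seed points and niche radii, excluding known maxima
        for s, delta in seeds(fc, population, n_nb, maxima):
            if evals[0] >= N:
                break
            maxima.append(local_solve(fc, s, delta))
    return maxima
```

The budget check inside the seed loop mirrors the double stopping criterion (lines 7 and 13 of Algorithm 1).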
The algorithm depends on a number of parameters. In addition to internal configuration parameters of the niching algorithm (such as mutation and crossover probabilities, etc.) and the local solver (tolerances etc.), Algorithm 1 requires the specification of the four essential parameters N, n_p, n_g and n_nb. N is the maximum number of function evaluations and serves as the stopping criterion. Typical values range from 10^4 for simple problems to 10^6 or more for complex problems. The value must be provided by the user. For the other three parameters, we can make good, problem-independent choices. Together with problem-independent choices for the internal parameters of the niching algorithm and the local solver, this is what we mean when claiming that we can formulate a parameter-free algorithm. The choices for the three parameters n_p, n_g and n_nb are discussed in Section 2.5 below.

Niching algorithm
The niching algorithm is used to simultaneously approximate niches. To this end we use a deterministic crowding genetic algorithm (Mahfoud, 1995). Per generation, it performs the standard operations of parent selection, recombination, mutation and survivor selection. As parents, all members of the population are chosen and mate with uniform probability (panmictic mixing). Recombination is performed with 30% probability using bounded simulated binary crossover with crowding degree η = 10. Afterwards, mutation is performed with 30% probability using bounded polynomial mutation with crowding degree η = 10. For details, see (Deb and Kumar, 1995; Fortin et al., 2012). At the end of each generation, deterministic crowding governs the competition between the parent and offspring populations.
In Algorithm 1, the niching algorithm is executed for a fixed number of n_g generations. This leads to a sufficient convergence of individuals towards the different niches, where the number of individuals per niche is approximately proportional to the size of the niche. The number of individuals in a niche is not expected to depend significantly on the value of the associated maximum, see e.g. (Butenko et al., 2017; Eiben and Smith, 2015). Regarding the choice of n_g, we refer to the discussion in Section 2.5.
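One generation of the niching algorithm can be sketched as follows. This is a simplified illustration: the survivor selection is deterministic crowding as described above, but the SBX and polynomial-mutation operators are replaced by a plain blend crossover and Gaussian mutation for brevity, so the variation step is an assumption, not the paper's operators.

```python
import math
import random

def crowding_generation(f, parents, p_cx=0.3, p_mut=0.3, sigma=0.05):
    """One generation of a deterministic-crowding GA (simplified sketch).

    Panmictic pairing; each child competes with its closer parent and
    replaces it only if it has the better fitness (maximization).
    """
    pool = parents[:]
    random.shuffle(pool)              # panmictic mixing
    next_gen = []
    for p1, p2 in zip(pool[::2], pool[1::2]):   # odd leftover is dropped
        c1, c2 = p1[:], p2[:]
        if random.random() < p_cx:              # blend crossover (stand-in)
            a = random.random()
            c1 = [a * x + (1 - a) * y for x, y in zip(p1, p2)]
            c2 = [a * y + (1 - a) * x for x, y in zip(p1, p2)]
        for c in (c1, c2):                      # Gaussian mutation (stand-in)
            if random.random() < p_mut:
                k = random.randrange(len(c))
                c[k] += random.gauss(0.0, sigma)
        # deterministic crowding: pair each child with the closer parent
        if (math.dist(c1, p1) + math.dist(c2, p2)
                <= math.dist(c1, p2) + math.dist(c2, p1)):
            pairs = [(p1, c1), (p2, c2)]
        else:
            pairs = [(p1, c2), (p2, c1)]
        for parent, child in pairs:
            next_gen.append(child if f(child) > f(parent) else parent)
    return next_gen
```

Because a child replaces a parent only when strictly better, the best fitness in the population never decreases, while the distance-based pairing preserves diversity across niches.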

Seeds algorithm
In Algorithm 1, line 9, the seeds algorithm is used to detect new niches. It returns a number of so-called seed points, one for each potentially new niche. These are found as local maxima within the current population of the niching genetic algorithm. An approximate niche size is determined for each seed point. The algorithm is based on nearest neighbor detection and depends on essentially one parameter, the neighborhood size n_nb. The algorithm is given below as Algorithm 2.

Algorithm 2: Seeds
Function: Seeds
Input: objective function f, population p, neighborhood size n_nb, maxima X
Output: set S = {(s_i, δ_i)} of seed points s_i with niche radii δ_i
1  // select potential seeds as local maxima

The first loop of the algorithm (line 3) identifies individuals which are maxima in a local neighborhood (line 7). The local neighborhood is formed by the n_nb nearest neighbors and the 'mirrored' points 2p_k − p_nb (lines 5 and 6).
Note that this neighborhood in general consists of the central point under consideration (p_k) and additionally 2 n_nb − 2 points. The maximum distance from p_k to any of these points serves as an estimate of the associated niche size. In the second loop (line 14), the potential seed points are filtered: points which lie too close to already known maxima (X) are excluded (line 16). The closeness criterion is related to the estimated niche size.
To keep the description simple, we have used a number of intuitive notations in Algorithm 2: in line 4, p[k] refers to the k-th element of the set p in an arbitrary enumeration; in line 6, (2p_k − p_nb) denotes the set {2p_k − q | q ∈ p_nb}; in line 7, f(p_nb) denotes the set of function values of f for each point in p_nb. The expressions in lines 15 and 16 are to be understood correspondingly. Note also that the meta-code given in Algorithm 2 is for illustration purposes; an implementation that is efficient in terms of the number of function evaluations would look different.
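The seed detection can be sketched as follows. This is a direct (and deliberately function-evaluation-inefficient) transcription of the idea, with names of our own choosing: for each individual, form a neighborhood from its n_nb nearest neighbors plus their mirrored points, keep the individual as a seed if it maximizes f over that neighborhood, and drop seeds lying too close to an already known maximum.

```python
import math

def seeds(f, population, n_nb, known_maxima):
    """Sketch of the seeds algorithm (Algorithm 2)."""
    result = []
    for pk in population:
        # n_nb nearest neighbors of pk within the population
        others = sorted((q for q in population if q is not pk),
                        key=lambda q: math.dist(pk, q))
        nbrs = others[:n_nb]
        # 'mirrored' points 2*p_k - q for each neighbor q
        mirrored = [[2 * a - b for a, b in zip(pk, q)] for q in nbrs]
        neighborhood = nbrs + mirrored
        # pk is a potential seed if it is a maximum over its neighborhood
        if all(f(pk) >= f(q) for q in neighborhood):
            delta = max(math.dist(pk, q) for q in neighborhood)
            # exclude seeds too close to already known maxima
            if all(math.dist(pk, m) > delta for m in known_maxima):
                result.append((pk, delta))
    return result
```

The niche radius delta is the maximum distance from the seed to its neighborhood, as described in the text; the same radius serves as the closeness criterion for exclusion.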

Local solver
We use the covariance matrix adaptation evolution strategy (CMA-ES) as local solver. The local solve for the niche identified by (s_i, δ_i) is initialized with the seed point s_i and a spherical search distribution of radius δ_i/3. We use the Python implementation by N. Hansen et al. (Hansen et al., 2019) with its default configuration.
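The initialization of a local solve (start at the seed point s_i, initial spread δ_i/3) can be illustrated with a stand-in solver. Here a basic (1+1)-ES with multiplicative step-size adaptation replaces CMA-ES, since reproducing the full CMA-ES machinery is beyond a short sketch; only the initialization mirrors the paper.

```python
import random

def local_solve(f, seed, delta, budget=2000):
    """Stand-in for the CMA-ES local solve: a (1+1)-ES sketch.

    Initialized at the seed point with step size delta/3, as in the paper;
    the step-size rule (1.5 on success, 0.9 on failure) is our own choice.
    """
    x, fx = seed[:], f(seed)
    sigma = delta / 3.0
    for _ in range(budget):
        y = [xi + random.gauss(0.0, sigma) for xi in x]
        fy = f(y)
        if fy > fx:              # maximization: accept improvements
            x, fx = y, fy
            sigma *= 1.5         # expand step size on success
        else:
            sigma *= 0.9         # shrink step size on failure
    return x, fx
```

In the actual algorithm this role is played by the pycma implementation with default settings; the sketch only illustrates how the seed point and niche radius determine the starting distribution.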

Critical parameters
In this section we discuss the critical parameters n_p, n_g and n_nb and how to choose them for good performance of the overall algorithm. A summary of the parameters and their values is given at the end of this section in Table 1.
Population increase n_p. The parameter n_p sets the initial population of the niching genetic algorithm and its further increment in every iteration of the while-loop in Algorithm 1, line 2. The choice of n_p should allow for the detection of a reasonable number of new maxima in every iteration of the while-loop. As n_nb individuals are necessary to establish one seed point (see Algorithm 2, line 5), a small multiple of this number is a good choice. From our experience, a factor of five or higher leads to good results, where the exact choice is not critical. In this paper, we make the generic choice

n_p = 10 n_nb.    (2)

Number of generations n_g for the niching algorithm.
The aim of the niching algorithm is to approximate niches. To this end, a significant contraction of individuals towards the maximum of each niche needs to be accomplished. Numerical investigations have shown that 20 generations lead to a contraction of approximately 0.3-0.8, depending on the problem, see Section 3. The contraction rate is measured in a statistical sense. With n_g = 20, Algorithm 2 yields good seed points in terms of their number versus duplicate solutions obtained by the local solver. Experiments with n_g ≤ 10 lead to rather poor seed points; n_g ≥ 40 does not improve the quality of the seed points enough to balance the additional effort. We therefore make the generic choice

n_g = 20.    (3)

Size of neighborhood n_nb in the seeds algorithm. The choice of n_nb is crucial for the performance of the entire algorithm. Too small a neighborhood will lead to too many seed points, initiating too many local solves which often either do not converge or converge to already known maxima. Too large a neighborhood will lead to few seeds, and hence to few local searches and correspondingly few detected maxima. To compensate for that, the population size of the niching genetic algorithm would have to increase much more, leading to an inefficient algorithm.
A smooth objective function f can locally be approximated by a quadratic function, which in turn gives information about the location of the local maximum (if it exists). It is therefore reasonable to relate the neighborhood size to the amount of information needed to approximate f by a quadratic polynomial. The minimum required number is therefore given by the number of degrees of freedom of a quadratic polynomial in d dimensions, n_nb = (d+1)(d+2)/2, which is our generic choice. For the detection, we add the 'mirrored' points 2p_k − p_nb (Algorithm 2, line 6). This avoids generating seeds that lie at the border of a cloud of individuals, which are likely located on the slope of a niche and potentially far away from its maximum.
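The three generic choices can be collected in one small helper. The formulas follow the text directly and are consistent with the configuration reported for the 9-dimensional climate problem in Section 3.2 (n_nb = 55, n_p = 550); the function name is our own.

```python
def default_parameters(d):
    """Generic parameter choices of Section 2.5 for dimension d.

    n_nb is the number of degrees of freedom of a quadratic polynomial
    in d dimensions, n_p = 10 * n_nb (eq. 2) and n_g = 20 (eq. 3).
    """
    n_nb = (d + 1) * (d + 2) // 2
    return {"n_nb": n_nb, "n_p": 10 * n_nb, "n_g": 20}
```

For d = 9 this yields n_nb = 55 and n_p = 550, matching the values used in the climate experiment.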

Experiments
We evaluate the proposed algorithm on the CEC2013 niching benchmark suite (Li et al., 2013) and a multimodal optimization problem arising from constraint optimization in climate research. For the CEC2013 niching benchmark suite the locations of the absolute maxima are known. For these problems, we also investigate the convergence of the niching algorithm (crowding GA), justifying the choice of (3).

CEC 2013 niching benchmark suite
The CEC2013 niching benchmark suite consists of 20 problems with dimensions ranging from 1 to 20. The problems are listed in Table 2. They are to be solved within the given budget of function evaluations. For each problem the solution procedure is called 50 times and the so-called average peak ratio is reported, the peak ratio being the fraction of the global maxima detected. For details, see (Li et al., 2013).
The performance of Algorithm 1 on the benchmark suite is given in Table 3. The average peak ratio, the average number of function evaluations and an average contraction number (see below) are given for each benchmark problem. The performance of the algorithm (average peak ratio) is good for all problems, with the exception of the Vincent problems (problems 7 and 9) and the 10D and 20D problems (problems 18-20).
A comparison of the proposed algorithm with the algorithms from the most recent international competition on this benchmark suite (GECCO 2020) is given in Table 4. We compare the results according to Scenario I. For details, see (Li et al., 2020). The proposed algorithm would rank number 4 among the 7 competing algorithms.
The poor performance of our algorithm on the Vincent problems and on the higher dimensional problems stems from the small basins of attraction of a number of maxima.
Convergence of the niching algorithm. For the CEC2013 benchmark suite, all global maxima are known. With this knowledge, we can define a contraction number ξ ∈ [0, ∞[ measuring the speed of convergence of the niching genetic algorithm. In essence, it describes the average movement of individuals in the population towards their nearest global maximum (in the Euclidean norm) over a number of generations, in our case n_g. A contraction number ξ < 1 corresponds to convergence, a contraction number ξ > 1 to divergence. The contraction number is best described algorithmically; we give pseudo code with Algorithm 3. As earlier, we use a simplified, intuitive notation to keep the description simple. The sets S, P_0, P_1 in Algorithm 3 are ordered, where the order of S and the order of the partitioning subsets of P_0, P_1 correspond. We use indices to denote access to the elements in this order. Finally, the order within the partitioning subsets corresponds to increasing distance to the corresponding elements of S. As we use a clustering with respect to the Euclidean distance, the contraction number is, strictly speaking, only meaningful for multimodal problems with isotropic, equally sized basins of attraction. We measured contraction numbers for the benchmark problems, which vary from roughly 0.3 to 0.85. With a contraction of ξ ≤ 2/3, a good clustering around the associated maxima is achieved, sufficient for a clustering-based seed detection. This is true for 11 of the benchmark problems. For another six benchmark problems, this level of contraction is reached after a second execution of n_g iterations, and for the remaining three it is reached after a third execution of n_g iterations. Though this argument is rather simplistic, we believe that it captures the essential feature of the niching algorithm. We consider this a first explanation of why the generic choice n_g = 20 leads to successful overall behavior of the proposed algorithm.
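A simplified version of the contraction number can be sketched as follows. The paper's Algorithm 3 additionally matches individuals within clusters by their distance order; this sketch omits that refinement and uses a plain ratio of mean distances, so it is an approximation of the quantity described above.

```python
import math

def contraction_number(maxima, pop_before, pop_after):
    """Simplified contraction number: ratio of the mean distance of
    individuals to their nearest known global maximum, after vs. before
    running the niching GA for n_g generations. Values < 1: contraction."""
    def mean_dist(pop):
        return sum(min(math.dist(p, m) for m in maxima) for p in pop) / len(pop)
    return mean_dist(pop_after) / mean_dist(pop_before)
```

For example, a population that halves its average distance to the nearest maximum over n_g generations has ξ = 0.5, well below the 2/3 threshold discussed above.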

Constraint optimization in climate research
At the heart of current investigations about possible pathways of expected future global warming is the Coupled Model Intercomparison Project (CMIP). It started in 1996 with a handful of global coupled climate models (CMIP1) and has in the current phase (CMIP6) more than 50 models producing future climate projections. Despite a lot of progress, inter-model uncertainty of these future projections is still relatively high, depending on the variable and region considered. The evaluation of these models is challenging due to their complexity, the sparsity of observations and unknown future dynamics. The emergent constraint approach, see e.g. (Hall et al., 2019), allows the use of present-day measurements to constrain future spread in the model ensemble. At the core of this concept lies the idea that certain present-day dynamics will influence the future projection of a model. Collectively, the systematic bias in the present-day dynamics among the models can be used to constrain the spread in the future projections through cross correlation across the model ensemble.
Climate scientists have spent a lot of effort on finding such emergent constraints for certain areas and processes. However, in previous attempts, the investigated areas are often pre-defined and the effort of the investigation is put into finding highly correlated dynamics. Here, we follow the opposite approach, where the correlated dynamics are already determined, and we try to optimize the area of the present-day dynamics used in the emergent constraint. The correlated dynamics that we consider are the spatially-averaged modelled scalar value of the anthropogenically altered future CO2-uptake over the North Atlantic (here denoted as Y) and the present-day modelled percentage of deep ocean CO2-storage (here denoted as X). The details of this are beyond the scope of this paper and we refer to (Goris et al., 2021) for further details.
Fig. 1  The area under consideration lies in the North Atlantic ocean between America to the west and Europe/Africa to the east, restricted to latitudes between 0° and 75° N, from the sea surface to the sea bottom.

We describe here the related mathematical optimization problem. The aim is to identify regions e which optimize the cross-correlation between Y and X(e). These regions are supposed to lie within a 3D domain: the North Atlantic ocean between America to the west and Europe/Africa to the east, restricted to latitudes between 0° and 75° N, from the sea surface to the sea bottom, see Figure 1. Here, the data of X is available on a longitude-latitude-depth axis-parallel grid with grid spacing 1° × 1° × 100 m. Only cells with a model variance larger than the spatially averaged model-ensemble standard deviation of 4.84e-08 are considered, so that the optimal region can indeed be used to constrain model spread. We denote this set of cells by G and seek the set of cells (a subset of G) which maximizes the correlation (in terms of the Pearson correlation coefficient) of the 11 randomly picked models' average X and their Y. In order to regularize this ill-posed problem, only areas resembling ellipsoids are considered. The set of these discrete ellipsoids E ⊂ 2^G is defined as the image of the map γ : Ē → E, e ↦ {c ∈ G | midpoint(c) ∈ e}, where Ē denotes the set of all ellipsoids (subsets of R^3). We can now define the objective function of our multimodal optimization problem: let

f(e) = ρ_{X(e),Y},    (5)

where ρ_{X,Y} denotes the Pearson correlation coefficient and X, Y ∈ R^11 are the present-day percentage of deep ocean CO2 storage and the future average anthropogenically altered CO2 uptake, X = X(e) = (X_1(e), ..., X_11(e)) with X_m(e) = ∑_{t, c∈e} (…), where t runs in monthly steps over the time period [1999, 2009] (the decade surrounding the available observations, normalized for the year 2004) and m = 1, ..., 11 is an arbitrary enumeration of the climate models under consideration.
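The discretization map γ can be sketched as follows. This is a minimal version assuming an axis-aligned ellipsoid given by a center c and semi-axes a; the paper's genome additionally encodes an orientation, which is omitted here, and all names are our own.

```python
def gamma(ellipsoid, grid_midpoints):
    """Sketch of the map γ: a continuous ellipsoid is mapped to the set of
    grid cells whose midpoints it contains.

    ellipsoid: (c, a) with center c and semi-axes a (axis-aligned sketch).
    grid_midpoints: iterable of cell-midpoint tuples (the grid G).
    """
    c, a = ellipsoid
    return {m for m in grid_midpoints
            if sum(((mi - ci) / ai) ** 2
                   for mi, ci, ai in zip(m, c, a)) <= 1.0}
```

Because many different continuous ellipsoids map to the same cell set, γ is not injective, which is exactly why the resulting objective is piece-wise constant.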
Correspondingly, Y is defined by its components, where t runs in monthly steps over the time period [2090, 2100]. When equipped with the Steinhaus distance

d(e_1, e_2) := |e_1 △ e_2| / |e_1 ∪ e_2|,  for all e_1, e_2 ∈ E,

the set of discrete ellipsoids E ⊂ 2^G becomes a finite metric space (△ denotes the symmetric difference and |·| the counting measure). In order to use Algorithm 1 to find the maxima of the objective function f (5), we choose to describe an ellipsoid e ∈ E by a real-valued genome. Ellipsoids in R^3 have 9 degrees of freedom. We choose a representation in terms of (x, S, D), where x ∈ R^3 is the center of the ellipsoid, S ∈ R^{3×3} is an antisymmetric matrix, D ∈ R^{3×3} is a diagonal matrix and † denotes the Moore-Penrose inverse. We encode (x, S, D) as ξ ∈ R^9 and use the above defined map γ to define

f̃(ξ) := f(γ(e(ξ))).    (6)

Due to the nature of γ, the objective function f̃ is piece-wise constant. The length scale of variation of f̃ is typically 10^-4, so that the function likely assumes ≥ 10^30 different function values in [0, 1]^9. On larger scales (δx ≈ 10^-2) the function resembles a smooth, continuous function. It turns out that it has many (local) maxima, many of them clustering together. To find all or a major part of them therefore requires an efficient multimodal optimization algorithm, such as the one presented with Algorithm 1. (Strictly speaking, for piece-wise constant functions such as f̃, the notion of local or global maxima is problematic. Normally, the set of local maxima equals the entire domain with the exception of a set of measure 0, and the set of strict local or global maxima is empty. Only the set of (non-strict) global maxima is non-trivial. We disregard this aspect, as f̃ resembles a smooth function on the relevant scales.)

Table 5  Statistics of the multimodal optimization of f̃ (6). Each row corresponds to an iteration of the while-loop (Algorithm 1, line 2).
The population increases linearly by 550 each iteration; the number of maxima (local or global) found increases from 1 after the first iteration to 187 after the ninth iteration. In addition, the cumulative number of function evaluations is shown. The last column shows the efficiency: the percentage of function evaluations used in local solves as part of the overall number, a measure for the efficiency of the multimodal algorithm.

We use Algorithm 1 with the standard configuration for a 9-dimensional search space, n_p = 550, n_g = 20 and n_nb = 55, and allow for N = 10^6 function evaluations. The optimization executed nine iterations of the successively increasing population (Algorithm 1, line 2) with an initial population of 550 and a final population of 4950. A total of 187 local maxima were found. About 25% of the function evaluations were used to find the starting points for local solves (seed points) and 75% were used in local solves by the CMA-ES algorithm. We call this the efficiency of the multimodal optimization. Table 5 gives the statistics of the optimization run.
Because f̃ is piece-wise constant, the local solver (CMA-ES) identifies a number of saddle points as local maxima. A random testing of 1000 points in a 10^-1 Euclidean neighborhood is used to detect the saddle points (the search space is normalized to [0, 1]^9). After eliminating the spurious maxima, 170 maxima remain. The 10 best solutions have objective function values η_0 = 0.9819, η_1 = 0.9809, η_2 = 0.9792, η_3 = 0.9778, η_4 = 0.9769, η_5 = 0.9768, η_6 = 0.9756, η_7 = 0.9749, η_8 = 0.9744, η_9 = 0.9705, corresponding to ellipsoids ξ_i ∈ R^9, i = 0, ..., 9. Since the representation of the ellipsoids in R^9 and its metric is not natural (e.g. the map γ ∘ e is not injective; two different ξ can represent the same discrete ellipsoid), it is appropriate to investigate the maxima in the metric space (E, d) of discrete ellipsoids as defined above. In order to visualize the first 10 ellipsoids in terms of their distances, a 9-dimensional Euclidean space would be necessary. If we allow for a controlled uncertainty in the display, we can display the metric relationship between the first 10 ellipsoids in the Euclidean plane, see Figure 2. In the figure, solutions are represented as circles, and their corresponding distances in (E, d) lie in the interval spanned by the minimum and maximum distant points of the corresponding circles. It can be seen that the solutions 0, 1 and 2 (and possibly 6) form one cluster, the solutions 4, 5 and 7 form another cluster, and the solutions 3, 8 and 9 are clearly separated. A discussion of the corresponding ellipsoids in terms of the climate-related constraint optimization problem is beyond the scope of this paper. For further details, we refer to (Goris et al., 2021).
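Two ingredients of the analysis above can be sketched directly: the Steinhaus distance between discrete ellipsoids (cell sets) and the sampling-based test that weeds out saddle points reported as maxima. The sampling here uses a box neighborhood instead of the Euclidean ball for brevity, and all function names are our own.

```python
import random

def steinhaus(e1, e2):
    """Steinhaus distance between two cell sets: |e1 Δ e2| / |e1 ∪ e2|."""
    union = e1 | e2
    if not union:
        return 0.0
    return len(e1 ^ e2) / len(union)

def looks_like_maximum(f, x, radius=0.1, n=1000):
    """Random testing of n points in a neighborhood of x (box of half-width
    'radius', a simplification of the Euclidean ball used in the paper).
    Returns False as soon as a strictly better point is found."""
    fx = f(x)
    rnd = random.Random(0)  # fixed seed for reproducibility of the sketch
    for _ in range(n):
        y = [xi + rnd.uniform(-radius, radius) for xi in x]
        if f(y) > fx:
            return False
    return True
```

A saddle point of a smooth surrogate has better points in every neighborhood, so it fails the test with overwhelming probability after 1000 samples, while a genuine local maximum passes.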

Discussion
The aim of this work was to design a widely applicable black-box algorithm for continuous, multimodal optimization problems of moderate dimension. We chose a hybrid algorithmic approach of (1) identifying basins of attraction and (2) carrying out local solves. For (1), we chose a standard panmictic genetic algorithm (deterministic crowding) with nearest neighbor based detection of basins of attraction, in order to avoid a priori assumptions on the size and shape of the basins. In order to cope with black-box objective functions, which may have an arbitrary number of maxima, the genetic algorithm is used with a successively increasing population. Local solves (2) are carried out with the standard CMA-ES solver. The key to a successful optimization algorithm is the proper balance between the efforts of global exploration and local solves.
It turns out that a proper balance can be achieved by carefully choosing three parameters, two configuring the panmictic deterministic crowding algorithm (population size, number of generations) and one specifying the identification of the basins of attraction (number of individuals required per basin). These parameters can be chosen depending on the problem dimensionality. The algorithm is efficient for a large variety of problems from low to moderately high dimensions. Exceptions to the good performance are (a) anisotropic problems and (b) higher-dimensional problems with large numbers of local maxima. We believe this can be attributed to the large differences in the size of the basins of attraction in case (a), the anisotropic problems. In case (b), the higher-dimensional problems, these differences certainly play a role as well, together with the very large number of local maxima.
In summary, we state that our design choice is successful and that the algorithm can be valuable for a larger number of applications.

Conclusion
In this work we presented a two-phase multimodal optimization scheme built upon the well-established deterministic crowding GA and CMA-ES. It identifies niches through a simple use of nearest neighbors. The scheme is flexible enough to capture niches of various shapes and sizes. Robustness is achieved through a sensible balancing of the efforts of exploration and exploitation. The settings can be chosen such that the algorithm becomes essentially a black-box algorithm without a need for tuning parameters.

Declarations
Funding, Support This work was supported by the Research Council of Norway through the project COLUMBIA (grant number 275268). The simulations were performed on resources provided by NORCE Norwegian Research Centre AS.

Conflict of interest
The authors declare that they have no conflict of interest.
Human and animal rights This article does not contain any studies with human participants or animals performed by any of the authors.
Availability of code, data and material In preparation.