Spreading Points Using Gradient and Tabu

Large spreading (packing) problems are hard to solve exactly. Consequently, heuristic approaches are usually used to find approximate solutions. In this paper, a hybrid heuristic method, consisting of a Neighborhood Tabu (NT) search and a Perturbed Gradient (PG) algorithm, is proposed for solving spreading points problems. NT applies an adaptive random technique to select promising candidate solutions from the neighborhood of the current local minimum, and PG uses a neighborhood-based perturbed gradient algorithm to seek a better minimum starting from those candidates. The two procedures work alternately and cooperatively. When applied to some previously studied packings, the proposed method can improve the accuracy of the previous record solutions.


Introduction
Spreading (packing) points in a square or circle (or on a unit sphere) is a classical problem in geometry [1]. It can be modelled as spreading a given number of points in a predetermined region (including a sphere surface) so that the minimum pairwise distance is maximized [2]. An equivalent problem, called the packing problem, is the following: place a given number of congruent circles (spheres) without overlap into the predetermined region so as to maximize the radius of the circles (spheres) [3]. Figure 1 shows a packing of 11 points (left) and a packing of 11 circles (right) in a square. The two are equivalent: the points on the left coincide exactly with the circle centers on the right. We use the terms spreading and packing interchangeably in this paper.
Spreading points (or packing circles) has a wide variety of applications, including facility dispersion, wireless communication network layout, and test case generation [2][3][4]. Take facility dispersion as an example: undesirable facilities such as ammunition dumps and oil storage tanks need to be spread out to the greatest possible extent so that an accident at one facility does not damage any of the others. In coding terminology, point configurations on the surface of the unit sphere are known as Spherical Codes (SC). The distribution of points on the unit sphere also has many applications, including number theory, compressive sensing, machine learning, and quantum information theory [5,6]. However, these problems are in general nonconvex optimization problems that cannot be solved effectively by purely analytical approaches [3]. Deterministic methods, for example the branch-and-bound approach [7,8], are good only for small problems. Hence stochastic methods have been proposed to find approximate configurations. These methods, such as billiard simulation [9] and simulated annealing [10], are usually heuristic.
Evidence shows that combining multiple heuristic algorithms for generating packings (spreadings) can achieve substantial improvement over using any single one alone [9,11,12]. In particular, combining local search with global heuristic procedures is becoming increasingly popular [13,14].
Nurmela and Östergård [12] introduce an energy function so that the problem (disk packing in a square) is transformed into minimizing that energy function. A hybrid algorithm, consisting of a simple steepest-descent algorithm and a multistart approach, is used to solve the problem. Addis et al. [15] applied basin-hopping coupled with local search to the problem of packing disks in the unit square and improved many instances over previously known putative optima. In basin-hopping [16], the energy function is transformed into a set of interpenetrating staircases, where each staircase corresponds to a local minimum of the energy function, and the search moves successively from one local minimum to a better one. Similarly, M'Hallah et al. [17] applied a variable neighborhood search (VNS) coupled with a local search (a Non-Linear Program solver, NLP solver) to the problem of packing unit spheres into the smallest sphere and improved 29 (out of 48) benchmark instances.
In general, local search uses gradient-type methods. However, the energy function or the NLP problem is very complicated and has an enormous, even uncountable, number of stationary points, and these stationary points are not necessarily local optima [17]. As a result, even finding a local optimum becomes a very hard task. On the other hand, global heuristic procedures are usually based on random or Monte Carlo simulation and make no attempt to bypass previously visited local minima. In this paper, we model spreading points problems via a minimax-type function and propose a neighborhood-based gradient algorithm with perturbation and momentum, in the hope of moving across flat spots in the search space. When the gradient algorithm falls into a suboptimal region, tabu search helps it escape from already visited solutions and their neighbors. Experimental results show that the proposed hybrid approach can improve the accuracy of some previous record solutions.

Basic Definitions
In d-dimensional Euclidean space, we use a capital letter such as X to denote a point set; a lowercase letter, with or without a subscript, such as x, x_1 ∈ X, to denote a point or vector; and a lowercase letter with two subscripts to denote coordinates; for example, x_{i,j} denotes the jth coordinate of the point x_i. We also use a Greek letter with a subscript i to denote the ith component of a vector when no ambiguity is possible. Additionally, |X| denotes the cardinality of X. Let us first start with a more general statement of spreading points problems.
Let Ω be the domain of the spreading points problem (SPP for short). The domain Ω can be a closed convex set or a manifold; closed convex sets include the rectangle, cuboid, square, cube, and hypercube, while manifolds include the surface of a sphere or hypersphere. The SPP is defined as follows: scatter a set of n points X in Ω such that the minimum pairwise distance d_min(X) (d_min for short) is maximized. Formally, given a domain Ω of an SPP and a set of n points X = {x_1, x_2, ..., x_n} in d-dimensional Euclidean space ℝ^d, X is a feasible solution or configuration of the SPP if x_i ∈ Ω, i = 1, ..., n.
Let dist(x_1, x_2) denote the Euclidean distance between two points x_1, x_2 ∈ ℝ^d, and let dist(x, X) denote the distance between a point x and a point set X, defined as dist(x, X) = min_{y∈X} dist(x, y). If x ∈ X, we also use dist(x, X) to denote the distance from x to its nearest neighbors (referred to as the nearest distance of x) when no ambiguity is possible, i.e., dist(x, X) = dist(x, X \ {x}). The distance between two point sets, dist(X, Y), is defined as follows.
Definition 1 (k-matching and distance). Given two point sets X and Y and any integer k ∈ [1, min{|X|, |Y|}], define a k-matching as a set of k edges such that each edge has one endpoint in X and one endpoint in Y, and for any point p ∈ X ∪ Y, at most one edge is incident on p. Each edge has a value representing the distance between its two endpoints, and the value of a k-matching is the average of its edge values, i.e., the sum of the edge values divided by the number of edges k. Given k, we define the distance between X and Y as the minimum value over all k-matchings.
We use dist(X, Y) to denote the distance between X and Y with k = |X| = |Y|. Computing this distance is equivalent to finding a minimum-weight perfect matching in a bipartite graph, and a lower bound for dist(X, Y) can be defined accordingly. Let 𝒴 be a set of point sets and X a point set with |X| = |Y| for all Y ∈ 𝒴; we define the distance between X and 𝒴 as dist(X, 𝒴) = min_{Y∈𝒴} dist(X, Y).
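For concreteness, dist(X, Y) with k = |X| = |Y| can be computed with a standard assignment solver; the following is a minimal Python sketch (the function name is ours, not the paper's):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist

def matching_distance(X, Y):
    """dist(X, Y) of Definition 1 with k = |X| = |Y|: the minimum average
    edge length over all perfect matchings, i.e., a minimum-weight perfect
    matching in the bipartite graph of pairwise distances."""
    C = cdist(X, Y)                        # pairwise Euclidean distances
    rows, cols = linear_sum_assignment(C)  # Hungarian-type solver
    return C[rows, cols].mean()

# Two configurations that are permutations of each other are at distance 0.
X = np.array([[0.1, 0.1], [0.9, 0.1], [0.5, 0.9]])
Y = X[[2, 0, 1]]
print(matching_distance(X, Y))  # 0.0
```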

The Hybrid Approach
The proposed method is a hybrid approach consisting of a Neighborhood Tabu Search (NTS) and a Perturbed Gradient with Momentum (PGM) algorithm.

Gradient with Perturbation and Momentum
An approach to generating a local solution is to maximize dist(x_i, X) for every x_i ∈ X. Since the min operator in dist(x_i, X) is not differentiable, it is approximated by the smooth form

dist(x_i, X; s) = −(1/s) log Σ_{x_j ∈ X\{x_i}} e^{−s·dist(x_i, x_j)}.

The approximate form is differentiable and has the following properties: (i) dist(x_i, X) − log(n−1)/s ≤ dist(x_i, X; s) ≤ dist(x_i, X); (ii) dist(x_i, X; s) → dist(x_i, X) as s → ∞. The goal is to maximize dist(x_i, X; s = ∞), which can be optimized via gradient ascent. The gradient direction of dist(x_i, X; s = ∞) with regard to x_i ∈ X is computed by

∇dist(x_i, X; s = ∞) = (1/|N_i|) Σ_{y ∈ N_i} (x_i − y),   (1)

where N_i = {y ∈ X \ {x_i} : dist(x_i, y) = dist(x_i, X)} denotes the set of nearest neighbors of x_i. Hence for every point we obtain a gradient direction in (1), which is the (averaged) sum of the vectors from its nearest neighbors to the point.
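As an illustration, the directions (1) can be computed for all points simultaneously; a minimal NumPy sketch (our own naming, with a small tolerance tol for distance ties):

```python
import numpy as np
from scipy.spatial.distance import cdist

def nn_gradient(X, tol=1e-9):
    """For each x_i, the direction (1): the average of the vectors pointing
    from the nearest neighbors of x_i to x_i.  Returns the n x d matrix of
    directions and the vector of nearest distances."""
    D = cdist(X, X)
    np.fill_diagonal(D, np.inf)        # exclude dist(x_i, x_i)
    d_near = D.min(axis=1)             # nearest distance of each point
    G = np.zeros_like(X)
    for i in range(len(X)):
        N = np.where(D[i] <= d_near[i] + tol)[0]  # nearest neighbors N_i
        G[i] = (X[i] - X[N]).mean(axis=0)         # average of x_i - y over N_i
    return G, d_near
```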
Let f(X) = Σ_{x_i ∈ X} dist(x_i, X; s = ∞) and let ∇f(X) be the matrix whose ith row is the gradient direction of x_i computed by (1). We scale ∇f(X) by some η > 0 (called the learning rate) and take X ← X + η∇f(X) to locally maximize f(X). Adapting η during gradient optimization can increase performance, and the most popular adaptation is to reduce η over the iterations. For the SPP, we propose a gradient algorithm with perturbation and momentum to maximize f(X) (referred to as PGM, see Algorithm 1).
The PGM algorithm starts with a random feasible solution X and tries to improve it using stochastic gradients. The stopping criterion is based on the number of iterations T and the minimum learning rate tolerance η_min. During the iterations, if the objective value of the feasible solution exceeds that of the current best solution by strictly more than a tolerance ε > 0, the best solution is replaced and the counter k is reset to 0. When the solution has not improved for K (a predefined number of) iterations, the learning rate decays by a factor γ. To avoid getting stuck in a poor local minimum, a slight random perturbation (noise) is then immediately added to the gradient. For each x_i ∈ X, the perturbation is based on the gradient g_i and the nearest distance d_i of the point, of the form perturbed(g_i) = g_i + rand()·(θ_1‖g_i‖ + θ_2·d_i), where rand() denotes a number randomly generated from the interval [−1, 1] and θ_1 and θ_2 are control parameters (usually θ_1, θ_2 ∈ (0, 1); θ_1 = 0.01 and θ_2 = 10^{−6} by default). At each iteration, the maximum moving distance of each x_i ∈ X is limited to less than 10% of the nearest distance of x_i (in Clip(G)). Momentum is a widely used method in the machine learning community; it allows the search to overcome the oscillations of noisy gradients and to move smoothly across flat spots in the search space that have zero or undefined gradient. Hence we consider PG with momentum. If a point violates a constraint, it is adjusted: for example, in a box container, if a coordinate violates the lower (resp. upper) bound of the coordinate variable, it is reset to the lower (resp. upper) bound, whereas on a unit sphere, if a point does not lie on the unit sphere, the point (vector) is normalized.
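A compact sketch of the PGM loop under our reading of the text (it reuses nn_gradient from the sketch above; the exact perturbation form, the symbols η, ε, γ, β, θ_1, θ_2, and the projection argument are our assumptions):

```python
import numpy as np

def pgm(project, n, d, T=100_000, K=1_000, eta=0.1, eta_min=1e-5,
        eps=1e-6, beta=0.9, gamma=0.9, th1=0.01, th2=1e-6, seed=0):
    """Perturbed Gradient with Momentum (Algorithm 1), as a sketch.
    `project` maps points back into the domain Omega, e.g.,
    lambda P: np.clip(P, 0.0, 1.0) for the unit square."""
    rng = np.random.default_rng(seed)
    X = project(rng.random((n, d)))
    V = np.zeros_like(X)                        # momentum buffer
    best_X, best_val, k = X.copy(), -np.inf, 0
    for _ in range(T):
        G, d_near = nn_gradient(X)              # directions (1) and d_i
        val = d_near.min()                      # objective d_min(X)
        if val > best_val + eps:                # strict improvement
            best_X, best_val, k = X.copy(), val, 0
        else:
            k += 1
        if k >= K:                              # stuck: decay eta, perturb G
            eta *= gamma
            if eta < eta_min:
                break
            noise = rng.uniform(-1.0, 1.0, X.shape)
            gnorm = np.linalg.norm(G, axis=1, keepdims=True)
            G = G + noise * (th1 * gnorm + th2 * d_near[:, None])
            k = 0
        V = beta * V + eta * G                  # momentum update
        lim = 0.1 * d_near[:, None]             # Clip: < 10% of nearest distance
        vnorm = np.linalg.norm(V, axis=1, keepdims=True)
        step = np.where(vnorm > lim, V * lim / np.maximum(vnorm, 1e-30), V)
        X = project(X + step)                   # move and repair constraints
    return best_X, best_val
```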
According to Property (ii), an alternative approach is the following: first maximize f(X) by gradient ascent with a small value of s, then repeatedly increase s whenever f(X) stops improving significantly (i.e., gets stuck in a local minimum), and finally use the gradient computed by (1) to maximize f(X).
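The smooth approximation for finite s, used in this annealing variant, has a closed-form gradient via softmax weights; a sketch under the reconstruction above:

```python
import numpy as np
from scipy.special import softmax
from scipy.spatial.distance import cdist

def smooth_gradient(X, s):
    """Gradient of dist(x_i, X; s) = -(1/s) log sum_j exp(-s dist(x_i, x_j)):
    a softmax-weighted sum of the unit vectors from the x_j toward x_i.
    As s grows, the weights concentrate on the nearest neighbors and the
    direction approaches (1)."""
    D = cdist(X, X)
    np.fill_diagonal(D, np.inf)          # j = i gets weight exp(-inf) = 0
    W = softmax(-s * D, axis=1)          # weights concentrated on near pairs
    G = np.zeros_like(X)
    for i in range(len(X)):
        diff = X[i] - X                  # vectors x_j -> x_i
        G[i] = (W[i][:, None] * diff / D[i][:, None]).sum(axis=0)
    return G
```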
Additionally, the Minimum Enclosing Ball (MEB) problem can be viewed as a degenerate spreading points problem [18]: place a single point x (the center of a point set X) such that the maximum distance d_max = f(x) = max_{y∈X} dist(x, y) is minimized (a special space-filling design problem). Let f(x, s) = (1/s) log Σ_{x_i∈X} e^{s‖x−x_i‖} and let F ⊆ X consist of the points farthest from x; then the gradient direction of f(x, s = ∞) with regard to x is computed by ∇f(x) = (1/|F|) Σ_{y∈F} (y − x). Let x* and r* be the center and radius of the MEB of X. Given ε > 0, an approximate algorithm for the MEB with an early stopping criterion, similar to [19], is the following: initially choose a vector λ^0 ∈ Δ_{n−1} (the unit simplex) and hence a center c_0 = Σ_{i=1}^n λ^0_i x_i; at step k find the points F ⊆ X farthest from c_k and update λ^k by moving it toward the vertices of Δ_{n−1} indexed by F, λ^{k+1} = (1 − γ_k)λ^k + γ_k u_F, where u_F denotes the uniform distribution on F and the step size γ_k usually decreases over time, e.g., γ_k = 2/(k+2) [20]; set c_{k+1} = Σ_{i=1}^n λ^{k+1}_i x_i; and repeat the above step until f(c_k) ≤ (1 + ε)·D(λ^k)^{1/2} or the predefined maximum number of iterations is reached, where D(λ) is concave and denotes the dual function of the MEB problem.

Gradient-type algorithms for packing already exist in the literature, such as energy minimization [12], a subgradient-type algorithm [21], and steepest (sub)gradient ascent [22]. In these algorithms, either only a pair of points at minimal pairwise distance or all pairs of points are considered when constructing a gradient at an iteration. We derive the gradient of the SPP from a different perspective: in our gradient algorithm, the gradient of every point is computed from its nearest neighbors, and all gradients are computed simultaneously.
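A sketch of this MEB procedure, with the dual taken as the standard D(λ) = Σ_i λ_i‖x_i‖² − ‖Σ_i λ_i x_i‖² (an assumption, since the text truncates here) and the farthest-point set reduced to a single index for simplicity:

```python
import numpy as np

def meb_approx(X, eps=1e-4, max_iter=100_000):
    """Approximate minimum enclosing ball via simplex updates (cf. [19, 20]).
    Maintains lam in the unit simplex and the center c = sum_i lam_i x_i."""
    n = len(X)
    lam = np.full(n, 1.0 / n)                  # lam^0
    sq = (X ** 2).sum(axis=1)                  # ||x_i||^2
    for k in range(max_iter):
        c = lam @ X                            # center c_k
        dists = np.linalg.norm(X - c, axis=1)
        f = dists.max()                        # f(c_k) = max distance
        dual = max(lam @ sq - c @ c, 0.0)      # D(lam^k) >= 0
        if f <= (1.0 + eps) * np.sqrt(dual):   # early stopping criterion
            break
        j = dists.argmax()                     # a farthest point
        step = 2.0 / (k + 2)                   # gamma_k = 2 / (k + 2)  [20]
        lam *= (1.0 - step)
        lam[j] += step                         # move toward vertex e_j
    c = lam @ X
    return c, np.linalg.norm(X - c, axis=1).max()
```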

Neighborhood Tabu Search
The PGM converges to a good local optimum. To obtain a better solution, possibly even a global optimum, heuristics are applied to further explore the search space. Some global optimization algorithms, such as basin-hopping [16] and variable neighborhood search [17], assume there exist tunnels from the current local minimum to the global minimum. Each tunnel consists of a monotonically descending sequence of local minima, and the search algorithm jumps from one minimum to an adjacent lower local minimum, and finally to a global optimum. These global optimization algorithms are usually hybrid methods, consisting of a neighborhood search and a local search. The neighborhood search selects promising candidate solutions from the neighborhood of the current local minimum, and the local search (e.g., PGM) makes them evolve into local minima. However, the neighborhood search usually explores the solution space using random or Monte Carlo techniques, which do not prevent the search from returning to, and getting stuck in, already visited feasible solutions.
Hence we apply tabu to the neighborhood search [23]. The neighborhood tabu prohibits the selection of already visited feasible solutions and their neighborhoods. The simple version mainly iterates the following steps: (i) sample a fixed number of candidate solutions, (ii) select the one with the maximum distance to the feasible solutions in the tabu list, and (iii) update the tabu list according to the selected feasible solution. Intuitively, the neighborhood tabu spreads feasible solutions more evenly over the search space and hence helps to find a better solution efficiently. The pseudocode of the neighborhood tabu search is given in Algorithm 2.
The procedure randomly generates k feasible solutions from the neighborhood of X_best (which is also a local minimum) using the procedure shake(X_best, ω), selects the solution X that has the maximum distance to the tabu list Γ, computed by dist(X, Γ), and then applies PGM to X, leading to a local minimum X_min. If X_min improves on X_best, the current best solution is updated. If X_min does not improve the current best solution, then X, and even X_min (when its distance to X is less than 0.1·d_min(X_min)), is added to the tabu list, and the steps are repeated until either a better solution is obtained or the predefined maximum number of attempts is reached. The procedure shake(X_best, ω) applies a random displacement scaled by ω to every x_i ∈ X_best, where ω is randomly sampled from the interval [0.1, 0.6]. After the shaking, the neighborhood of a point is still partially preserved with a very high probability. The NTS can be repeated iteratively if desired, so that the solution is improved continuously.
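A sketch of the NTS loop as we read Algorithm 2 (it reuses nn_gradient and matching_distance from the earlier sketches; shake's displacement rule and the local-search callback pgm_refine are our assumptions, not the paper's definitions):

```python
import numpy as np

def nts(X_best, pgm_refine, project, attempts=15, k_cand=15, seed=0):
    """Neighborhood Tabu Search sketch.  `pgm_refine(X)` should run PGM from
    X and return (X_min, d_min(X_min)); `project` repairs the domain."""
    rng = np.random.default_rng(seed)
    tabu = []                                      # visited configurations

    def shake(X, omega):                           # assumed displacement rule:
        _, d_near = nn_gradient(X)                 # move each point by up to
        U = rng.normal(size=X.shape)               # omega * nearest distance
        U /= np.linalg.norm(U, axis=1, keepdims=True)
        return project(X + omega * d_near[:, None] * U)

    best = X_best.copy()
    _, d_near = nn_gradient(best)
    best_val = d_near.min()
    for _ in range(attempts):
        omega = rng.uniform(0.1, 0.6)
        cands = [shake(best, omega) for _ in range(k_cand)]
        # candidate with maximum distance to the tabu list: dist(X, Gamma)
        X = max(cands, key=lambda C: min(
            (matching_distance(C, T) for T in tabu), default=np.inf))
        X_min, val = pgm_refine(X)
        if val > best_val:
            best, best_val = X_min, val            # improved: accept
        else:
            tabu.append(X)                         # forbid X ...
            if matching_distance(X_min, X) < 0.1 * val:
                tabu.append(X_min)                 # ... and possibly X_min
    return best, best_val
```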
However, a random neighborhood search does not easily find better candidate solutions, especially as the number of points and the dimension increase. We therefore propose a Shrinking and Tabu (ST) method as a supplement to the neighborhood tabu search, which can efficiently generate better candidate solutions.
Intuitively, if the nearest distances of both x_i and its nearest neighbors are larger than d_min, then appropriately shrinking the nearest neighbors of x_i toward x_i does not decrease d_min. Although these movements do not change d_min, they make room for the movements of other points. Hence, after shrinking, if the moved points and their nearest neighbors are forbidden to move over the next several iterations, the points with smaller nearest distances have a chance to increase their nearest distances.
The heuristic shrinking and tabu procedure can be described as follows: (i) find a point, say x_k ∈ X, that is not in the tabu list Λ and has the maximum nearest distance, with d_k > (1 + δ)d_min, where δ denotes the minimum excess ratio and is set to 10^{−3} by default; if such a point exists, add it to the tabu list; (ii) for each neighbor x_j ∉ Λ of x_k, check whether x_j can move toward x_k under the condition that its nearest distance does not drop to d_min: if the nearest distance of x_j after the move x_j ← x_j + τ(x_k − x_j), where τ denotes the shrinkage rate and is set to 0.5 by default, remains larger than d_min, then update the location of x_j and add x_j to the tabu list; otherwise continue with the next neighbor; (iii) repeat the above steps until the maximum size of the tabu list is reached or no point satisfies the given conditions.
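A sketch of one ST pass under the description above (δ and τ as reconstructed; the neighbor set and the revert-on-failure step are our implementation choices):

```python
import numpy as np
from scipy.spatial.distance import cdist

def shrink_and_tabu(X, delta=1e-3, tau=0.5, max_tabu=None, tol=1e-12):
    """One pass of Shrinking and Tabu: pull non-tabu nearest neighbors of the
    point with the largest nearest distance toward it, keeping every nearest
    distance above d_min, and tabu the points involved."""
    X = X.copy()
    n = len(X)
    max_tabu = max_tabu if max_tabu is not None else n
    tabu = set()
    D = cdist(X, X); np.fill_diagonal(D, np.inf)
    d_min = D.min()
    while len(tabu) < max_tabu:
        d_near = D.min(axis=1)
        free = [i for i in range(n)
                if i not in tabu and d_near[i] > (1 + delta) * d_min]
        if not free:
            break                                   # no qualifying point
        k = max(free, key=lambda i: d_near[i])      # max nearest distance
        tabu.add(k)
        for j in np.where(D[k] <= d_near[k] + tol)[0]:  # neighbors of x_k
            if j in tabu:
                continue
            old = X[j].copy()
            X[j] = X[j] + tau * (X[k] - X[j])       # shrink x_j toward x_k
            row = np.linalg.norm(X - X[j], axis=1); row[j] = np.inf
            if row.min() > d_min:                   # nearest distance preserved
                D[j, :] = row; D[:, j] = row        # accept the move
                tabu.add(j)
            else:
                X[j] = old                          # revert otherwise
    return X, tabu
```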
The tabu strategy can also be applied to multistart global optimization over the whole search space. The tabu technique selects a subset of the configuration space that is more evenly distributed than random sampling produces; the local minimization algorithm is then run from each selected configuration to arrive at the corresponding local minimum; finally, the best local minimum is taken as the putative global minimum.

Computational Results
To check the effectiveness of our method on large problems, we apply the algorithm to two kinds of spreadings (packings), namely in the unit square and on the unit sphere in 5 dimensions. In all problems the common settings are as follows: in the PGM procedure, the initial learning rate η = 0.1, the minimum learning rate η_min = 10^{−5}, the improvement tolerance ε = 10^{−6}, the perturbation parameters θ_1 = 0.01 and θ_2 = 10^{−6}, the momentum parameter β = 0.9, and the decay rate γ = 0.9; in the NTS procedure, the maximum number of attempts to escape from the current local minimum is set to 15, the number of feasible candidates generated from the neighborhood of the current best solution in the procedure shake is set to 15 (including a candidate generated by ST), and ω = rand(0.1, 0.6). As stated earlier, deterministic methods can achieve the optimal solution only for small problems, so we consider larger problem sizes (and dimensions) in our experiments. For the unit square, a database of putative optima for these spreadings (packings) is currently maintained at the website Packomania (www.packomania.com). We check the last 5 packings with problem size n < 1000 in Packomania, namely 990, 991, 992, 998, and 999, and generate 5 new packings, of sizes 993 to 997, which are not available from Packomania. The total number of iterations T, counting both PGM and NTS, is set to 2·10^6, and the maximum number of attempts per learning rate, K, is set to 2·10^4. For all five previously recorded problems, we achieve improvement over the known putative optima (see Table 1). For the unit sphere in 5 dimensions, the database can be found at the website of Neil Sloane (neilsloane.com/packings/dim5/). We check the last 10 packings (i.e., problem sizes n ∈ [121, 130]) with T = 2·10^7 and K = 2·10^5, and achieve improvement for 9 out of the 10 instances, the exception being n = 122 (see Table 2).
In Tables 1 and 2, column 1 indicates the problem size; column 2 gives the previously reported best record (minimum pairwise distance); column 3 reports the improved result found by the proposed algorithm; column 4 gives the difference between the improved result and the previous record; column 5 gives the distance between the new configuration and the previous best configuration; and N.A. means the best known result is not available from Packomania. The new packings and the check program can be downloaded from https://data.mendeley.com/datasets/jfcjkxwnxb/3.

Conclusion
This paper proposes a hybrid heuristic method to approximately solve the problem of spreading (packing) n points in a finite domain such that the shortest point-to-point distance is maximized. The method consists of a neighborhood tabu search and a neighborhood-based gradient (ascent) algorithm with perturbation and momentum. Tabu search acts as the diversification (or spreading) mechanism in the neighborhood or the whole domain, whereas the gradient serves as the intensification mechanism. Experimental results show that the method is effective.
Combining Tabu with PGM, we obtain a global optimization algorithm, which we refer to as SPGM (Spreading Perturbed Gradient with Momentum). Future work includes (1) applying the proposed neighborhood-based gradient algorithm to other minimax-type or maximin-type problems, e.g., minimax (incremental) space-filling designs [24] and largest-sample-margin learning [6], (2) applying spreading to Particle Swarm Optimization (PSO) to alleviate premature convergence, and (3) applying SPGM to training neural networks in supervised or unsupervised learning, in which spreading neurons can be considered a form of diversity regularization and perturbation can help the solution escape poor local minima.

Author Contribution XH and LH wrote the manuscript and XH performed the experiments.

Consent for Publication Not applicable
Conflict of Interest The authors declare no competing interests.