Heterogeneous ant colony optimization based on adaptive interactive learning and non-zero-sum game

Ant colony optimization (ACO) is prone to falling into local optima and converges slowly when applied to the traveling salesman problem (TSP). To overcome these drawbacks, a heterogeneous ant colony optimization based on adaptive interactive learning and a non-zero-sum game is proposed. Firstly, three subpopulations with different characteristics are combined into a heterogeneous ant colony to enhance overall performance. Secondly, an adaptive interactive learning mechanism is adopted when algorithm diversity decreases, in which the communication partners are selected adaptively according to population similarity. In this mechanism, inferior individuals are paired with superior individuals, which enlarges the search range and accelerates convergence. Finally, an elite information exchange strategy based on a non-zero-sum game is adopted when the algorithm falls into a local optimum: each subpopulation selects its partners for elite information exchange according to a normalized comprehensive evaluation operator, which helps each subpopulation choose the most appropriate strategy for escaping the local optimum. Through this model, the accuracy of the solution is further improved. The experimental data are taken from the TSPLIB library, and simulations are run in MATLAB on TSP instances of various sizes. Experimental results indicate that the proposed algorithm obtains higher-quality solutions and converges faster when solving the traveling salesman problem.


Introduction
The traveling salesman problem (TSP) was introduced in the early nineteenth century. The classic TSP can be described as follows: a traveler must visit each city exactly once along the shortest possible route. When the number of cities is small, the enumeration method suffices, but as the number of cities grows, exhaustive search quickly becomes intractable. Scientists therefore turned to approximate or heuristic algorithms that find an acceptable near-optimal solution within a reasonable time.
Meta-heuristic algorithms are applied to many difficult combinatorial optimization problems and improve the ability to find high-quality solutions, especially on large instances. Ant colony optimization (ACO) (Dorigo et al. 1996; Dorigo and Stützle 2004) is a heuristic for difficult discrete optimization problems. ACO is well suited to optimization, and its process has a certain degree of randomness. The algorithm can solve not only static combinatorial optimization problems such as the TSP (Elloumi et al. 2014; Mollajafari and Shahhoseini 2016; Mavrovouniotis et al. 2017) but also dynamic combinatorial optimization problems (Jin and Branke 2005). Ant colony optimization was first used to solve shortest-path problems and was then gradually applied to other fields, such as vehicle scheduling, graph coloring, integrated circuit design, communication networks, and data clustering.
The classical ACO algorithms include the ant system (AS), the elitist strategy for ant system (EAS), the rank-based ant system, the max-min ant system (MMAS), and the ant colony system (ACS). The ant system was proposed by the Italian scholar M. Dorigo et al. in the 1990s. The AS algorithm comes in three versions: ant quantity, ant density, and ant cycle. The AS algorithm referred to today is the ant-cycle version; the other two versions were abandoned because of poor performance. Since the execution efficiency of the ant system is unsatisfactory, the first improvement made on the AS algorithm was the elitist strategy for ant system (EAS), which deposits additional pheromone on the best path found so far. Another improved version of the AS algorithm is the rank-based ant system, which improves pheromone updating: only the ants with shorter paths and the globally best ant may update the pheromones. These two versions let the algorithm converge faster. The max-min ant system (MMAS) (Stützle and Hoos 2000) improves the AS algorithm substantially. It allows only the iteration-best or global-best ant to release pheromone, which greatly improves convergence. In MMAS, the path pheromone is restricted to an interval, and the initial pheromone value is set to the upper limit of that interval, giving the algorithm a stronger global search ability. The ant colony system (ACS) (Dorigo and Gambardella 1997; Kollin and Bavey 2017) is optimized in both path construction and pheromone updating, and its performance surpasses that of the earlier improved algorithms.
In general, the ant colony algorithm has achieved good results in solving the TSP (Stutzle and Dorigo 1999). However, on large-scale problems it still suffers from poor diversity and slow convergence. Scholars have therefore made a series of improvements to the algorithm. Mahi et al. (2015) presented a new hybrid optimization algorithm in which particle swarm optimization (PSO) determines the parameters of the ant colony algorithm, since these parameters affect its performance; the 3-opt heuristic is then used to optimize local solutions and improve solution quality. Gao (2021) proposed improving the rules of the ant colony algorithm: during path construction, the paths of two meeting ants are combined, which reduces search time, and a method that polarizes the pheromone density of all paths is introduced into the pheromone update strategy to increase the precision of the solution. Guan et al. (2021) introduced an improved ant colony optimization algorithm (AU-ACO) based on an adaptive update mechanism, whose idea is to optimize the allocation without giving up the excellent variable-value pairs already allocated. By optimizing partial variable-value pairs, both the convergence and the probability of finding the optimal solution are improved. Zhang et al. (2020b) put forward a method based on ACO and opposition-based learning (OBL), together with two strategies for constructing the opposite path by OBL based on the solution characteristics of the TSP; these strategies effectively improve the performance of the algorithm.
However, as the data set expands, the ant colony algorithm easily stagnates, and the program running time suffers as well. Some scholars have therefore adapted the algorithm for large-scale city sets. Wu et al. (2020) proposed a two-layer ant colony algorithm combined with k-means to solve large-scale TSP instances. The k-means method divides the cities into several clusters; the top-layer ant colony algorithm optimizes the connections between cluster centers, while the bottom-layer algorithm optimizes the paths of the remaining points within clusters. The experimental results indicate that the proposed algorithm further improves performance while shortening the running time of the program.
Considering the limitations of a single population, scholars have introduced the idea of multiple populations into the ant colony algorithm. The concept of multiple ant colonies was proposed by Gambardella et al. (1999) and has since been extensively studied. Multi-colony systems (Jovanovic et al. 2010) use communication between groups to improve performance. Akpınar et al. (2013) proposed a hybrid algorithm based on ant colony optimization and a genetic algorithm (ACO-GA), in which the genetic algorithm serves as the local search strategy to increase the optimization performance of the ant colony algorithm; the ant colony algorithm is responsible for diversity, and the genetic algorithm for enhancement. Meng et al. (2020b) proposed a GAN model of the multi-ant colony algorithm based on a generative adversarial network, composed of a discriminant model and a generation model of the multi-ant colony algorithm, in which the balance between convergence speed and solution quality is adjusted through communication between the populations. Deng Ye et al. (2018) proposed a multi-type ant system algorithm (MTAS) that mixes the ant colony system with the max-min ant system. Besides retaining the advantages of ACS and MMAS in solution-seeking and optimization, this algorithm adopts an adaptive pheromone update strategy in the MMAS component, which better balances search ability and convergence. Zhang et al. (2020a) introduced a dynamic multi-role adaptive cooperative ant colony algorithm (MRCACO), in which the ant colony consists of three heterogeneous subpopulations; recommendation learning is carried out according to the attributes of each group, realizing complementarity in performance, and a reverse learning strategy is used to find better paths.
In this paper, a heterogeneous ant colony optimization based on adaptive interactive learning and a non-zero-sum game (AGACO) is proposed, with a focus on improving performance, especially on large-scale TSP instances. The main contributions are as follows. The heterogeneous ant colony, which integrates the advantages of the subpopulations ACS, MMAS, and improved DCACS, increases performance through communication between the colonies. The game runs through the whole optimization process, and each subpopulation can be regarded as a player. In the initial stage, each player searches for the optimal path independently. Since the goal of all players is the global optimal path, the players agree to cooperate when performance is poor. When diversity drops below a threshold, the adaptive interactive learning mechanism is triggered, in which the least similar population is selected for communication; inferior individuals are paired with superior individuals to equalize the path pheromone, while the dominant individuals are strengthened to accelerate convergence. When the population stagnates during optimization, the elite information exchange strategy based on the non-zero-sum game is triggered. In this model, the elite information exchange helps a population avoid the local optimal path and find a better one, and partners are chosen according to the normalized comprehensive evaluation operator.
The remainder of the paper is organized as follows. Section 2 presents the basic principles of the ACS algorithm, the MMAS algorithm, and the non-zero-sum game. The proposed AGACO algorithm is described in Section 3, including the calculation of population similarity, the adaptive interactive learning mechanism, and the elite information exchange strategy based on the non-zero-sum game. The effectiveness of the strategies and the analysis of the proposed algorithm are presented in Section 4. The conclusion and outlook are presented in Section 5.

Construct the solution
All the ants are randomly distributed among the cities and construct paths in parallel. The next city to be visited by an ant is selected according to the pseudo-random proportional rule, as shown in Eq. (1):

$$ s = \begin{cases} \arg\max_{u \in allowed} \left\{ \tau_{iu} \, [\eta_{iu}]^{\beta} \right\}, & q \le q_0 \\ J, & q > q_0 \end{cases} \quad (1) $$

where $\tau_{ij}$ denotes the pheromone value on edge $(i, j)$; $\eta_{ij} = 1/d_{ij}$ denotes the heuristic information; the value of $\beta$ determines the role of the heuristic information in the ants' city selection; $q$ denotes a random variable; and $q_0$ is a fixed value between 0 and 1 that determines the degree of exploration of new paths. In other words, the probability that the ant chooses the current best city is $q_0$. $J$ is a random variable drawn according to the probability distribution in Eq. (2):

$$ p_{ij} = \frac{[\tau_{ij}]^{\alpha} \, [\eta_{ij}]^{\beta}}{\sum_{u \in allowed} [\tau_{iu}]^{\alpha} \, [\eta_{iu}]^{\beta}}, \quad j \in allowed \quad (2) $$

where $\alpha$ is the parameter that controls the influence of the pheromone, and $allowed$ denotes the set of cities not yet visited by the ant.
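As a minimal illustrative sketch (not the authors' implementation), the pseudo-random proportional rule of Eqs. (1)-(2) can be coded as follows; the nested-list layout of the pheromone matrix tau and heuristic matrix eta is an assumption:

```python
import random

def select_next_city(current, allowed, tau, eta, alpha, beta, q0):
    """ACS pseudo-random proportional rule (Eqs. 1-2), sketched."""
    if random.random() <= q0:
        # Exploitation: greedily take the city maximizing tau * eta^beta (Eq. 1)
        return max(allowed, key=lambda j: tau[current][j] * eta[current][j] ** beta)
    # Exploration: roulette-wheel selection with probabilities from Eq. (2)
    weights = [tau[current][j] ** alpha * eta[current][j] ** beta for j in allowed]
    r = random.uniform(0.0, sum(weights))
    acc = 0.0
    for j, w in zip(allowed, weights):
        acc += w
        if acc >= r:
            return j
    return allowed[-1]  # numerical safety net
```

With $q_0 = 1$ the rule is purely greedy; with $q_0 = 0$ it reduces to the random-proportional rule of Eq. (2).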

Global pheromone update
Only the path traveled by the globally optimal ant is allowed to receive pheromone. The pheromone update rules are shown in Eqs. (3) and (4):

$$ \tau_{ij} = (1 - \rho) \, \tau_{ij} + \rho \, \Delta\tau_{ij}^{bs} \quad (3) $$

$$ \Delta\tau_{ij}^{bs} = \begin{cases} 1/L_{gb}, & (i, j) \in \text{global-best path} \\ 0, & \text{otherwise} \end{cases} \quad (4) $$

where $\Delta\tau_{ij}^{bs}$ denotes the released pheromone; $L_{gb}$ is the length of the global-best path; and $\rho$ is the evaporation rate of the global pheromone update, whose size affects the convergence rate.

Local pheromone update
In addition to the global pheromone update above, the ACS algorithm also performs a local pheromone update during path construction: each edge an ant passes is updated immediately, according to Eq. (5):

$$ \tau_{ij} = (1 - \xi) \, \tau_{ij} + \xi \, \tau_0 \quad (5) $$

where $\xi$ is the evaporation rate of the local pheromone update and $\tau_0$ is the initial pheromone value. The effect of the local update is to reduce the pheromone on the edges the ants have walked on, increasing the probability that other edges are explored.
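The two pheromone updates of Eqs. (3)-(5) can be sketched as follows for a symmetric TSP, again assuming a nested-list pheromone matrix (an illustrative sketch, not the authors' code):

```python
def global_update(tau, best_path, L_gb, rho):
    """Global update (Eqs. 3-4): only edges of the global-best tour receive pheromone."""
    deposit = 1.0 / L_gb  # Eq. (4)
    for i, j in zip(best_path, best_path[1:] + best_path[:1]):  # closed tour
        tau[i][j] = (1.0 - rho) * tau[i][j] + rho * deposit  # Eq. (3)
        tau[j][i] = tau[i][j]  # symmetric TSP

def local_update(tau, i, j, xi, tau0):
    """Local update (Eq. 5), applied as soon as an ant crosses edge (i, j)."""
    tau[i][j] = (1.0 - xi) * tau[i][j] + xi * tau0
    tau[j][i] = tau[i][j]
```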

MMAS
The max-min ant system (MMAS) improves on the AS algorithm. The path construction rule of MMAS is shown in Eq. (2). Compared with the AS algorithm, only the pheromone on the global-best path increases in MMAS. The pheromone evaporation rule is shown in Eq. (6), and the pheromone release rule in Eq. (7):

$$ \tau_{ij} = (1 - \rho) \, \tau_{ij} \quad (6) $$

$$ \tau_{ij} = \tau_{ij} + \Delta\tau_{ij}^{best} \quad (7) $$

where $\Delta\tau_{ij}^{best} = 1/C_{bs}$ when the pheromone-releasing ant is the globally optimal ant, with $C_{bs}$ the global-best path length; or $\Delta\tau_{ij}^{best} = 1/C_{ib}$ when the pheromone is released by the iteration-best ant, with $C_{ib}$ the optimal path length of the current iteration.
The pheromone on each edge is restricted to an interval in order to avoid stagnation. The upper and lower pheromone bounds are shown in Eqs. (8) and (9):

$$ \tau_{max} = \frac{1}{\rho \, C^{*}} \quad (8) $$

$$ \tau_{min} = \frac{\tau_{max}}{a} \quad (9) $$

where $C^{*}$ is the optimal path length and $a$ is a parameter. When $\tau_{ij} \le \tau_{min}$, set $\tau_{ij} = \tau_{min}$; correspondingly, when $\tau_{ij} \ge \tau_{max}$, set $\tau_{ij} = \tau_{max}$.
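Taking $\tau_{max} = 1/(\rho C^{*})$ and $\tau_{min} = \tau_{max}/a$ (the exact bound formulas are a reconstruction from the surrounding text), the clamping step can be sketched as:

```python
def clamp_pheromones(tau, rho, C_star, a):
    """Restrict every pheromone value to [tau_min, tau_max] (Eqs. 8-9), sketched."""
    tau_max = 1.0 / (rho * C_star)  # assumed form of Eq. (8)
    tau_min = tau_max / a           # assumed form of Eq. (9); a is a parameter
    for row in tau:
        for j in range(len(row)):
            row[j] = min(max(row[j], tau_min), tau_max)
    return tau_min, tau_max
```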

Information entropy
In 1948, Shannon (1948) proposed information entropy to solve the problem of information measurement. Information entropy describes the uncertainty of a random variable.
For any random variable $X$, information entropy is defined as in Eq. (10):

$$ H(X) = -\sum_{x} P(x) \, \log P(x) \quad (10) $$

where $P(x)$ is the probability that event $x$ occurs. The degree of order of a system can be measured by information entropy: the more ordered a system is, the lower its information entropy; conversely, the more chaotic a system is, the higher its information entropy. Information entropy has been used to describe algorithm diversity in the literature (Chen et al. 2019): the more distinct solutions there are, the better the diversity of the algorithm.
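A small sketch of how Eq. (10) can measure colony diversity: each ant's tour is treated as one outcome, and the entropy of the empirical tour distribution is computed (illustrative; the base-2 logarithm is an assumption):

```python
import math
from collections import Counter

def path_entropy(paths):
    """Shannon entropy (Eq. 10) of the distinct tours currently in the colony."""
    counts = Counter(tuple(p) for p in paths)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())
```

A colony in which every ant holds the same tour has entropy 0; the more distinct tours there are, the higher the entropy, i.e., the better the diversity.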

Non-zero-sum game
A game is a kind of confrontation in which, under certain rules, each player chooses a strategy to maximize his own interests. A cooperative game emphasizes group rationality: the players form alliances and cooperate to advance the interests of all parties.
A zero-sum game is a non-cooperative game, which means that the gains of one party must be the losses of the other party, and the sum of the gains and losses of all parties is zero. A non-zero-sum game is a cooperative game. For the two parties involved in the game, the gain of one party is not necessarily the loss of the other party, and the sum of the gain and loss of each party is not zero, which may lead to a win-win or lose-lose situation.
A non-zero-sum game can be either a positive or a negative-sum game. A positive-sum game is one in which the overall benefit increases. The interests of both parties in the game may increase, or at least the interests of one party may increase without the interests of the other party being harmed. A negative-sum game is a game where both players have a loss. The overall benefits of both parties in the game are reduced, or the benefits of one party are less than the losses of the other party. In this paper, the idea of non-zero-sum game is introduced into the ACO.
The game runs through the whole process of finding the optimal path, and each subpopulation can be regarded as a player. Participants compete with each other in pursuit of personal optimization, putting personal interests first.

Heterogeneous ant colony optimization based on adaptive interactive learning and non-zero-sum game
The traditional ant colony algorithm easily falls into local optima and converges slowly. To address this problem, the DCACS algorithm proposed in the literature (Meng et al. 2020a) effectively improves performance by adopting a synergistic mechanism, but it still leaves room for improvement in solution quality and algorithm diversity. Therefore, this paper adopts a multi-colony, multi-strategy approach to further improve performance, in which the heterogeneous ant colony is composed of an ACS population, an MMAS population, and a DCACS population. Zhang D et al. (2019) used the Jaccard coefficient to calculate the similarity between populations, forming a vector from diversity and convergence; the larger the Jaccard index, the more similar the populations. However, the Jaccard coefficient applies only to binary attributes and cannot fully measure performance, because it accounts only for diversity and convergence and ignores the quality of the solution. Moreover, the diversity and convergence measures in that work are relatively broad and lack representativeness. In general, the more identical segments two paths share, the more similar the ants are. Therefore, the path repetition rate is used here to calculate similarity. The modeling process, from individual similarity to population similarity, is as follows:

Population similarity
(1). Assume that the path of the $t$-th ant in population A is $[a_1^t \; a_2^t \; a_3^t \cdots a_n^t]$, and that the paths of all ants in this population form the matrix $Route_A$, as shown in Eq. (11). Similarly, assume that the path of the $t$-th ant in population B is $b^t$, and that the path matrix of all ants in this population is $Route_B$, as shown in Eq. (12).
where N denotes the number of iterations, n denotes the number of cities, and m denotes the number of ants.
(2). At the end of the iteration, the common path segments of populations A and B are counted according to Eq. (13), where $same$ denotes a shared segment of three consecutive cities.
(3). Obviously, the more common paths two populations share, the more similar populations A and B are. The common-path count is normalized so that the similarity value lies between 0 and 1, making it easier to compare the similarity of different population pairs. An $n \times n$ all-zero matrix is constructed based on the number of cities; each pair of adjacent cities in a common path is then taken as a coordinate, and the corresponding entry of the matrix is set to 1. The population similarity is calculated according to Eq. (14), where $n$ denotes the number of cities and $S$ denotes the number of nonzero entries in the $n \times n$ matrix. The more common paths there are, the more similar the populations are. When the paths are more concentrated, diversity is poor; conversely, when the paths are more scattered, algorithm diversity is good.
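Steps (1)-(3) can be sketched as follows, under stated assumptions: a "common segment" is a run of three consecutive cities shared by the two populations (as in Eq. 13), each adjacent city pair of a common segment marks one entry of an $n \times n$ zero matrix, and the similarity of Eq. (14) is taken as the nonzero count $S$ divided by $n$ (the exact normalization is assumed):

```python
def population_similarity(routes_a, routes_b, n):
    """Path-repetition similarity between two populations (Eqs. 13-14), sketched."""
    def segments(routes):
        # All runs of three consecutive cities over every ant's route
        return {tuple(r[k:k + 3]) for r in routes for k in range(len(r) - 2)}
    common = segments(routes_a) & segments(routes_b)
    mark = [[0] * n for _ in range(n)]  # n x n zero matrix
    for a, b, c in common:
        mark[a][b] = 1  # mark each adjacent city pair of a common segment
        mark[b][c] = 1
    S = sum(sum(row) for row in mark)
    return S / n
```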

Adaptive interactive learning mechanism
Generally speaking, after a certain number of iterations there is a gap between the pheromone levels of the different paths. As pheromone accumulation gradually concentrates on a few paths, the diversity of the algorithm deteriorates. Information entropy is used to judge algorithm diversity: when the information entropy falls below a threshold, the players reach a cooperative consensus and adopt the adaptive interactive learning mechanism to improve performance. This mechanism increases the probability that other routes are chosen by letting the superior colony drive the inferior one, while the superior population is also strengthened to some extent, which speeds up convergence.
The objects of interactive learning are adaptively selected according to population similarity. The two populations with the least similarity were selected to communicate for optimal performance. The interactive learning model is shown in Fig. 1.
In this mode, all individuals in population A are sorted in ascending order of path length, and all individuals in population B are sorted in descending order, where $m$ is the number of ants and $Route_m$ is the path of the $m$-th ant. The inferior individuals in population A learn from the superior individuals in population B; similarly, the inferior individuals in population B learn the pheromone distribution of the superior individuals in population A, which equalizes the path pheromone and increases path diversity. At the same time, the superior individuals in both populations are strengthened to a certain extent. The essence of interactive learning is updating the path pheromone. Different weights are assigned according to the fitness (path length) of the learning object: superior individuals receive a smaller weight, while inferior individuals receive a larger weight. The weights are calculated according to Eqs. (15) and (16), and interactive learning is conducted according to Eqs. (17) and (18).
where $w(A,t)$ and $w(B,t)$ denote the weights of the $t$-th ant in populations A and B, respectively; $length(A,t)$ and $length(B,t)$ are the path lengths of the $t$-th ant in populations A and B, respectively; $best\_length$ denotes the optimal solution of the current test set; $k_0$ denotes a constant; and $\tau(A,t)$ and $\tau(B,t)$ are the pheromones on the path of the $t$-th ant in populations A and B.
Under the influence of interactive learning mechanism, on the one hand, the probability of the path being chosen is improved, which is helpful to improve the diversity of algorithms. On the other hand, the pheromone on the better path is enhanced to accelerate the convergence. The goal of balancing diversity and convergence speed is achieved. However, some of the enhanced optimal paths are prone to fall into local optimality. Therefore, the upper and lower limits of pheromones in the MMAS algorithm are integrated into the interactive learning mechanism to balance path pheromone.
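Since Eqs. (15)-(18) are not reproduced here, the following is a hypothetical sketch of the mechanism's shape only: the weight grows with the learner's path length relative to $best\_length$ (so inferior individuals learn more strongly), and the learner's pheromone moves toward its paired partner's pheromone; both formulas are illustrative assumptions, not the paper's exact equations:

```python
def interaction_weight(length_t, best_length, k0):
    """Hypothetical weight in the spirit of Eqs. (15)-(16): longer (worse)
    paths receive a larger learning weight; k0 is a constant."""
    return k0 * length_t / best_length

def interactive_learning(tau_learner, tau_partner, w, n):
    """Hypothetical pheromone update in the spirit of Eqs. (17)-(18): the
    learner's pheromone matrix is pulled toward the partner's by weight w."""
    for i in range(n):
        for j in range(n):
            tau_learner[i][j] = (1.0 - w) * tau_learner[i][j] + w * tau_partner[i][j]
```

In AGACO the result would additionally be clamped to the MMAS bounds of Eqs. (8)-(9), as described above.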

Elite information exchange strategy based on non-zero-sum game
Improving the accuracy of the solution and accelerating convergence is the ultimate goal of every player. However, the algorithm tends to be at a standstill with the accumulation of pheromones. At this point, the participants hope to further optimize the performance through cooperation, so an elite information exchange strategy based on non-zerosum game is proposed.

Elite information exchange strategy
An elite information exchange strategy is used in the heterogeneous populations when several generations share the same global optimal path, that is, when the algorithm falls into a local optimum. The optimal paths of the subpopulations differ, so the path pheromone distributions of the subpopulations differ as well. Exchanging elite pheromone matrices with other subpopulations helps the current subpopulation learn from them, find a more appropriate pheromone matrix, and improve the precision of the solution.
For example, three heterogeneous populations of ACS, MMAS, and DCACS are selected for the elite exchange, and there are six cases in total, as shown in Fig. 2.
In case 1, the elite part of the MMAS ant colony is assigned to the ACS ant colony; in case 2, the elite part of the ACS ant colony is given to the DCACS ant colony; in case 3, the elite part of the DCACS ant colony is assigned to the MMAS ant colony; in case 4, the elite part of the ACS ant colony is given to the MMAS ant colony; in case 5, the elite part of the DCACS ant colony is given to the ACS ant colony; and in case 6, the elite part of the MMAS ant colony is assigned to the DCACS ant colony. Cases 1 and 4 indicate that the two subpopulations reach a cooperative consensus to exchange elite information; cases 2 and 5, and cases 3 and 6, form the same kind of pairing.
The essence of information exchange is the reorganization of pheromones. Cases 1 and 4 are used as examples to illustrate the exchange process. Firstly, the paths of all ants in the two subpopulations are sorted in ascending order at the end of each iteration to facilitate the extraction of elite information in the next step. Then, the pheromone corresponding to the shorter paths is selected for exchange. The specific process is shown in Fig. 3.
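The exchange of Fig. 3 can be sketched as follows, where each colony is represented as a list of (path_length, pheromone_matrix) pairs; this data layout and the elite fraction are illustrative assumptions, not the paper's specification:

```python
def elite_exchange(colony_src, colony_dst, elite_frac=0.2):
    """Elite information exchange sketch: the shortest-path elites of the
    source colony replace the weakest entries of the destination colony."""
    colony_src.sort(key=lambda entry: entry[0])  # ascending path length
    colony_dst.sort(key=lambda entry: entry[0])
    k = max(1, int(len(colony_src) * elite_frac))
    # The longest-path (weakest) entries of dst receive copies of the src elites
    colony_dst[-k:] = [(length, [row[:] for row in matrix])
                       for length, matrix in colony_src[:k]]
```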

Normalized comprehensive evaluation operator
Each subpopulation has three strategies to choose from when the algorithm falls into a local optimum: keeping its own pheromone matrix, or exchanging elite information with either of the other two subpopulations. A subpopulation exchanges elite information only when doing so can improve its performance, selecting the most suitable pheromone matrix. The performance of the algorithm is evaluated in terms of diversity, convergence, and solution quality, which are calculated according to Eqs. (19)-(21).
where $D$ is the ratio of the current information entropy to the maximum information entropy and represents diversity; the larger $D$, the better the diversity. $C$ denotes the convergence of the algorithm, $Nc$ is the maximum number of iterations, and $N$ is the iteration at which the optimal solution is first obtained; $C$ is inversely related to $N$, so the smaller $N$, the better the convergence. $V$ denotes the quality of the solution, defined as the ratio of the theoretical optimal solution $L_{best}$ to the practical optimal solution $L$; the larger $V$, the closer the obtained optimum is to the theoretical optimum. The normalized comprehensive evaluation operator therefore normalizes the three indexes so that their values lie between 0 and 1, and each subpopulation selects the optimal pheromone matrix according to this operator, calculated by Eq. (22).
Each population selects the most suitable pheromone matrix according to Eq. (22). The larger the value of P, the better the performance of the algorithm.
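Under stated assumptions ($D = H/H_{max}$, $C = (Nc - N)/Nc$, $V = L_{best}/L$, and Eq. (22) taken as the sum of the three normalized indexes, which is a guess at the exact combination), the operator can be sketched as:

```python
def comprehensive_evaluation(H, H_max, N, Nc, L_best, L):
    """Normalized comprehensive evaluation operator (Eqs. 19-22), sketched."""
    D = H / H_max          # diversity: larger is better
    C = (Nc - N) / Nc      # convergence: earlier first optimum -> larger C
    V = L_best / L         # solution quality: closer to the optimum -> larger V
    return D + C + V       # assumed form of Eq. (22); larger P is better
```

Each subpopulation would evaluate its three candidate pheromone matrices with this operator and keep the one with the largest value of P.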

Algorithm framework
In the proposed AGACO, the heterogeneous ant colony is composed of ACS, MMAS, and DCACS. Firstly, the parameters of each subpopulation are initialized, and each colony then optimizes according to its own rules. As the number of iterations increases, path pheromone accumulation gradually decreases algorithm diversity, so the adaptive interactive learning mechanism is adopted to balance the convergence and diversity of the algorithm; at the same time, weights are assigned according to the fitness of the learning object to prevent the algorithm from converging too fast. The elite information exchange strategy based on the non-zero-sum game is triggered when the algorithm stagnates during optimization. In this strategy, each subpopulation chooses its cooperation partners for elite information exchange according to the normalized comprehensive evaluation operator, further improving the precision of the solution. The flowchart of the algorithm is shown in Fig. 4.

Complexity analysis
According to the pseudocode in Sect. 3.4, the search processes of the heterogeneous ant colonies run in parallel, so the maximum time complexity of AGACO is $O(\max(m_1, m_2, m_3) \times r \times Nc)$, where $m_1$, $m_2$, and $m_3$ denote the number of ants in the subpopulations ACS, MMAS, and DCACS, respectively, $r$ is the number of cities, and $Nc$ is the maximum number of iterations. After simplification, the maximum time complexity of AGACO is $O(m \times r \times Nc)$, the same as that of the original ACS, MMAS, and DCACS algorithms. Therefore, the improved AGACO algorithm does not increase the complexity, while its performance improves significantly compared with the single-population algorithms. The parameters of the three subpopulations were determined by orthogonal experiments. The values of the parameters $\alpha$, $\beta$, $\rho$, $\xi$, $q_0$ of the subpopulations ACS and DCACS were determined by an orthogonal experiment with five factors and four levels, and the values of the parameters $\alpha$, $\beta$, $\rho$, $q_0$ of the subpopulation MMAS by an orthogonal experiment with four factors and four levels. The test set Eil51 was used as the experimental object for 20 experiments, and the optimum parameters were determined from the results. The level settings of each factor are shown in Tables 1 and 5, the experimental process of the subpopulations ACS and DCACS in Tables 2, 3 and 4, the experimental process of the subpopulation MMAS in Tables 6 and 7, and the final parameter settings in Table 8.

Effectiveness test of each mechanism
The adaptive interactive learning mechanism and the non-zero-sum game are used for communication between the heterogeneous ant colonies. A controlled experiment was conducted to verify the effectiveness of each strategy. The heterogeneous ant colony optimization algorithm with only the adaptive interactive learning mechanism is denoted AGACO-1, and the variant with only the non-zero-sum game strategy is denoted AGACO-2. Eil51, ch150, and fl417 were selected as representatives of city sets of different sizes, and 30 groups of experiments were carried out for each city set. The effectiveness of each strategy is verified by comparing the optimal solution, average solution, error rate, and convergence of the algorithms. The experimental data are shown in Table 9, the error rates of the different algorithms in Fig. 5, and, for a more intuitive comparison, the convergence speeds in Fig. 6. As can be seen from Table 9 and Fig. 6, the AGACO-1 algorithm converges earlier than the AGACO-2 algorithm on these city sets. The reason is that AGACO-1 adopts the adaptive interactive learning mechanism: through interaction between heterogeneous populations, on the one hand, the dominant individuals drive the disadvantaged individuals, balancing the pheromone distribution and expanding the search scope; on the other hand, the dominant individuals are also strengthened to a certain degree, which speeds up convergence. According to the optimal and average solutions in Table 9, the solution quality of the AGACO-2 algorithm is greatly improved compared with that of AGACO-1.
This shows that the elite exchange strategy based on the non-zero-sum game in the AGACO-2 algorithm can jump out of the current optimal path and further improve the quality of the solution. A comparison of the error rates of the three algorithms is shown in Fig. 5. The error rates of the AGACO and AGACO-2 algorithms are both zero in small- and medium-sized cities. In large-scale cities, the error rates of the AGACO-1 and AGACO-2 algorithms are significantly higher than that of the AGACO algorithm, whose error is reduced to less than 1%. On the whole, the AGACO algorithm combining the two strategies has the best performance.
In real-world scenarios, the running time of the methods on the same instances is more relevant. Convergence over elapsed time is shown in Fig. 6, where the horizontal and vertical coordinates represent running time and error, respectively. From the horizontal coordinate of Fig. 6, the running time of AGACO-1 is the shortest, while that of AGACO is the longest, and the running time gradually becomes longer as the size of the city set increases. A longer running time indicates that more nodes are searched and better paths may be found. For example, on ch150, although the total running time of the improved algorithm is longer, AGACO achieves the fastest convergence. From the vertical coordinate of Fig. 6, the error of the AGACO algorithm is the smallest. According to Table 9 and Fig. 6, for the same city set, the running times of AGACO, AGACO-1, and AGACO-2 are close, while AGACO finds better paths.

Comparison of AGACO with ACS, MMAS, and DCACS
Sixteen TSP city sets of different sizes were tested to verify the effectiveness of the proposed AGACO algorithm. Thirty experiments were conducted on each city set, and the optimal solution, error rate, average solution, and convergence time of the test set were analyzed. The error rate is calculated by Eq. (23). The AGACO algorithm is compared with the classical ACS and MMAS algorithms and the improved DCACS algorithm. The experimental data are shown in Table 10.
L_best denotes the actual optimal solution obtained by the algorithm, and L_min denotes the theoretical optimal solution. According to Table 10, the classical ACS and MMAS algorithms, the single-population DCACS algorithm, and the heterogeneous ant colony AGACO algorithm can all obtain the theoretical optimal solution for small-scale city sets such as Eil51 and kroA100. However, the improved heterogeneous ant colony algorithm (AGACO) finds the optimal solution faster in terms of convergence. For medium-scale city sets such as ch150, pr264, and a280, the heterogeneous ant colony algorithm AGACO finds better solutions faster than the single-population algorithms. The convergence comparison for small- and medium-sized city sets is shown in Fig. 7. In terms of error rate, the error rate of the AGACO algorithm is less than 0.1%, or even zero, which is significantly better than the other algorithms. This can be attributed to the interactive learning between heterogeneous ant colonies, which improves the performance of the algorithm through information sharing. Table 10 also provides the convergence time (in seconds) of each algorithm. For the city sets Eil51, Eil76, kroA100, ch130, kroA150, pr264, a280, and att532, AGACO has the shortest convergence time. For the other city sets, the convergence time of AGACO is close to that of ACS, MMAS, and DCACS.
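With L_best and L_min defined as above, the error rate of Eq. (23) reduces to the relative deviation of the best solution found from the theoretical optimum. A minimal sketch of this computation (the 426 used in the example is the known TSPLIB optimum for Eil51):

```python
def error_rate(l_best, l_min):
    """Error rate as implied by Eq. (23): relative deviation (in percent)
    of the best solution found (L_best) from the theoretical optimum (L_min)."""
    return (l_best - l_min) / l_min * 100.0

# Eil51 optimum is 426; reaching it gives a 0% error rate
print(error_rate(426, 426))  # → 0.0
```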
As can be seen from Table 10, it is more difficult for a single ant colony to search for the optimum, and on large-scale city sets the algorithm easily stagnates in its later stage. The improved AGACO algorithm has outstanding advantages here. The quality of the optimal solution obtained by the AGACO algorithm on the city sets lin318, att532, and pr654 is much higher than that of the single-population algorithms, and the error rate of the AGACO algorithm is greatly reduced, to about 1%. This is attributed to the elite exchange strategy based on the non-zero-sum game in the AGACO algorithm, which helps the algorithm jump out of the stagnation state and find a better solution. On the whole, the average solution of the AGACO algorithm is also better than that of the other algorithms, which indicates that the heterogeneous ant colony optimization algorithm is more stable and robust. For a more intuitive comparison of the convergence of the four algorithms, a convergence comparison for large-scale city sets is shown in Fig. 8. The optimal path maps for some city sets are shown in Fig. 9.
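The partner-selection step of the elite exchange strategy can be sketched as follows. This is a hypothetical illustration only: the paper's normalized comprehensive evaluation operator is not given in this excerpt, so a simple min-max normalization of a per-subpopulation score is used as a stand-in.

```python
def select_partner(scores, self_idx):
    """Hypothetical sketch of elite-exchange partner selection: each
    subpopulation's evaluation (e.g. combining solution quality and
    diversity) is min-max normalized, and the other subpopulation with
    the highest normalized score is chosen as the exchange partner."""
    lo, hi = min(scores), max(scores)
    span = (hi - lo) or 1.0  # avoid division by zero when all scores equal
    norm = [(s - lo) / span for s in scores]
    candidates = [i for i in range(len(scores)) if i != self_idx]
    return max(candidates, key=lambda i: norm[i])

# subpopulation 1 (the best-scoring one) picks subpopulation 2 as partner
print(select_partner([0.4, 0.9, 0.7], self_idx=1))  # → 2
```

The point of normalizing first is that scores on different scales (tour length, diversity measures) become comparable before the partner is chosen, which matches the role the comprehensive evaluation operator plays in the text.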

Comparison of AGACO with state-of-the-art algorithms
To further evaluate the performance of AGACO, some improved algorithms are used for comparison. The comparative data are shown in Tables 11 and 12. The average solution and average error rate of the algorithm are compared with those of other intelligent optimization algorithms in Table 11. These algorithms include ACO (Gündüz et al. 2015), ABC (Gündüz et al. 2015), DWCA (Mahi et al. 2015), and Co-ASPSO-Ls (Rokbani et al. 2021). For small-scale city sets such as Eil51 and Eil76, the average solution and average error rate of the improved AGACO algorithm are lower than those of these optimization algorithms, and the AGACO error rate remains the smallest when the number of cities exceeds 100.
The global optimal solution and error rate are analyzed in Table 12. According to Table 12, AGACO is superior to other algorithms such as DSMO, LDTACO, JCACO (Zhang et al. 2019), and HRPACO (Pan et al. 2020) in solving large-scale TSP. The theoretical optimal path is found when the proposed AGACO algorithm is applied to mid-scale TSP instances such as kroA200 and pr264. The error rate of AGACO is also lower than that of the other improved algorithms in solving large-scale TSP. Table 13 provides a comparison of the running time of AGACO with the other algorithms. According to Table 13, ABC has the shortest running time and AGACO the second shortest. However, according to the results in Table 11, ABC has a higher error rate than AGACO and obtains the worst solution quality, whereas AGACO provides better solution quality. In general, the AGACO algorithm greatly improves the accuracy of the solution, especially for large-scale TSP instances.

Practical application of the algorithm
The improved algorithm is applied to robot path planning to verify its feasibility. Path planning is to find a shortest path from the start to the end in a specified area. Research on path planning mainly includes two aspects: one is to transform a real-world map that cannot be recognized by machines into a recognizable environment model, and the other is to design an optimization algorithm. In this section, the grid method is used to establish the environment model. The grid method divides the environmental information of the mobile robot into square grids of equal size: an obstacle grid is assigned a value of 1 and a free-space grid is assigned a value of 0. Figure 10a is a map of a real environment, and Fig. 10b is a two-dimensional map obtained by using ROS under the Ubuntu system. Firstly, the environment model is built and the map is transformed into a 42 × 59 grid map, as shown in Fig. 11. The black areas in Fig. 11 are obstacles and the white areas are feasible areas. The upper left is the starting point, whose coordinates are [2, 36], and the lower right is the end point, whose coordinates are [54, 3]. Secondly, ACS and AGACO are used for path planning on the grid map, respectively. The planned paths are shown in Fig. 11, in which the red and blue lines represent the paths planned by the conventional ACS algorithm and the improved AGACO algorithm, respectively. According to Fig. 11, the path lengths planned by ACS and AGACO are 67.84 and 66.08, respectively. Although both ACS and AGACO can effectively avoid obstacles when performing path planning, AGACO plans a better path: it effectively avoids nodes with obstacles and reduces path turns when selecting the next target point.
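On a grid map, the length of a planned path is the sum of Euclidean distances between consecutive waypoints, so straight moves between adjacent cells cost 1 and diagonal moves cost √2. A minimal sketch of this measurement (the toy path is illustrative, not the planned path from Fig. 11):

```python
import math

def path_length(path):
    """Length of a path on a grid map: sum of Euclidean distances between
    consecutive waypoints (straight moves cost 1, diagonal moves sqrt(2))."""
    return sum(math.dist(a, b) for a, b in zip(path, path[1:]))

# toy path: two straight moves followed by one diagonal move
print(path_length([(0, 0), (1, 0), (2, 0), (3, 1)]))  # → 2 + sqrt(2) ≈ 3.414
```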
The experimental results prove that the improved algorithm has an optimization effect on the path. AGACO makes the path shorter and improves the efficiency of the algorithm, which indicates that the improved algorithm is feasible for robot path planning.

Conclusions
The ACO algorithm suffers from poor solution quality and slow convergence when solving the well-known traveling salesman problem. These drawbacks encouraged this study to propose a heterogeneous ant colony optimization algorithm based on adaptive interactive learning and a non-zero-sum game. The heterogeneous ant colony composed of an ACS subpopulation, an MMAS subpopulation, and an improved DACS subpopulation retains the advantages of each subpopulation. The goal of the heterogeneous ant colony is to enhance performance through communication among the populations. In this regard, an adaptive interactive learning mechanism is applied when the algorithm's diversity falls below a threshold; the mechanism aims to increase diversity and accelerate convergence. Aiming at the problem that ACO easily falls into local optima, an elite information exchange strategy based on a non-zero-sum game is proposed; the goal of this strategy is to enhance the accuracy of the solution through elite information exchange. Experiments were carried out to evaluate the performance of the proposed strategies using TSP instances of different scales. The experimental results show that the solution quality and convergence of the heterogeneous ant colony algorithm are greatly improved in solving TSP. In future work, multi-population colony algorithms based on game theory will be studied further.

Data availability Enquiries about data availability should be directed to the authors.

Declarations
Conflict of interest The authors declare that there are no other people or organizations that could appear to have influenced the submitted work.