An Improved Genetic Algorithm Based Annulus-Sector Clustering Routing Protocol for Wireless Sensor Networks

In the clustering routing protocols for wireless sensor networks, uniform cluster formation and optimal routing paths finding are the two most important factors to minimize the network energy consumption and balance the network load. In this paper, an improved genetic algorithm based annulus-sector clustering routing protocol called GACRP is proposed. In GACRP, the circular network is divided into sectors with the same size for each annulus. The number of sectors is obtained by calculating the minimum energy consumption of the network. Each annulus-sector forms a cluster and the best node in this annulus-sector is selected as cluster head. Moreover, an improved genetic algorithm with a novel fitness function considering energy and load balance is presented to find the optimal routing path for each CH, and an adaptive round time is calculated to maintain the clusters. Simulation results show that GACRP can significantly improve the network energy efficiency and prolong the network lifetime as well as mitigate the hot spot problem.


Introduction
With the advent of the era of artificial intelligence, wireless sensor networks (WSNs) are being widely used for information collection in various applications such as health and environment monitoring, battlefield surveillance and space exploration [1]. Minimizing the network energy consumption to increase the network lifetime is a persistent concern because the nodes in these networks are resource-constrained and often randomly deployed in harsh environments. Clustering routing has been proved to reduce energy consumption, improve network scalability and extend the network lifetime [2,3], and many approaches have been proposed during the last decades. These approaches usually consist of two phases: cluster construction and data delivery. In the cluster construction phase, cluster heads (CHs) are selected by random probability based [4,5], weight based [6,7] or intelligent computing based [8,9] methods. In the data delivery phase, single-hop [4], multi-hop [5,7,8,9] or hybrid [6] schemes are used for data transmission. All the existing approaches decrease the network energy consumption and prolong the network lifetime to a certain extent. However, neglecting the clusters' size results in uneven clusters, and the random distribution of clusters is likely to cause unbalanced energy consumption. What is more, long-distance transmission existing in the routing paths sharply increases the energy consumption.
Clustering routing approaches that divide the sensing area into annular sectors have been presented to solve the problems mentioned above [10,11]. Usually, each annulus-sector has the same size, and the nodes in it form a cluster. This balances the energy consumption, because nodes at almost the same distance to the base station (BS) consume roughly the same energy. To decrease the amount of message interaction, the node with the best performance in each annulus-sector is selected as CH based on probability [12], a weight according to residual energy [13], residual energy and distance to the BS [11], residual energy and distance to the centre of the cluster [10], or received signal strength [14]. Moreover, a multi-hop data delivery scheme is adopted in which a CH in an outer annulus forwards data to a CH in the next inner annulus until the data reach the BS, resulting in high energy efficiency and low network energy consumption. However, the nodes near the BS then deplete their energy much more quickly than the others because the data traffic converges on the BS, leading to unbalanced load and uneven energy consumption across the network, which is typically called the hot spot problem [15]. Various unequal clustering approaches have been presented to deal with the hot spot problem in traditional hierarchical networks [16]. For circular networks, annulus-sector divisions of different sizes or specific data forwarding schemes are utilized to balance the load among the CHs and thus alleviate the hot spot problem [10][11][12]. However, dividing each annulus into an arbitrary number of sectors makes uniform energy consumption in each annulus impossible. Furthermore, hop-by-hop data delivery burdens the intermediate nodes, which are prone to premature death. In addition, maintaining the clusters with a fixed round time produces too much overhead due to frequent CH rotation.
In this paper, an improved Genetic Algorithm based annulus-sector Clustering Routing Protocol called GACRP is proposed to minimize the network energy consumption and balance the network load. The circular network is divided into annuluses, and each annulus is separated into sectors of the same size according to the calculated optimal number of clusters. Each sector forms a cluster, and exactly one CH is selected in each cluster on the basis of its load, residual energy and distance to the BS. Moreover, an improved genetic algorithm is used to search for the best routing paths for all the CHs so as to achieve balanced load and energy, and a novel adaptive round time is presented to maintain the clusters. Simulations are conducted to verify the performance of GACRP compared with several recent related protocols. The main contributions are outlined as follows:
• Based on the minimum energy consumption of each annulus, the optimal cluster number in each annulus is obtained to form uniform clusters.
• An improved genetic algorithm is presented whose novel fitness function considers the minimum network energy consumption and the balance of the CHs' loads so as to obtain the optimal routing paths. Specifically, the routing paths are denoted by a chromosome with valid genes.
• An adaptive round time in line with balanced load and energy is used for CHs rotation to avoid frequent clustering so as to further save the network energy consumption and improve the network throughput.
The rest of the paper is organized as follows. Detailed description of the related works is given in Sect. 2. In Sect. 3, the network model is discussed, and Sect. 4 focuses on the proposed GACRP. In Sect. 5, simulations are conducted to verify the effectiveness of GACRP. Finally, conclusions are presented in Sect. 6.

Related Works
Clustering routing approaches have been proved to efficiently extend the network lifetime, with advantages such as good scalability, high energy efficiency and low end-to-end delay. Thus, a large number of clustering routing schemes have been proposed during the last decades. Low-Energy Adaptive Clustering Hierarchy (LEACH) [4] is the pioneering clustering routing protocol, in which CHs are selected randomly and every node can be selected as CH at least once in a certain round. All the cluster members (CMs) send their sensed data to their respective CHs in the allotted timeslots. Each CH fuses the collected data and sends it to the BS directly. Although LEACH is a distributed protocol with advantages such as simplicity, balanced load, low overhead and a configurable number of CHs, it also has some disadvantages. Firstly, its CHs communicate with the BS in a single hop, which makes the farther ones deplete energy faster and results in weak scalability. Secondly, its CHs are selected randomly without considering the residual energy, so the CHs are distributed unevenly, and even worse, some nodes with low energy are also selected as CHs, resulting in unbalanced energy consumption and premature node death. To overcome the drawbacks of LEACH, various improvements have been presented [17,18]. In [18], a survey of the successors of LEACH is given according to single-hop and multi-hop communication, and the advantages and disadvantages of each LEACH variant are described. However, traditional approaches cannot adapt to network uncertainties and dynamics, and in particular they struggle to reach the globally optimal solution. Therefore, intelligent computing based methods such as the bat algorithm [19], fuzzy logic control [20], moth flame optimization [21], the imperialist competitive algorithm [22] and the genetic algorithm [23] have been proposed to address these problems.
In [19], the bat algorithm is utilized to select CHs by optimizing the objective function of a cluster, whose value is decided by the average energy and the distance variance within the cluster. In [20], two fuzzy logic controllers are used to select the CHs and determine the best forwarders based on residual energy, weight, the space to the BS, distance and trust factor, respectively. The outputs of the fuzzy logic controllers are the probability of being selected as CH, the radius of the cluster, and the probability of being selected as the optimal forwarder. In [21], the moth flame optimization algorithm is presented to select trustworthy CHs, and its fitness function considers five parameters such as the residual energy of the elected node, the connected node density and the average transmission delay. In [22], the imperialist competitive algorithm is introduced to select optimal CHs so as to solve the issues of load balance and energy consumption. The fitness function of the imperialist competitive algorithm considers the standard deviation of the CHs' load to minimize the loads of CHs, resulting in maximum network lifetime. In [23], a genetic algorithm is used to select CHs, and its fitness function considers factors such as the energy of all nodes, the distance between a CH and its CMs, and the number of nodes in the cluster. The simulation results have verified that the methods mentioned above can form optimal clusters, reduce energy consumption and enhance the network lifetime. However, the CHs in the vicinity of the BS inevitably consume more energy because they take on more data relay tasks than the farther ones, resulting in the hot spot problem. Unequal clustering routing algorithms are therefore adopted to resolve this problem by assigning smaller cluster sizes to CHs nearer to the BS [24][25][26].
In [24], EUCOR determines the CHs by using a novel objective function considering the nodes' residual energy, energy consumed to reach the BS, and the competition radius of the nodes. Moreover, the competition radius of a node is computed using its distance to the BS, the higher and lower distance of the nodes from BS, resulting in forming clusters with variable cluster sizes. So the lifespan of the nodes is prolonged by minimizing the use of energy in the forwarding of packets. In [25], EADUC is proposed to select CHs based on the ratio of the node's residual energy and its neighbour nodes' average residual energy. Especially, node degree, residual energy and distance to BS are used to obtain the competition radius, and residual energy is used as a relay metric to select the relay nodes while the same clustering is performed for several rounds to eliminate the re-clustering overhead and further save the energy consumption. In [26], UCF uses a fuzzy logic system with descriptors distance to BS and local density to determine the cluster radius which is larger for CHs with longer distance to BS and less local neighbours. Thus, the clusters nearer to the BS have smaller cluster radius than the longer ones. However, random CHs selection only based on residual energy needs large amount of control message and increases collisions. EADUC and UCF perform well to extend the network lifetime. However, clustering with different sizes leads to unbalanced intra-cluster energy consumption, and inter-cluster communication still in hop-by-hop mode increases the routing energy consumption. Accordingly, a genetic algorithm improved multi-hop clustering routing protocol named OMPFM is proposed in [27] to search the optimal paths from the source CHs to the BS. 
OMPFM defines a new fitness function which considers the following four parameters: the average distance from the source CH to the relay CHs reaching the BS, the number of CHs through the path, the total number of CHs in the path, and the total number of CMs of the corresponding CHs in the path. The simulation results show that OMPFM is better than LEACH in network lifetime and power consumption by approximately 50%. However, invalid individuals may be generated in OMPFM because of its adopted traditional selection, crossover and mutation operations, resulting in a local optimum. Especially, forming clusters still by message broadcasting increases the network energy consumption.
Therefore, annulus-sector based clustering routing approaches have been proposed to solve the problems mentioned above. They divide the network into annular sectors, and the nodes in each annulus-sector form a cluster without the message interaction required in traditional clustering approaches, so the energy consumption during the clustering process is reduced significantly [28][29][30]. In [28], ADEC divides the network into annuluses whose width is determined by the distance to the BS, and the width of the last annulus is the largest. Once the annuluses are fixed, the nodes with more residual energy and a higher node degree are selected as CHs. Moreover, each CH forwards the data collected from its CMs to the closest CH in the inner annulus, and so on until the data reach the BS. The energy consumption can be largely reduced by clustering in each annulus independently and forwarding data annulus by annulus. However, the optimal widths of the annuluses are hard to decide, and the message communication during cluster formation increases the network energy consumption. So in [29], a clustering routing method for a circular network has been proposed, and the optimal width between adjacent rings is fully investigated. In addition, each annulus-sector is a cluster, so no message broadcasting is needed to form clusters. Similar to ADEC, the CHs send their data to the BS in multi-hop mode by using the upper CHs located in the interior annulus so as to reduce energy consumption. In [30], OCCN is proposed to divide the network into concentric rings with the same width as in [29]. The optimal number of clusters k in the network is calculated to reduce the network energy consumption, and the average cluster size can be described by N/k, where N denotes the total number of nodes. Moreover, each node in a cluster reserves a special set of timeslots for being selected as CH, so as to avoid the energy-consuming and time-consuming procedure of re-clustering.
The upper CH toward the BS is found to relay data based on the distance to the BS. However, because the residual energy, location and other parameters are neglected, rotating the CH only by the allocated timeslots in the cluster results in uneven energy consumption and premature death of nodes in the cluster. Besides, the hot spot problem is not considered. In [13], TSTCS divides the network into n rings and six sectors, and the angle of each sector is π/6. At the same time, each sector is separated into cells of the same size. Moreover, the number of cells in a sector increases by 1 with the ring ID from the inner ring to the outer ring. Each cell represents a cluster, and the node with the highest residual energy located in a circular region with diameter R at the middle of the cell is selected as the CH. The selected CH collects data from its members, aggregates the data and sends it to the BS either directly (for clusters in the ring closest to the BS) or through other CHs in inner rings. Moreover, when the residual energy of the CH reaches the preset threshold value, a substitute node is selected as CH from its own R based on the energy level to provide a local remedy for energy depletion. For data forwarding, the CH chooses a CH from its lower ring with a higher lifetime, which is defined according to its residual energy for data gathering and forwarding and the average residual energy of the R nodes. However, the deterministic sector partition (the fixed sector angle equals π/6) and number of cells (1, 2, … with increasing ring ID) cannot guarantee the minimum energy consumption of the network. Moreover, the value of R is not given clearly in that paper, which is likely to increase the energy consumption of each cluster. To solve this problem, AEBDC is proposed in [10] to improve the performance of TSTCS. At first, AEBDC divides the network into several annular sectors of various sizes, and the nodes located in the same annulus-sector form a cluster.
Then, the region for candidate CHs (RCCH) lying at the intersection of the symmetry axis and the middle line of the annulus-sector is set to effectively balance the clusters' energy consumption and extend the clusters' lifetime. The radius of the RCCH is also formulated in [10]. The nodes located in the RCCH are regarded as "candidate cluster headers (CCHs)", and the others as "common nodes (CNs)". Furthermore, the CCH with the higher weight, which considers residual energy and distance to the centre of the RCCH, is selected as CH. In particular, the CH in annulus k forwards data to the CCH with the highest residual energy in annulus k-1, and the CCH in annulus k-1 forwards the data to the CCH with the highest residual energy in annulus k-2, and so on until the data reach the BS. AEBDC can effectively balance the network energy consumption and also mitigate the hot spot problem. However, the farther a ring is from the BS and the farther the CNs are from the RCCH, the more energy is consumed and the more likely premature death becomes. What is more, clusters in rings nearer to the BS produce more links for data forwarding, which leads to more collisions and thus increases energy consumption. All the above-mentioned schemes forward data hop by hop in multi-hop mode, which increases the end-to-end delay as well as the network energy consumption, and re-clustering based on a fixed round time produces too much overhead due to the periodic formation of clusters. The main objective of this paper is to minimize the network energy consumption and balance the network loads by forming the best clusters and finding an optimal multi-hop routing path from every source CH to the BS.

Network Model
In GACRP, the network is regarded as a circular region with radius $R$, similar to [10,13], which is divided into $n$ annuluses of the same width. The nodes are scattered across the sensing field, and the BS is situated at the centre of the region. Moreover, the network is assumed to have the following attributes:
• $N$ nodes are deployed in the network and each one has a unique ID. The set of nodes in the network is represented as $S = \{S_1, S_2, \ldots, S_N\}$.
• The nodes are static, and their locations can be obtained by a positioning system or algorithm.
• All nodes except the BS are homogeneous and have the same initial energy.
The same first-order radio energy model as in [4,10,13,27] is used in this paper. The energy consumed to transmit $l$ bits of data between two adjacent nodes at distance $d$ can be described as follows:
\[ E_{tx}(l,d) = \begin{cases} l E_{elec} + l \varepsilon_{fs} d^{2}, & d < d_0 \\ l E_{elec} + l \varepsilon_{mp} d^{4}, & d \ge d_0 \end{cases} \tag{1} \]
where $E_{elec}$ is the energy consumed by transmitting or receiving 1-bit data, $\varepsilon_{fs}$ and $\varepsilon_{mp}$ are the amplifier coefficients of free space and multi-path fading, respectively, and $d_0$ is the threshold distance given by $d_0 = \sqrt{\varepsilon_{fs}/\varepsilon_{mp}}$. The energy consumed by receiving $l$ bits of data can be expressed by:
\[ E_{rx}(l) = l E_{elec} \tag{2} \]
The energy consumption of aggregating $l$ bits of data is:
\[ E_{da}(l) = l E_{pDb} \tag{3} \]
where $E_{pDb}$ is the energy consumed by fusing 1-bit data.
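The first-order radio model described above can be sketched in a few lines of Python. This is only an illustration: the numeric parameter values are typical values assumed for the example (they are not taken from Table 1), and the function names `tx_energy`, `rx_energy` and `aggregation_energy` are hypothetical.

```python
import math

# Illustrative parameter values (assumptions, not the paper's Table 1 settings).
E_ELEC = 50e-9       # J/bit, electronics energy for transmitting or receiving
EPS_FS = 10e-12      # J/bit/m^2, free-space amplifier coefficient
EPS_MP = 0.0013e-12  # J/bit/m^4, multi-path amplifier coefficient
E_PDB = 5e-9         # J/bit, energy for fusing one bit of data

# Threshold distance d0 = sqrt(eps_fs / eps_mp)
D0 = math.sqrt(EPS_FS / EPS_MP)


def tx_energy(l_bits, d):
    """Energy to transmit l_bits over distance d (free space below D0, multi-path otherwise)."""
    if d < D0:
        return l_bits * E_ELEC + l_bits * EPS_FS * d ** 2
    return l_bits * E_ELEC + l_bits * EPS_MP * d ** 4


def rx_energy(l_bits):
    """Energy to receive l_bits."""
    return l_bits * E_ELEC


def aggregation_energy(l_bits):
    """Energy to fuse l_bits of data."""
    return l_bits * E_PDB
```

With these assumed values the threshold distance is roughly 87.7 m, and the two branches of the transmission cost coincide at that distance, which is why $d_0$ is defined as the square root of the coefficient ratio.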

The Proposed Protocol
GACRP uses a genetic algorithm with improved selection, crossover and mutation operations to find the optimal routing paths based on the selected CHs, which consists of three phases: clusters formation, routing paths finding, clusters maintenance. Next, they are successively introduced in detail.

Clusters Formation
In order to minimize energy consumption, it is necessary to determine the optimal number of clusters and find the appropriate CH of each cluster.

The optimal number of clusters
In a circular network, the CHs in the last annulus do not forward data, so their energy consumption $E_{ch}$ differs from that in the other annuluses and can be expressed as: where $l$ is the length of data, $N_n$ is the number of clusters, $n$ is the number of annuluses, and $d_{ch}$ is the distance to the next-hop CH, which is depicted in Fig. 1. $A(x_n, y_n)$, $B(x_{n-1}, y_{n-1})$ and $C$ are CHs in annuluses $n$ and $n-1$, and $d_{ch} < d_0$. Moreover, $d_{ch} < r_c$ (the radius of the cluster) is required for proper data transmission. From Fig. 1, we can see that the minimum $d_{ch}$ is $R/n$ when the line $BC$ is perpendicular to the tangent $z$. So there is At the same time, the energy consumption of the members $E_{cm}$ can be expressed as: where $d_{cc}$ is the distance between the member nodes and the CH, which can be denoted by the expectation of its square: where $d_c$ is the maximum $d_{cc}$, which is depicted in Fig. 2. Then $d_c$ can be found according to the law of cosines: where the angle involved is the corresponding centre angle and $m_n$ is the optimal number of clusters. Therefore, the total energy consumption of the annulus can be expressed as Taking the derivative of Eq. (9) with respect to $m_n$, the optimal number of clusters can be attained: where Similarly, the optimal number of clusters in the other annuluses can be obtained. Without loss of generality, for annulus $i$, the total energy consumption is: The second part of Eq. (11) represents the energy consumed to receive and relay the data from the CHs in annuluses $i+1$ to $n$. Accordingly, the optimal number of clusters in annulus $i$ is:

Fig. 2 Distance between a CM and its CH
where ρ is the node density. Afterwards, each annulus is equally divided into sectors according to its optimal number of clusters. Each sector forms a cluster, and the best node in this cluster will be selected as CH.
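The geometric partition described above needs no message exchange: a node can work out its cluster from its own coordinates. The sketch below illustrates this under stated assumptions: the BS is at the origin, annuluses and sectors are indexed from 0, sector 0 starts at the positive x-axis, and the per-annulus sector counts are supplied by the caller (they stand in for the optimal cluster numbers, which are not computed here). The name `cluster_of` is hypothetical.

```python
import math


def cluster_of(x, y, radius, sectors_per_annulus):
    """Return (annulus_index, sector_index) of a node at (x, y), with the BS at the origin.

    sectors_per_annulus[i] is the number of equal sectors in annulus i
    (index 0 is the innermost annulus); all annuluses have the same width.
    """
    n = len(sectors_per_annulus)
    width = radius / n
    r = math.hypot(x, y)
    # Nodes exactly on the outer boundary are assigned to the last annulus.
    annulus = min(int(r // width), n - 1)
    # Polar angle mapped into [0, 2*pi).
    angle = math.atan2(y, x) % (2 * math.pi)
    m = sectors_per_annulus[annulus]
    # Clamp to guard against floating-point round-up at the 2*pi boundary.
    sector = min(int(angle / (2 * math.pi / m)), m - 1)
    return annulus, sector
```

For example, with a radius of 100 m and two annuluses split into 4 and 6 sectors, a node at (0, 75) lies in the outer annulus (index 1) and in the second sector (index 1) of that annulus.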

Selecting the Cluster Heads
Similar to [5,6], a node in each cluster becomes CH when its weight is the maximum which is given as follows.
where $E_{residual}^{i}$ denotes the residual energy of node $i$, and $E_{initial}$ represents the nodes' initial energy. $N_i$ denotes the set of neighbours of node $i$, and $d_{iBS}$ denotes the distance from node $i$ to the BS. It can be seen from Eq. (13) that the nodes with more residual energy, more uniform load and closer to the centre of neighbours are more likely to be selected as CHs. Once the CHs are selected, their IDs and residual energy are sent to the BS for global routing path finding, and a TDMA scheme like that in [4,10,13] is used for intra-cluster communication.

Routing Paths Finding
An improved genetic algorithm is adopted to search the optimal routing paths for the CHs, so as to avoid the drawbacks of the traditional genetic algorithm such as easily leading to premature convergence and falling into local optimum [27]. Moreover, invalid individuals may be generated in the traditional genetic algorithm due to its random operations of selection, crossover and mutation. So in GACRP, a constraint condition is provided for producing appropriate genes to avoid invalid individuals and enhance the convergence rate. The concrete realization of finding routing paths is elaborated as follows.

Constructing the Fitness Function
The fitness function is used to assess the quality of individuals, which represent the possible solutions for the routing paths. To maximize the network lifetime, we need to decrease the CHs' energy consumption as much as possible. So the CHs' residual energy is considered as a factor of the fitness function, which is expressed as: where $E_{residual}^{h_{ij}}$ denotes the residual energy of the $j$th CH of the $i$th annulus. Moreover, $E_{CHres}$ is normalized as: where $E_{CHres\_min}$ and $E_{CHres\_max}$ denote the minimum and maximum of $E_{CHres}$, respectively. In addition, the balance of loads $L_{CHs}$ of the CHs also has a great influence on energy efficiency, and it is used as the other factor of the fitness function. $L_{CHs}$ is given by: where $h_i$ means the $i$th CH, $n_{ch}$ denotes the number of CHs, $E_{h_i}$ represents the energy consumption of $h_i$, and $L_{h_i}$ denotes the load of CH $h_i$. Therefore, the fitness function in GACRP can be expressed as: From Eq. (17), we can see that the larger the fitness function value, the better the individual's quality, and the more likely the individual is to be passed on to the next generation.

Initializing the Population
In GACRP, real number encoding is used to represent the chromosomes of the population. A chromosome is an individual consisting of genes denoted by the IDs of CHs, and the ID of the BS equals $n_{ch} + 1$. A specific gene of the chromosome indicates the next-hop CH of the corresponding CH, with an example illustrated in Fig. 3.
As shown in Fig. 3, suppose there are 10 CHs selected from a network with 100 nodes, and their IDs are 5, 20, 33, 45, 49, 52, 73, 76, 81 and 97, respectively. From the chromosome, it can be seen that the routing path for CH 97 is 97 → 5 → 76 → 33 → 101 (the BS). The genes are randomly produced as in a traditional genetic algorithm. However, a constraint condition $g_i \in CH_{h_i}$ is attached to gene $g_i$ in order to avoid invalid individuals such as the one in Fig. 4. Here $CH_{h_i}$ is the set of candidate next-hop CHs for $h_i$, which are located within the communication range of $h_i$, and $i$ is the index of the gene, with $i \in [1,10]$ in Fig. 3.
In the same way, the other valid chromosomes are produced to obtain the initial population.
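The encoding and the gene constraint can be illustrated with a short Python sketch. It assumes a reduced set of four CHs and a hand-written candidate table (both hypothetical, chosen so that the path 97 → 5 → 76 → 33 → 101 can be reproduced); in the protocol the candidate sets would come from the communication ranges of the CHs. The function names are likewise illustrative.

```python
import random

BS_ID = 101
CH_IDS = [5, 33, 76, 97]  # gene i is the next hop of CH_IDS[i]
# Hypothetical candidate next hops (CHs or the BS within communication range).
CANDIDATES = {5: [76, 33], 33: [BS_ID], 76: [33, BS_ID], 97: [5, 76]}


def random_valid_chromosome(ch_ids, candidates, rng=random):
    """Draw every gene only from the candidate next hops of its CH."""
    return [rng.choice(candidates[h]) for h in ch_ids]


def is_valid(chromosome, ch_ids, candidates):
    """Check the constraint g_i in CH_{h_i} for every gene."""
    return all(g in candidates[h] for g, h in zip(chromosome, ch_ids))


def decode_path(chromosome, ch_ids, source, bs_id):
    """Follow next-hop genes from source to the BS; return None if a loop occurs."""
    next_hop = dict(zip(ch_ids, chromosome))
    path = [source]
    while path[-1] != bs_id:
        nxt = next_hop[path[-1]]
        if nxt in path:
            return None
        path.append(nxt)
    return path


example = [76, BS_ID, 33, 5]
print(decode_path(example, CH_IDS, 97, BS_ID))
```

Because every gene is drawn from the candidate set of its own CH, a population built with `random_valid_chromosome` contains only individuals that satisfy the constraint. In this toy table the candidates only point toward the BS, so no routing loops can arise; a real candidate table may need the loop check in `decode_path`.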

Producing the Next Generation Population
The fitness function value of each chromosome in the initial population is calculated, and the chromosomes are arranged in descending order of this value. The higher the fitness function value, the closer the individual is to the optimal solution. Elitist selection is used for the selection operation, which passes the optimal individuals directly on to the next generation population. For the other chromosomes, each one determines whether its fitness function value is less than that of a randomly generated valid individual. If it is, it is kept for the crossover operation. Otherwise, the randomly generated one is kept to accelerate convergence and ensure the population diversity. One-point crossover is used to produce new offspring based on the selected individuals. Because the parents are valid chromosomes and each gene keeps its position, their two children must still be valid. Then the fitness function value of each child is computed and compared with that of its parent. If its value is less than that of its parent, it is kept for the mutation operation. Otherwise, a randomly generated individual is used to determine whether its fitness function value is less than that of the parent. If it is, the randomly generated individual is kept for the mutation operation. Otherwise, the parent is selected. In this way, the convergence rate is further improved.
Bit mutation is used for the mutation operation, in which a random mutation point is selected and the corresponding gene is changed so as to produce a new individual. Of course, the new individual must be valid, that is, its mutated gene must satisfy the constraint condition. Similarly, its fitness function value is calculated to determine whether it is better than its parent, and the one with the larger value is kept for the next generation. An example of the mutation operation is shown in Fig. 5.
Combining these new individuals with the elitist ones produces the next generation population.
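The crossover and mutation operators can be sketched as follows. This is a minimal illustration rather than the protocol's implementation: the candidate table is hypothetical, the comparison rule shown is only the "keep the individual with the larger fitness value" step described for mutation, and the fitness function is left as a parameter because its exact form is not reproduced here.

```python
import random

BS_ID = 101
CH_IDS = [5, 33, 76, 97]  # gene i is the next hop of CH_IDS[i]
# Hypothetical candidate next hops for each CH.
CANDIDATES = {5: [76, 33], 33: [BS_ID], 76: [33, BS_ID], 97: [5, 76]}


def one_point_crossover(parent_a, parent_b, rng=random):
    """Swap the tails of two parents at a random cut point.

    Each gene stays at its own position, so if both parents satisfy the
    per-gene constraint, both children satisfy it as well.
    """
    point = rng.randrange(1, len(parent_a))
    child_a = parent_a[:point] + parent_b[point:]
    child_b = parent_b[:point] + parent_a[point:]
    return child_a, child_b


def constrained_mutation(chromosome, ch_ids, candidates, rng=random):
    """Replace one randomly chosen gene with a candidate next hop of its CH."""
    i = rng.randrange(len(chromosome))
    child = list(chromosome)
    child[i] = rng.choice(candidates[ch_ids[i]])
    return child


def keep_better(parent, child, fitness):
    """Keep the individual with the larger fitness value."""
    return child if fitness(child) > fitness(parent) else parent
```

Drawing the mutated gene from the candidate set, instead of from all CH IDs, is what keeps the offspring valid without a separate repair step.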

Finding the Optimal Routing Paths
Once one of the following termination conditions is satisfied, GACRP finds the optimal routing paths. One is the preset iteration number, and the other is the deviation degree of the fitness function values, which is expressed as: where $Fitness_i$ denotes the fitness function value of individual $i$, $Fitness_{max}$ denotes the maximum fitness function value, and $\varepsilon$ is a small positive number which equals $10^{-5}$ in this paper, as in [31,32]. The individual whose fitness function value is the largest is selected from the population, which gives the optimal routing path for each CH. The flow diagram of finding the optimal paths is depicted in Fig. 6.

Clusters Maintenance
Generally, re-clustering is used for CH rotation based on a fixed round time [4,10,13] to reduce energy consumption. However, frequent clustering may cause more energy consumption for fixed round based approaches. Therefore, in GACRP, a new adaptive round time is presented to save energy by avoiding frequent CH rotation. Moreover, the network's throughput is also improved. For the new round time, the load balancing and the energy balancing of the network are considered, represented as α and β:
where
• $T_{round}$ denotes the traditional round time.
• α is the factor representing the load balancing, which is given by:
Fig. 6 The flow diagram of finding the optimal routing paths
$n'_{ch}$ indicates the number of alive CHs. No extra overhead is produced, because the residual energy and the state of the nodes are usually attached to the data packets forwarded to the BS. At the start of each round, the improved genetic algorithm is triggered by the BS to obtain the CHs' optimal routing paths, and then the BS broadcasts them to the network along with the calculated round time. Each node communicates with the others according to the received message.

Simulation and Results
In order to verify the performance of GACRP, simulations in different scenarios are conducted in MATLAB in this section, and comparisons with the related algorithms LEACH [4], AEBDC [10], TSTCS [13] and OMPFM [27] are also presented. In the simulated network, N nodes are scattered randomly in circular networks with radii of 100 m and 200 m, and the BS is situated at the centre. The specific parameter settings are given in Table 1. The total energy consumption of all nodes is tested first to show the energy efficiency of GACRP, and the results are illustrated in Fig. 7.
As shown in Fig. 7, LEACH is the first to exhaust the energy of all nodes because of its single-hop communication and its random CH selection, which neglects residual energy. Thus its performance is the worst. TSTCS and AEBDC divide the network into rings, and the CHs in an outer ring forward data to the CHs in the adjacent inner ring, and so on until the data reach the BS, so their energy consumption is lower than that of LEACH. Unlike TSTCS and AEBDC, OMPFM finds the optimal routing paths for CHs by using a genetic algorithm, so its energy consumption is, on the whole, less than that of TSTCS and AEBDC. However, OMPFM selects CHs using different threshold functions in three stages, which directly affects the distribution of clusters and the routing path finding, so that it sometimes consumes energy faster than TSTCS and AEBDC. Unlike random clustering in LEACH, TSTCS, AEBDC and OMPFM, GACRP forms clusters according to the calculated optimal cluster number. Moreover, an improved genetic algorithm considering energy and load balance is adopted to search for the optimal routing paths, so its energy consumption is the lowest in scenarios 1 and 2. As a result, the total energy consumption of GACRP is 24.53%, 21.84%, 1.68% and 2.69% lower than those of LEACH, TSTCS, AEBDC and OMPFM in scenario 1, and 43.96%, 29.31%, 12.72% and 5.94% lower in scenario 2.
And then, the standard deviation of CHs' residual energy in two scenarios is tested to verify the performance of energy balance for GACRP, and the results are shown in Fig. 8.
Seen from Fig. 8, GACRP has the best performance of energy balance, while LEACH has the worst. LEACH randomly selects CHs and transfers data to the BS directly, resulting in different energy consumption between CHs close to the BS and those far from the BS. For TSTCS, the CHs in the outer ring transmit data to the adjacent CHs in the inner ring, so the energy consumption of CHs in the inner ring is higher than that in the outer ring because of data forwarding. OMPFM and GACRP use a genetic algorithm to find the optimal routing paths for CHs, especially CHs in the outer ring transmit data to CMs in the inner ring so as to balance the energy consumption in AEBDC. Accordingly, the residual energy deviation of CHs in TSTCS is higher than that of OMPFM, AEBDC and GACRP. Moreover. In GACRP, the energy consumption and load balance of CHs are fully considered in fitness function construction., which reduces the standard deviation of CHs' (a) (b) Fig. 8 Comparison of the standard deviation of CHs' residual energy residual energy. Compared with LEACH, TSTCS, OMPFM and AEBDC, the standard deviation of residual energy of GACRP is decreased by 74.35%, 72.05%, 57.08%, 28.8% in scenario 1, and 76.99%, 72.87%, 66.06%, 23.72% in scenario 2, respectively. Next, the network lifetime is tested to verify the survivability of GACRP, and the results are shown in Fig. 9, Tables 2 and 3.
LEACH clearly has the worst performance: its FND (first node dies) occurs at round 603 and its LND (last node dies) at round 1445 in scenario 1, and at rounds 400 and 1022, respectively, in scenario 2. The LND of OMPFM (rounds 1950 and 1482) and of TSTCS (rounds 1589 and 1205) is later than that of LEACH in both scenarios. However, their FND (rounds 241 and 89 for OMPFM, rounds 121 and 201 for TSTCS) is earlier than that of LEACH, because the ring-by-ring communication in TSTCS and the clustering based on a piecewise threshold function in OMPFM result in uneven energy consumption. Moreover, AEBDC uses a vice CH to share the forwarding task of the primary CH, which balances the energy consumption; its FND occurs at rounds 1079 and 739 and its LND at rounds 1734 and 1436 in scenarios 1 and 2, respectively.
Fig. 9 (a), (b) Comparison of the number of living nodes
The network throughput is expressed as the number of data packets effectively received by the BS. It is an important indicator of the network quality of service (QoS) and a direct reflection of the CHs' load balance, so it is tested to verify the QoS of GACRP, and the results are depicted in Fig. 10.
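The lifetime metrics FND and LND, and the living-node curves of Fig. 9, can be derived from the round in which each node dies. The following is a minimal sketch under that assumption; the function names and the sample death rounds are hypothetical and do not come from the simulations.

```python
def lifetime_metrics(death_rounds):
    """Return (FND, LND): the rounds in which the first and last node die."""
    if not death_rounds:
        raise ValueError("at least one node is required")
    return min(death_rounds), max(death_rounds)


def alive_per_round(death_rounds, total_rounds):
    """Number of living nodes at the end of each round 1..total_rounds.

    A node that dies in round d is counted as dead from the end of round d.
    """
    return [
        sum(1 for d in death_rounds if d > r)
        for r in range(1, total_rounds + 1)
    ]


# Hypothetical death rounds for a four-node network.
deaths = [3, 5, 5, 8]
fnd, lnd = lifetime_metrics(deaths)
curve = alive_per_round(deaths, 8)
print(fnd, lnd, curve)
```

A late FND together with a small gap between FND and LND indicates that energy is drained evenly across the nodes.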
As seen from Fig. 10a and b, the network throughput of LEACH is the lowest because of its shortest lifetime. In TSTCS, the CHs close to the BS consume too much energy on data forwarding, resulting in their premature death; consequently, a large amount of data in the outer rings cannot be transmitted to the BS. Compared with TSTCS, AEBDC shares the data forwarding task by using vice CHs in clusters, so it performs better than TSTCS. In GACRP, clusters are formed with the optimal cluster number and the optimal routing paths are selected, so more data are transmitted to the BS. In scenario 1, the network throughput of GACRP is 57.6%, 45.23%, 11.68% and 7.09% higher than that of LEACH, TSTCS, OMPFM and AEBDC, respectively. In scenario 2, it is 60.49%, 47.72%, 14.69% and 9.62% higher, respectively.
The average end-to-end delay, defined as the average time taken by CHs to send data to the BS, is an important metric for evaluating real-time performance and fast data forwarding capability. It is therefore tested under different packet sending rates, and the results are illustrated in Fig. 11.
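As an illustration of this metric, the sketch below averages the delay over the packets that actually reach the BS. It assumes each packet is logged as a (send time, receive time) pair, with a receive time of `None` for lost packets; this log format and the function name are hypothetical and are not taken from the simulation setup.

```python
def average_end_to_end_delay(packets):
    """Mean of (receive_time - send_time) over packets delivered to the BS.

    `packets` is a list of (send_time, receive_time) pairs in seconds;
    receive_time is None for packets that never reached the BS.
    """
    delays = [rx - tx for tx, rx in packets if rx is not None]
    if not delays:
        raise ValueError("no delivered packets")
    return sum(delays) / len(delays)


# Hypothetical log: two delivered packets and one lost packet.
log = [(0.0, 0.5), (1.0, 1.25), (2.0, None)]
print(average_end_to_end_delay(log))
```

Lost packets are excluded from the average here; counting them differently would change the metric, so this choice should be read as an assumption.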
It can be seen from Fig. 11 that the average end-to-end delay of LEACH is the lowest because its CHs communicate directly with the BS. In AEBDC, the CHs need to find the optimal vice CHs before sending data, which leads to the highest average end-to-end delay. Unlike the ring-by-ring data forwarding in TSTCS, the CHs in OMPFM and GACRP transfer data along the optimal routing paths determined by a genetic algorithm, so their average end-to-end delay is lower than that of TSTCS. Differing from the random clustering in TSTCS, uniform clusters are formed in GACRP by dividing the network into equal annulus-sectors based on the optimal cluster number, making the average end-to-end delay of GACRP lower than that of OMPFM. As a result, the average end-to-end delay of GACRP is lower than that of OMPFM by 12.74% and 12.96%, that of TSTCS by 16.79% and 26.54%, and that of AEBDC by 37.37% and 43.6% in scenarios 1 and 2, respectively.
Fig. 10 (a), (b) Comparison of network throughput

Conclusion
In this paper, an annulus-sector clustering routing protocol GACRP using an improved genetic algorithm is proposed to not only minimize network energy consumption but also balance network load. In GACRP, the optimal cluster number of each annulus is used for annulus-sector division, which minimizes the energy consumption of each annulus. And the best nodes in the annulus sectors are selected for data aggregation, transmission and forwarding, which balances the intra-cluster loads and energy consumption. Moreover, an improved genetic algorithm with a novel fitness function considering energy and load balance is utilized to find the optimal routing paths for CHs, which balances the inter-cluster loads and energy consumption. In addition, an adaptive round time is adopted to maintain the clusters, which further reduces the network energy consumption. Simulations are conducted to validate the performance of GACRP compared with several existing relevant protocols such as LEACH, AEBDC, TSTCS and OMPFM. The results indicate that GACRP can effectively alleviate the hot spot problem and outperform these protocols in terms of energy efficiency, average end-to-end delay, network throughput and network lifetime.

Conflict of interest
The authors declare that they have no conflict of interest.
Fig. 11 (a), (b) Comparison of average end-to-end delay
Wu Sha-sha is currently pursuing her master's degree at the College of Computer Science and Engineering, Changchun University of Technology, China. Her research interests include clustering routing algorithms of wireless sensor networks.