An Integrated Routing Mechanism on Homogenous Wireless Sensor Networks using Genetic Algorithm

Data collection on Wireless Sensor Networks (WSNs) is a significant challenge in satisfying the requirements of various applications. Providing an energy-efficient routing technique is the primary step in data collection over WSNs. Existing data collection techniques for WSNs struggle with imbalanced load distribution and short network lifetime. This paper proposes a novel mechanism to select cluster-heads, cluster the wireless sensor nodes, and determine the optimal route from source nodes to the sink. We employ the genetic algorithm to solve the routing problem, using the hop-count of the cluster-heads to the sink, the number of members of each cluster, the residual energy of the cluster-heads, and the number of cluster-heads connected to the sink as the fitness criteria. Our proposed mechanism uses a greedy approach to calculate the hop-count of each cluster-head to the sink, which integrates the clustering and routing processes on WSNs. The simulation results demonstrate that our proposed mechanism improves the energy consumption, the number of live nodes, and the lifetime of the network compared to other data collection approaches on WSNs.


Introduction
Over the last decades, wireless communications have expanded significantly to offer services in modern environments [1][2][3]. Wireless Sensor Networks (WSNs) are among the most actively studied communication technologies and affect various domains of human life [4,5]. The term WSN refers to the networked interconnection between a large number of wireless sensor nodes and one or several sinks [6,7]. The main objective of these networks is to collect data from the monitoring area and then process and send them to the sink for further analysis [8]. Through recent advancements in algorithms and technologies, wireless sensor nodes are equipped with sensing, processing, computing, and communicating capabilities [9]. Besides, due to features such as cognitive performance, low cost, and high fault tolerance, WSNs are envisioned for many applications, including traffic control systems [10][11][12], biomedical health monitoring [13,14], automated assistance for the elderly [15,16], industrial control systems [17], virtual reality [18][19][20], and environmental monitoring systems [21,22].
Data collection is the primary challenge to address the applications' requirements, as mentioned above [23,24]. It is defined as a creative process that gathers data from different wireless sensor nodes, and then compresses/aggregates them using a specific function to reduce redundant outgoing packet transmissions [25]. Besides, wireless sensor nodes provide their required energy from low-power batteries without replacing or recharging possibilities. WSNs are usually employed in harsh and disaster environments for a long time [26,27]. In other words, limitation on the battery is another main challenge on wireless networks that influences their lifetime [28,29]. Therefore, it is essential to provide an energy-efficient data collection model on WSNs. Since the amount of energy consumption for wireless communications is much more than processing costs, the most critical challenge for WSNs is to provide an energy-efficient mechanism for routing collected data from source nodes to the sink [30,31].
So far, data routing on WSNs has relied on classical, fuzzy-based, and meta-heuristic approaches [32]. Classical approaches focus on cluster-head selection. However, they do not combine essential parameters to select appropriate cluster-heads. Also, since classical methods ignore the clusters' size and their distribution over the monitoring area, the load balance over the network is disrupted, and its lifetime does not improve significantly [33]. Fuzzy-based approaches aim to select cluster-heads and determine the competitive radius. Since most fuzzy-based methods are distributed, their computational complexity is unacceptable for energy-constrained wireless sensor nodes. Another limitation of fuzzy-based methods is the lack of attention to valuable parameters for cluster-head selection and the lack of a proper strategy for data routing [34]. Finally, some approaches have exploited meta-heuristic methods to solve cluster-head selection, clustering, and data routing on WSNs. The meta-heuristic techniques do not consider the scalability, complexity, and data transmission delay in each round. These algorithms are more complex than the previous ones but offer better solutions for data transmission from source nodes to the sink. Above all, meta-heuristic methods allow several parameters that influence the network's performance to be optimized in parallel. Moreover, since the routing problem on WSNs is NP-hard, meta-heuristic methods are well suited to addressing its challenges [35].
In this paper, an efficient data routing mechanism on WSNs is proposed, called Integrated Routing based on Genetic Algorithm (IRGA). The main objective of IRGA is to select cluster-heads, cluster the wireless sensor nodes, and determine the optimal route from source nodes to the sink. We employ the genetic algorithm to solve the routing problem. Each chromosome of the algorithm is represented as a one-dimensional array of zeros (wireless sensor nodes) and ones (cluster-heads). Then, the genetic algorithm tries to find near-optimal clusters and routes, considering the hop-count of the cluster-heads to the sink, the number of members of each cluster, the residual energy of the cluster-heads, and the number of cluster-heads connected to the sink as the fitness criteria. In more detail, the problem is modeled so that the hop-count of the cluster-heads to the sink and the number of members of each cluster are minimized, while the cluster-heads' residual energy and the number of cluster-heads connected to the sink remain as high as possible.
It should be noted that IRGA uses a greedy approach to calculate the hop-count of each cluster-head to the sink. Each cluster-head collects data from its cluster and then sends them to another cluster-head that is closer to the sink. This process continues until the data of all wireless sensor nodes are delivered to the sink. In this way, our proposed mechanism integrates the clustering and routing processes on WSNs. The simulation results illustrate that our proposed mechanism improves the energy consumption, the number of live nodes, and the lifetime of the network compared to recent routing approaches on WSNs.
The fundamental contributions of our mechanism are summarized as follows:
• Providing an energy-efficient integrated model for clustering and routing on WSNs exploiting the meta-heuristic algorithm.
• Considering the hop-count of the cluster-heads to the sink, the number of each cluster member, residual energy of cluster-heads, and the number of cluster-heads connected to the sink as the fitness criteria for integrated clustering and routing on WSNs.
• Enhancing the lifetime of WSNs by reducing the energy consumption of wireless sensor nodes.
The rest of the paper is organized as follows: Section 2 provides a literature review. WSN model and energy consumption pattern are explained in Section 3. Section 4 discusses the proposed mechanism. The simulation results are described in Section 5. Finally, the paper is concluded in Section 6.

Related Work
Data collection is the primary issue in addressing the requirements of applications on WSNs, including energy consumption, security, and reliability [8,36]. Tackling all of these challenges at once is difficult because the requirements conflict with one another. Still, a broad range of approaches has been proposed to overcome the energy consumption challenge on WSNs. The main objective of energy-aware methods is to provide efficient routes from the source nodes to the sink to reduce the energy consumption of sensors and enhance the lifetime of the network [6]. So far, data routing on WSNs has relied on classical, fuzzy-based, and meta-heuristic approaches [32].
The classical data routing methods focus on cluster-head selection. In this category, Heinzelman et al. [37] introduced the Low Energy Adaptive Clustering Hierarchy (LEACH) as the first dynamic clustering protocol on WSNs, which is still the basis of other advanced clustering mechanisms in wireless communication systems. LEACH consists of two phases: the setup and the steady-state. The setup phase starts when the wireless sensor nodes organize themselves into clusters. Each wireless sensor node selects its cluster so that it can communicate with the cluster-head efficiently. Then, the steady-state phase begins. All members of a cluster transmit their data to the cluster-head in a certain period. After receiving all data, the cluster-head delivers them to the sink. Finally, the current period ends, and a new one starts.
Cengiz and Dag [38] presented the Improving Low Energy Fixed Clustering Algorithm (ILEFCA) to enhance the lifetime of WSNs. The algorithm divides the wireless sensor nodes into clusters, and then a cluster-head is assigned to each of them. The clusters are constant over the network's lifetime, and the cluster-heads change only when their energy is lower than a pre-determined threshold. To determine the alternative cluster-head, the current one randomly selects one of its members as the cluster-head for the next round. To improve the efficiency of ILEFCA, the authors proposed the Energy-Aware Multi-hop Routing protocol (EAMR) [39]. The EAMR protocol consists of two main steps: the setup and the steady-state. In the setup step, each wireless sensor node produces a random number. If the number is smaller than a pre-determined threshold, the node becomes a cluster-head. At this point, each cluster-head assigns its closest sensor node as an alternative cluster-head and forms its cluster. As in the previous approach, clusters are fixed throughout the lifetime of the network, and data are transferred from the wireless sensor nodes to the sink in a hierarchical manner. Finally, Salem and Shudifat [40] investigated the Enhanced Low Energy Adaptive Clustering Hierarchy (ELEACH) mechanism to deliver the sensed data from source nodes to the sink. The process of ELEACH is similar to that of LEACH, except that the distance between the wireless sensor nodes and the sink is also considered as a parameter for cluster-head selection.
Some authors have suggested fuzzy-based approaches for clustering, cluster-head selection, and data routing on WSNs to deal with random events and overlapping parameters. In this category, Sundaran et al. [41] presented the Energy Conserved Unequal Clusters with Fuzzy logic (ECUCF) method for cluster-head selection on WSNs. In this method, each wireless sensor node selects a random number between zero and one. If it is smaller than a pre-determined threshold, the node becomes a primary cluster-head. Then, the network is divided into three sections using fuzzy logic, based on the distance between the wireless sensor nodes and the sink and their residual energy. At this point, each primary cluster-head calculates its competing radius based on its distance from the sink, its residual energy, and information about other nodes, and then broadcasts a message within its competing range. If the receiver nodes have less residual energy, they drop out of the competition. Finally, the wireless sensor nodes select and join the cluster-heads, considering the fuzzy inputs. Hamzah et al. [34] proposed the Fuzzy-Logic-based Clustering for Hierarchical Routing protocol (FLCHR) to decrease the energy consumption of WSNs. The protocol uses fuzzy logic for cluster-head selection, considering the number of active nodes as the distribution parameter. The FLCHR model also exploits the wireless sensor nodes' residual energy, the distance of the nodes from the sink, and the density of the neighbors to prioritize the selected cluster-heads. Then, the authors used the Gini index [42] to evaluate the energy balance between wireless sensor nodes.
Since clustering and routing in WSNs are NP-hard problems, metaheuristic methods are good candidates for solving them [43]. In this category, Suganthi et al. [44] suggested a backbone tree-based technique to improve the fault tolerance and energy consumption of WSNs. It creates a spanning tree between the wireless sensor nodes so that each node has some primary and backup parents. If the primary parent of a node fails or exhausts its energy, the backup one is selected based on residual energy, distance, and the angle between the node and the parent. Lakshmanan et al. [45] introduced another cluster-based routing scheme for heterogeneous WSNs to monitor forests. In environmental monitoring applications, wireless sensor nodes have to be active for an extended period to collect data with high reliability. To deal with this challenge, the scheme combines the Particle Swarm Optimization (PSO) algorithm and K-Means clustering to determine optimal cluster-heads and routes from source nodes to the sink. Finally, Shivaraman and Mohan [46] proposed a hybrid energy-efficient mechanism for data routing on WSNs to reduce the energy consumption of wireless sensor nodes. It employs Ant Colony Optimization (ACO) and PSO algorithms to extend the lifetime of the network. First, the WSN is clustered based on the residual energy of the nodes. Then, a metaheuristic method that combines ACO and PSO is used for efficient data collection.
The classical routing methods move towards multi-level hierarchical structures to avoid direct communication between the wireless sensor nodes and the sink. They are simple and have a small computation overhead. However, classical approaches do not combine critical parameters for cluster-head selection. So, they are not customizable to address the requirements of different applications on WSNs [47]. Also, these approaches do not properly balance the load over the network due to the lack of attention to the clusters' size and their distribution in the monitoring area. Accordingly, classical methods cannot enhance the lifetime of WSNs significantly [48]. The principal objective of the fuzzy-based techniques is to minimize the injected traffic into the networks. They are sophisticated and ignore useful parameters during the cluster-head selection and data routing [49]. The metaheuristic methods aim to minimize the energy consumption of wireless sensor nodes. Although these methods do not control the clusters' size and are centralized (because of high computational complexity), they offer suitable solutions for data routing from source nodes to the sink on WSNs [50]. Besides, metaheuristic methods optimally use multiple fitness parameters that influence network performance in a parallel manner. Thus, it is more efficient to exploit the capabilities of multi-objective metaheuristic methods for the data collection on WSNs.

Network Model
This section explains our network model, including the WSN and its parameters and an energy consumption pattern. The summary of the considered parameters is stated in Table. 1.

3-1. WSN Model
In this paper, the network is modeled as WSN=(N,L), where N and L are the sets of wireless sensor nodes and communication links, respectively. The monitoring area has dimensions H×W, where H and W are its length and width, respectively. The wireless sensor nodes are scattered randomly with uniform distribution [51] all over the monitoring area, and a sink is deployed in its center. The wireless sensor nodes are assumed to be homogeneous, i.e., they have the same initial energy, transmission range, and data priority. Furthermore, the following parameters are considered for more accurate modeling:
• Euclidean distance: The Euclidean distance between each pair of wireless sensor nodes is one of the significant parameters for optimizing the routing on WSNs. In more detail, a small Euclidean distance between each wireless sensor node and its cluster-head leads to low energy consumption. A short Euclidean distance between the cluster-heads reduces the energy required for data transmission from the cluster-heads to the sink.
• Hop-count: Since the key objective of routing on WSNs is to transmit data from source nodes to the sink efficiently, the hop-count of each wireless sensor node to the sink is one of the most critical parameters.
• Residual energy: The residual energy of each wireless sensor node is tracked at any moment. The wireless sensor nodes have a specific initial energy that is consumed by sending and receiving packets during the working period of the WSN. Each wireless sensor node continues to operate until its energy is exhausted. Thus, the wireless sensor nodes' residual energy is one of the main parameters for clustering and routing on WSNs. It should be noted that the sink has unlimited energy at any time. Since the wireless sensor nodes are assumed to be homogeneous, they have the same initial energy at the beginning of the network operation.
• Cluster size: The number of members of the ith cluster. One of the main goals of routing on WSNs is to balance the number of cluster members. Since all wireless sensor nodes send their data to the cluster-heads, the network load balance is disturbed if a specific cluster has too many members. In that case, some cluster-heads consume their energy earlier than others, and the network loses its performance in vital applications.
• One-hop cluster-heads: The number of cluster-heads that are deployed within one hop of the sink (i.e., the cluster-heads that have the sink in their transmission range). This parameter is one of the major indicators for extending the lifetime of WSNs. Since the purpose of all cluster-heads is to send the collected data to the sink with minimum hop-count, if the number of one-hop cluster-heads is high, the injected traffic is divided between them, and so the network lifetime is enhanced.
Finally, it is assumed that our proposed method groups the wireless sensor nodes into k clusters. Also, all wireless sensor nodes share the same transmission range.
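As an illustration of the network model, the following Python sketch deploys nodes uniformly at random in an H×W area with the sink at its center and lists the nodes that have the sink within their transmission range. The function names (`deploy_nodes`, `euclidean`, `within_one_hop_of_sink`) and the seed are hypothetical and chosen only for this example; the numeric values mirror the simulation setup reported later in the paper.

```python
import math
import random

def deploy_nodes(n, height, width, seed=None):
    """Scatter n nodes uniformly at random over a width x height area."""
    rng = random.Random(seed)
    return [(rng.uniform(0, width), rng.uniform(0, height)) for _ in range(n)]

def euclidean(a, b):
    """Euclidean distance between two (x, y) points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

def within_one_hop_of_sink(nodes, sink, tx_range):
    """Indexes of nodes whose distance to the sink is below the transmission range."""
    return [i for i, p in enumerate(nodes) if euclidean(p, sink) < tx_range]

# Example values (area, node count, range) follow the paper's simulation setup.
H, W = 250.0, 250.0
sink = (W / 2, H / 2)          # the sink sits at the center of the area
nodes = deploy_nodes(200, H, W, seed=1)
near_sink = within_one_hop_of_sink(nodes, sink, tx_range=40.0)
```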

3-2. Energy Model
In this paper, we use the energy consumption model based on the Euclidean distance between the wireless sensor nodes [37,43]. The model states that a wireless sensor node can transmit data to another node if their Euclidean distance is less than its transmission range. It models the energy consumption with either a free space or a multipath fading channel. If the Euclidean distance d between the wireless sensor nodes is less than a threshold D0 = \sqrt{εfs/εmp}, it employs the free space model. Otherwise, the multipath fading channel model is used. Accordingly, the energy required for sending l-bit data over a distance d is calculated as (1) [37]:

\[ E_{TX}(l,d) = \begin{cases} l\,E_{elec} + l\,\varepsilon_{fs}\,d^{2}, & d < D_0 \\ l\,E_{elec} + l\,\varepsilon_{mp}\,d^{4}, & d \ge D_0 \end{cases} \qquad (1) \]

where Eelec denotes the energy consumed by the electronic circuit per bit, and εfs and εmp denote the energy consumed by the amplifier in the free space and multipath fading channel models, respectively. The energy consumption for receiving l-bit data is defined as (2) [37]:

\[ E_{RX}(l) = l\,E_{elec} \qquad (2) \]
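The radio energy model of [37] described in this subsection can be sketched in Python as follows. The amplifier constants are the ones listed in the simulation setup of this paper; the value of `E_ELEC` is not stated in the text, so the 50 nJ/bit used here is only an assumption (a commonly used value), and the function names are illustrative.

```python
import math

# Amplifier constants from the paper's simulation setup.
EPS_FS = 10e-12       # J/bit/m^2, free space model
EPS_MP = 0.0013e-12   # J/bit/m^4, multipath fading model
# ASSUMPTION: E_elec is not given in the text; 50 nJ/bit is a commonly used value.
E_ELEC = 50e-9        # J/bit

# Threshold distance that separates the two channel models.
D0 = math.sqrt(EPS_FS / EPS_MP)

def tx_energy(bits, distance):
    """Energy (J) to transmit `bits` over `distance` meters."""
    if distance < D0:
        return bits * E_ELEC + bits * EPS_FS * distance ** 2
    return bits * E_ELEC + bits * EPS_MP * distance ** 4

def rx_energy(bits):
    """Energy (J) to receive `bits`."""
    return bits * E_ELEC
```

With these constants the threshold D0 is roughly 88 m, so every link within the 40 m transmission range used in the simulations falls under the free space branch.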

Our Proposed Mechanism
In this section, an efficient data collection mechanism on WSNs is proposed, called Integrated Routing based on Genetic Algorithm (IRGA). Figure. 1 illustrates the block diagram of the data collection process under our proposed mechanism. As shown in this figure, IRGA employs the Genetic algorithm to solve the clustering and routing problems on WSNs. First, the desired initial values are determined. One of the most critical initial values is the number of chromosomes, or population size, which is denoted by IPop. Besides, the maximum number of iterations and the size of the Mating Pool matrix are represented by Itmax and MPsize, respectively. After determining the initial values, the phases of the Genetic algorithm are applied as follows:
1. Initial Phase: In the first phase of IRGA, the initial population of solutions is randomly generated. Each chromosome of the initial population is represented by a one-dimensional array of zeros (wireless sensor nodes) and ones (cluster-heads). Then, the Genetic algorithm calculates the cost of each solution using the hop-count of the cluster-heads to the sink, the number of members of each cluster, the residual energy of the cluster-heads, and the number of cluster-heads connected to the sink. The problem is modeled so that the hop-count of the cluster-heads to the sink and the number of members of each cluster are minimized, while the residual energy of the cluster-heads and the number of cluster-heads connected to the sink remain as high as possible. To calculate the hop-count of each cluster-head to the sink, IRGA uses a greedy approach, so that each cluster-head collects data from its cluster and then sends them to another cluster-head that is closer to the sink. This process continues until the data of all wireless sensor nodes are delivered to the sink. Accordingly, our proposed mechanism integrates the clustering and routing processes on WSNs.
2. Selection Phase: In the second phase of IRGA, the probability of selecting each chromosome for transfer to the Mating Pool matrix is calculated. The chromosomes with a higher chance (lower cost) are then transferred to the Mating Pool matrix for offspring generation using the crossover and mutation operators.
3. Replacement Phase: In the final phase of IRGA, a new population is generated, such that the chromosomes in the Offspring matrix are transferred to it, and the remaining positions are filled with the best solutions of the initial population.
In the following, the main phases of IRGA are discussed in detail.

4-1. Initial Phase
In the first step of the initial phase, the initial population of solutions is randomly generated. Each solution is called a chromosome, which consists of smaller units (genes). The proposed mechanism creates IPop solutions, each of which is a one-dimensional array of zeros and ones. The genes that denote ordinary wireless sensor nodes take the value "0", and the genes that denote cluster-heads are set to "1". The chromosome length corresponds to the total number of wireless sensor nodes in the monitoring area, i.e., each gene represents a wireless sensor node (it is assumed that each wireless sensor node has a unique identifier). An example of the initial population used in the IRGA mechanism is shown in Figure. 2.
Figure. 2. An example of the initial population used in the IRGA mechanism
At this point, the main challenge is to determine the number of ones and zeros in each solution. IRGA considers that each cluster has one cluster-head. Therefore, the number of ones in each solution should be equal to the number of clusters. The number of clusters in the desired WSN is calculated by (1) [52].
After calculating the number of clusters k, k randomly chosen genes in each solution take the value "1", and the others are set to "0". Then, the fitness function is used to calculate the cost of each solution. The IRGA mechanism employs the weighted sum of four factors as the fitness function: the hop-count of the cluster-heads to the sink, the number of members of each cluster, the residual energy of the cluster-heads, and the number of cluster-heads connected to the sink. The problem is modeled so that the hop-count of the cluster-heads to the sink and the number of members of each cluster are minimized, while the residual energy of the cluster-heads and the number of cluster-heads connected to the sink remain as high as possible. Accordingly, the fitness function of our proposed mechanism for calculating the cost of chromosome c is defined as (2).
The first factor favors selecting as cluster-heads the wireless sensor nodes that have the lowest hop-count to the sink. Since the energy consumption of WSNs largely depends on the hop-count from the source nodes to the sink, considering this factor can reduce the traffic injected into the network and the energy required for data transmission, and extend the WSN lifetime. This factor is bounded by the best and the worst total hop-count from the source nodes to the sink. In the best case, all the cluster-heads are located at the same geographical coordinates as the sink, and the total hop-count to the sink is zero. In the worst case, each cluster-head has to traverse all the others to deliver its data to the sink, and the hop-count to the sink for each data stream equals the number of cluster-heads. It should be noted that our proposed model uses a greedy approach to calculate the hop-count from the cluster-heads to the sink, such that each cluster-head sends its collected data to another one that is closer to the sink (the cluster-head with fewer children). This process continues until the data of all wireless sensor nodes are delivered to the sink. In other words, IRGA performs data routing while clustering the wireless sensor nodes.
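The greedy hop-count calculation can be illustrated with a minimal Python sketch. It is not the authors' implementation: the next-hop rule used here (forward to the in-range cluster-head that is nearest to the sink) is an assumption, since the text only requires the next hop to be closer to the sink, and the function names are hypothetical. A cluster-head that has the sink in range is given a hop-count of 1, and a cluster-head with no closer neighbor in range is marked as unreachable (`None`).

```python
import math

def dist(a, b):
    """Euclidean distance between two (x, y) points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

def greedy_hop_counts(heads, sink, tx_range):
    """Map each cluster-head index to its greedy hop-count to the sink (or None)."""
    hops = {}
    # Visit heads from nearest to farthest, so any closer next hop is already resolved.
    order = sorted(range(len(heads)), key=lambda i: dist(heads[i], sink))
    for i in order:
        d_i = dist(heads[i], sink)
        if d_i < tx_range:
            hops[i] = 1  # the sink is directly reachable
            continue
        # Candidate next hops: heads in range that are strictly closer to the sink.
        candidates = [
            j for j in range(len(heads))
            if j != i
            and dist(heads[i], heads[j]) < tx_range
            and dist(heads[j], sink) < d_i
        ]
        if not candidates:
            hops[i] = None  # the greedy walk is stuck at this head
            continue
        # ASSUMPTION: among the candidates, pick the one nearest to the sink.
        nxt = min(candidates, key=lambda j: dist(heads[j], sink))
        hops[i] = None if hops[nxt] is None else hops[nxt] + 1
    return hops
```

Because every next hop is strictly closer to the sink, the walk cannot loop, which is what makes the single sorted pass sufficient.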
The second factor attempts to distribute the wireless sensor nodes evenly between the cluster-heads. In the best scenario, |N| wireless sensor nodes are divided into k clusters, and each cluster has |N|/k members. To achieve a reasonable distribution, the difference between the number of members of each cluster and |N|/k must be minimized. If the wireless sensor nodes are distributed evenly between the clusters, the traffic load is spread over the cluster-heads, and the lifetime of the WSN is enhanced significantly. In the best case, each cluster has |N|/k members, and the difference between the number of members of each cluster and |N|/k is zero. In the worst case, all wireless sensor nodes are assigned to a single cluster, and the total difference between the number of members of each cluster and |N|/k is (|N| − |N|/k).
The third factor's principal objective is to select as cluster-heads the wireless sensor nodes that have more residual energy. At the start of the WSN operation, all nodes have a certain amount of energy (initial energy), which is consumed during data transmission. Since the cluster-heads carry a heavier workload, if these nodes have more energy, they expire later, and the lifetime of the network increases significantly. In the worst case, all cluster-heads have consumed their energy, and their total remaining energy is zero. In the best case, all cluster-heads still have their initial energy, and the total remaining energy equals the sum of their initial energy.
The fourth factor aims to maximize the number of cluster-heads that are located near the sink. Since the cluster-heads near the sink receive much more traffic than the others, increasing the number of cluster-heads located within one hop of the sink distributes the network traffic more evenly and enhances its lifetime. In the worst case, no cluster-head is situated near the sink, and the number of cluster-heads within one hop of the sink is zero. In the best case, all cluster-heads are deployed near the sink, and the number of cluster-heads within one hop of the sink equals the total number of cluster-heads.
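To make the weighted-sum idea concrete, the following Python sketch combines the four factors into a single cost in [0, 1]. It is a sketch under stated assumptions rather than the paper's fitness equation: each factor is normalized by the best and worst cases described above, the worst total hop-count is taken as k hops per cluster-head, the cluster-size term is clamped to 1, and the name `irga_cost` and its parameters are hypothetical. The equal weights of 0.25 follow the simulation setup.

```python
def irga_cost(hop_counts, cluster_sizes, head_energies, initial_energy,
              one_hop_heads, n_nodes, weights=(0.25, 0.25, 0.25, 0.25)):
    """Weighted cost of one chromosome; lower is better (illustrative only)."""
    k = len(hop_counts)  # number of cluster-heads, assumed >= 1
    w1, w2, w3, w4 = weights

    # Factor 1: total hop-count. Best 0; worst ASSUMED to be k hops per head.
    f_hop = sum(hop_counts) / (k * k)

    # Factor 2: total deviation from the ideal size n/k. Best 0; the worst-case
    # bound n - n/k is taken from the text, and the ratio is clamped to 1.
    ideal = n_nodes / k
    worst_dev = n_nodes - ideal
    deviation = sum(abs(s - ideal) for s in cluster_sizes)
    f_size = min(deviation / worst_dev, 1.0) if worst_dev > 0 else 0.0

    # Factor 3: residual energy of the heads. Worst 0; best k * initial energy.
    f_energy = sum(head_energies) / (k * initial_energy)

    # Factor 4: heads within one hop of the sink. Worst 0; best k.
    f_sink = one_hop_heads / k

    # Factors 1-2 are minimized; factors 3-4 are maximized, hence (1 - value).
    return w1 * f_hop + w2 * f_size + w3 * (1 - f_energy) + w4 * (1 - f_sink)
```

For example, two cluster-heads at the sink with full energy and perfectly balanced clusters yield a cost of 0, while the opposite extreme on all four factors yields a cost of 1.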
The output of the initial phase is the initial population of solutions and their costs, which are stored in arrays of size IPop for use in the subsequent phases.
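The random generation of the initial population can be sketched as follows, assuming the number of clusters k has already been computed. The function names are hypothetical; each chromosome is a list of zeros and ones with exactly k ones at random positions.

```python
import random

def random_chromosome(n_nodes, k, rng):
    """One chromosome: n_nodes genes with exactly k cluster-heads (ones)."""
    genes = [0] * n_nodes
    for idx in rng.sample(range(n_nodes), k):  # k distinct random positions
        genes[idx] = 1
    return genes

def initial_population(ipop, n_nodes, k, seed=None):
    """IPop random chromosomes, each satisfying the k-ones constraint."""
    rng = random.Random(seed)
    return [random_chromosome(n_nodes, k, rng) for _ in range(ipop)]
```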

4-2. Selection Phase
After producing the initial solutions and calculating their fitness, the selection phase generates the Mating Pool and the Offspring matrices. Since the data collection problem on WSNs is treated as a minimization problem, the selection probability of a chromosome for transfer to the Mating Pool matrix is defined as (3). IRGA aims to move the solutions with a lower cost to the Mating Pool matrix with a higher probability, while the higher-cost chromosomes still retain a small chance of selection.
At this point, the main challenge is how to transfer solutions to the Mating Pool matrix. To solve this problem, we exploit the Roulette Wheel Selection method [53]. It considers a wheel that is divided into segments in proportion to the selection probability of the solutions. Then, a random number with a uniform distribution is generated. The solution whose segment contains this number is transferred to the Mating Pool matrix. The process is repeated MPsize times to fill all rows of the Mating Pool matrix with the parents selected for producing the next generation.
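Roulette Wheel Selection can be sketched in Python as below. Because the paper's selection probability formula is not reproduced in the text, the sketch assumes a probability proportional to the inverse of the cost, which gives lower-cost chromosomes larger segments; this choice, the `eps` guard, and the function names are assumptions for illustration only.

```python
import bisect
import itertools
import random

def selection_probabilities(costs, eps=1e-9):
    """ASSUMPTION: probability proportional to 1/cost (minimization problem)."""
    inverse = [1.0 / (c + eps) for c in costs]
    total = sum(inverse)
    return [v / total for v in inverse]

def roulette_wheel(population, costs, mp_size, rng=None):
    """Fill a mating pool of mp_size rows by spinning the wheel mp_size times."""
    rng = rng or random.Random()
    probs = selection_probabilities(costs)
    wheel = list(itertools.accumulate(probs))  # right edge of each segment
    pool = []
    for _ in range(mp_size):
        spin = rng.random() * wheel[-1]        # uniform point on the wheel
        idx = bisect.bisect_right(wheel, spin) # segment containing the point
        pool.append(population[min(idx, len(population) - 1)])
    return pool
```

The same chromosome can be drawn several times, which is expected: selection is with replacement, so strong solutions may appear in the Mating Pool more than once.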
Then, IRGA produces offspring from the parents located in the Mating Pool matrix to fill the Offspring matrix. For this purpose, Genetic algorithms use crossover and mutation operations. We use the 1-point crossover operation to generate the initial offspring [54]. It generates a random crossover point along the chromosome to produce the first and second offspring from the first two parents. For the first offspring, the genes with indexes lower than the crossover point are taken from the first parent, and the genes with indexes higher than the crossover point are taken from the second parent. For the second offspring, the genes with indexes lower than the crossover point are taken from the second parent, and the genes with indexes higher than the crossover point are taken from the first parent. This process is repeated for each subsequent pair of parents in the Mating Pool matrix until the Offspring matrix is filled. It should be noted that the Offspring matrix has the same size as the Mating Pool matrix in our proposed mechanism.
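A minimal Python sketch of the 1-point crossover follows. The crossover point is drawn from 1 to n − 1 (an assumption, so that both parents always contribute genes), consecutive rows of the mating pool are paired, and the function names are hypothetical.

```python
import random

def one_point_crossover(parent_a, parent_b, rng=None):
    """Return two children that swap tails at a random crossover point."""
    rng = rng or random.Random()
    n = len(parent_a)
    cut = rng.randint(1, n - 1)  # ASSUMPTION: cut in [1, n-1]; requires n >= 2
    child_1 = parent_a[:cut] + parent_b[cut:]
    child_2 = parent_b[:cut] + parent_a[cut:]
    return child_1, child_2

def make_offspring(mating_pool, rng=None):
    """Cross consecutive parent pairs; the result has the mating pool's size
    when the pool holds an even number of parents."""
    offspring = []
    for a, b in zip(mating_pool[0::2], mating_pool[1::2]):
        offspring.extend(one_point_crossover(a, b, rng))
    return offspring
```

Note that the children may contain more or fewer ones than the parents, so they generally need the repair step before their cost is evaluated.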
After applying the crossover operation, some of the offspring may not satisfy the constraints of the problem, i.e., the number of genes with the value "1" may be less or greater than the number of cluster-heads. To address this challenge, a repair function is defined. If the number of "1" genes in a chromosome of the Offspring matrix is greater than the number of cluster-heads, the function randomly sets the surplus "1" genes to "0". Otherwise, if the number of "1" genes in the chromosome is less than the number of cluster-heads, the function randomly sets the required number of "0" genes to "1".
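The repair function can be sketched as follows; the name `repair` and its signature are illustrative. It returns a copy of the chromosome with exactly k genes set to 1, flipping randomly chosen surplus ones to zero or randomly chosen zeros to one.

```python
import random

def repair(chromosome, k, rng=None):
    """Return a copy of the chromosome with exactly k cluster-heads (ones)."""
    rng = rng or random.Random()
    genes = list(chromosome)
    ones = [i for i, g in enumerate(genes) if g == 1]
    zeros = [i for i, g in enumerate(genes) if g == 0]
    if len(ones) > k:
        # Too many cluster-heads: demote a random surplus of them.
        for i in rng.sample(ones, len(ones) - k):
            genes[i] = 0
    elif len(ones) < k:
        # Too few cluster-heads: promote random ordinary nodes.
        for i in rng.sample(zeros, k - len(ones)):
            genes[i] = 1
    return genes
```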
In the final step of the selection phase, the chromosomes of the Offspring matrix are modified using the mutation operation. A random number is generated once per chromosome and serves as the mutation threshold for all of its genes. Then, for each gene, another random number is produced. If the gene's random number is less than the threshold, the value of the gene is flipped. After applying the mutation operation, some of the chromosomes may not satisfy the problem's constraints. Thus, we apply the repair function again. Finally, the cost value is calculated for all solutions in the Offspring matrix to be used in the next phase.
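The mutation step can be illustrated with the short Python sketch below. One threshold is drawn per chromosome and one random number per gene, as described above. The `max_rate` cap on the threshold is a hypothetical parameter that is not part of the paper; setting it to 1.0 gives an uncapped threshold drawn uniformly from [0, 1).

```python
import random

def mutate(chromosome, rng=None, max_rate=0.1):
    """Flip each gene whose random draw falls below a per-chromosome threshold."""
    rng = rng or random.Random()
    # ASSUMPTION: the per-chromosome threshold is capped by max_rate.
    threshold = rng.random() * max_rate
    return [1 - g if rng.random() < threshold else g for g in chromosome]
```

The mutated chromosome may no longer contain exactly k ones, which is why the text applies the repair function once more after this step.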

4-3. Replacement Phase
In the replacement phase, a new population matrix is generated, whose size is the same as that of the initial population. For this purpose, the chromosomes of the Offspring matrix are transferred to the new matrix, and the remaining positions are filled with the best solutions of the initial population (it is assumed that the number of chromosomes in the Offspring matrix is less than or equal to the number of chromosomes in the initial population). Finally, the end condition of the algorithm is checked:
• If the iterations of the Genetic algorithm have reached Itmax, the best solution (the chromosome with the lowest cost) is returned as the final solution of the problem.
• Otherwise, the algorithm returns to the Selection Phase.
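The replacement step can be sketched in Python as follows, with hypothetical function names. The offspring are copied into the new population first, and the remaining slots are filled with the lowest-cost chromosomes of the current population.

```python
def replace_population(population, costs, offspring, offspring_costs):
    """Build the next population: all offspring plus the best current solutions."""
    ipop = len(population)
    if len(offspring) > ipop:
        raise ValueError("offspring must not outnumber the population")
    n_keep = ipop - len(offspring)
    # Indexes of the n_keep lowest-cost chromosomes in the current population.
    best = sorted(range(ipop), key=lambda i: costs[i])[:n_keep]
    new_population = list(offspring) + [population[i] for i in best]
    new_costs = list(offspring_costs) + [costs[i] for i in best]
    return new_population, new_costs

def best_solution(population, costs):
    """Chromosome with the lowest cost, returned when Itmax is reached."""
    i = min(range(len(population)), key=lambda j: costs[j])
    return population[i], costs[i]
```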
The output of the IRGA mechanism is an efficient routing over WSN. Algorithm. 1 illustrates the proposed mechanism, which customizes the Genetic algorithm to solve the data collection on WSNs.
Algorithm. 1. The process of IRGA mechanism
Inputs: IPop, Itmax, MPsize
Output: The set of efficient routings
Begin
    Generate initial population
    Calculate cost of each chromosome in initial population
    while Iteration < Itmax
        Calculate probability for each chromosome in initial population
        Generate Mating Pool
        Generate Offspring matrix
        Generate new population
    end
    Return the efficient set of routings
end

Performance Evaluation
In this section, the simulation results of IRGA are compared with recent state-of-the-art data collection approaches on WSNs to evaluate the performance of our proposed mechanism. First, we explain the simulation parameters and their values. The important performance factors on WSNs are then analyzed, including energy consumption, the number of live nodes, lifetime, and hop-count.

Simulation Setup
To analyze the performance of IRGA, the monitoring area is assumed to be 250 m × 250 m, with a sink with unlimited energy deployed at its center. Also, 200 to 500 homogenous wireless sensor nodes are distributed uniformly at random over the monitoring area. The initial energy and transmission range of the wireless sensor nodes are set to 2 J and 40 m, respectively. For an accurate simulation, we consider the characteristics of the CC2420 transceiver, with a transmission speed of 250 kbps for sending/receiving data in each wireless sensor node [29]. Besides, εfs and εemp are 10 pJ/bit/m² and 0.0013 pJ/bit/m⁴ in all simulations, respectively. Finally, we employ MATLAB R2018a on a computer with the Windows 10 operating system, an Intel(R) Core(TM) i5-3520M CPU @ 2.90 GHz, and 8 GB RAM for all simulations.
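The deployment described above can be reproduced with a short Python sketch (the simulations themselves were run in MATLAB). It assumes that distances are expressed in metres and that a node is directly connected to the sink when its Euclidean distance to the sink does not exceed the transmission range; all names are hypothetical.

```python
import math
import random

AREA_SIDE = 250.0   # side length of the square monitoring area (assumed metres)
TX_RANGE = 40.0     # transmission range of a sensor node (assumed metres)
SINK = (AREA_SIDE / 2, AREA_SIDE / 2)  # sink placed at the centre of the area


def deploy(num_nodes, rng):
    """Place num_nodes sensor nodes uniformly at random in the square area."""
    return [(rng.uniform(0.0, AREA_SIDE), rng.uniform(0.0, AREA_SIDE))
            for _ in range(num_nodes)]


def nodes_in_range_of_sink(nodes, sink=SINK, tx_range=TX_RANGE):
    """Return the indices of nodes that can reach the sink in a single hop."""
    return [i for i, (x, y) in enumerate(nodes)
            if math.hypot(x - sink[0], y - sink[1]) <= tx_range]


if __name__ == "__main__":
    rng = random.Random(1)  # fixed seed so the layout is reproducible
    nodes = deploy(250, rng)
    print(len(nodes_in_range_of_sink(nodes)), "nodes are one hop from the sink")
```

Using a seeded generator keeps the layout identical across runs, which is convenient when several mechanisms are compared on the same topology.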
The efficiency of IRGA is compared with two recent data collection approaches on WSNs. The first one, called Enhanced Low Energy Adaptive Clustering Hierarchy (ELEACH), is an energy-efficient clustering-based mechanism for data transmission from source nodes to the sink [40]. The second one, called Virtual Backbone Tree-based Routing (VBTR), builds a spanning tree between the wireless sensor nodes for efficient data collection based on metaheuristic algorithms [44]. It is worth mentioning that all simulations are run 100 times to reduce random error. A round is defined as the interval from when the wireless sensor nodes sense data from the monitoring area until the resulting data are delivered to the sink. Finally, the four weighting coefficients of the fitness criteria are each assigned the same value of 0.25 to obtain consistent results for vital applications on WSNs.

Energy Consumption
One of the significant factors in the performance analysis of data collection mechanisms on WSNs is the energy consumed for data transmission from source nodes to the sink. The energy consumption of a WSN refers to the total energy exhausted by all wireless sensor nodes distributed over the monitoring area. Figure 3(a) shows the energy consumption of the IRGA, ELEACH, and VBTR mechanisms as the number of rounds increases in a network with 250 nodes. As shown in this figure, our proposed mechanism reduces the energy consumption of the WSN compared with the others. In more detail, with IRGA the network's energy consumption is reduced by 7.67% and 40.75% on average, compared to the ELEACH and VBTR data collection mechanisms, respectively. Figure 3(b) illustrates the energy consumption of the IRGA, ELEACH, and VBTR mechanisms as the number of rounds increases in a WSN with 450 nodes. As shown in this figure, IRGA also decreases energy consumption in the 450-node scenario. In numerical terms, with our proposed method the energy consumption of the WSN is reduced by 12% and 37.71% on average, compared to ELEACH and VBTR, respectively. We attribute the advantage of the IRGA data collection mechanism to the Genetic-algorithm-based optimization and the suitable parameters in the fitness function setting. Thus, our proposed mechanism is a suitable approach for data collection in energy-constrained applications of WSNs for both tested network sizes.

Number of Live Nodes
The number of live wireless sensor nodes over time is another main factor in analyzing data collection mechanisms on WSNs. It reflects the load balancing over the network, the fairness of energy consumption, and the operational time of the network. Figure 4(a) illustrates the number of live wireless sensor nodes when using the IRGA, ELEACH, and VBTR mechanisms in a 250-node WSN as the number of rounds increases. This figure shows that with the ELEACH and VBTR mechanisms, the first wireless sensor node exhausts its energy at rounds 396 and 123, respectively. In contrast, with IRGA, the network loses its first node at round 576. In other words, our proposed mechanism postpones the death of the first wireless sensor node by factors of 1.45 and 4.68, compared with ELEACH and VBTR, respectively. Figure 4(b) shows the number of live wireless sensor nodes over time using the same mechanisms in a WSN with 450 nodes. This figure illustrates that with the ELEACH and VBTR mechanisms, the network loses the first wireless sensor node at rounds 169 and 89, respectively, while with our proposed mechanism the first node expires at round 382. In other words, the IRGA mechanism postpones the death of the first wireless sensor node by factors of 2.26 and 4.29, compared with ELEACH and VBTR, respectively. The overall analysis of Figure 4 demonstrates that our proposed model keeps more wireless sensor nodes alive than the other mechanisms in WSNs of different sizes, owing to the characteristics of the Genetic algorithm used for optimization.
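The factors quoted above coincide with the ratios of the first-node-death rounds reported for the three mechanisms. The following Python check uses only the round numbers given in the text; the function name `delay_factors` is hypothetical.

```python
# Round at which the first node dies, as reported for Figure 4(a) and 4(b).
FIRST_NODE_DEATH = {
    250: {"IRGA": 576, "ELEACH": 396, "VBTR": 123},
    450: {"IRGA": 382, "ELEACH": 169, "VBTR": 89},
}


def delay_factors(rounds, reference="IRGA"):
    """Ratio of the reference mechanism's first-death round to each other one."""
    return {name: round(rounds[reference] / value, 2)
            for name, value in rounds.items() if name != reference}


for size, rounds in FIRST_NODE_DEATH.items():
    print(size, delay_factors(rounds))
# 250 {'ELEACH': 1.45, 'VBTR': 4.68}
# 450 {'ELEACH': 2.26, 'VBTR': 4.29}
```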

Lifetime
Network lifetime is another critical factor in analyzing the performance of WSNs. It is defined as the number of rounds between the time the network starts operating and the time at which only a specified percentage of the initially deployed wireless sensor nodes remain alive. Figure 5 shows the average lifetime for WSNs with 200 to 500 wireless sensor nodes as the network's size increases. It should be noted that we consider the death of the first wireless sensor node as the network lifetime in our simulations. Our proposed mechanism outperforms the ELEACH and VBTR mechanisms by factors of 1.86 and 3.92 on average over the tested network sizes, respectively. Since IRGA uses the hop-count of the cluster-heads to the sink, the number of members of each cluster, the residual energy of cluster-heads, and the number of cluster-heads connected to the sink as the fitness parameters of the data collection process, it distributes energy exhaustion fairly among the nodes. This leads to a longer lifetime compared with the other approaches.
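The lifetime definition above can be written as a small Python function. The sketch assumes that the round in which each node dies is known; `dead_fraction` is a hypothetical parameter for the share of nodes allowed to die, with the default value corresponding to the first-node-death criterion used in the simulations. The death rounds in the usage lines are made-up values for illustration only.

```python
import math


def network_lifetime(death_rounds, dead_fraction=0.0):
    """Round at which the network is considered dead.

    death_rounds:  round in which each node exhausts its energy
    dead_fraction: share of nodes whose death ends the lifetime; 0.0 means
                   the lifetime ends with the first node death
    """
    if not death_rounds:
        raise ValueError("at least one node is required")
    if not 0.0 <= dead_fraction <= 1.0:
        raise ValueError("dead_fraction must lie in [0, 1]")
    # Number of deaths that ends the lifetime (at least one).
    k = max(1, math.ceil(dead_fraction * len(death_rounds)))
    return sorted(death_rounds)[k - 1]


# Hypothetical death rounds of a four-node network.
example = [700, 520, 650, 900]
print(network_lifetime(example))        # first node death
print(network_lifetime(example, 0.5))   # half of the nodes dead
```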

Number of Hop-count
The hop-count from wireless sensor nodes to the sink is one of the essential factors for evaluating routing techniques on WSNs. Since the final objective of our proposed mechanism is energy-efficient data transmission, analyzing the average hop-count can demonstrate the effectiveness of the data collection approaches. Figure 6 illustrates the average hop-count from source nodes to the sink for WSNs with 200 to 500 nodes. The figure shows that the IRGA mechanism decreases the average hop-count from wireless sensor nodes to the sink by 6.83% and 23.21% on average compared to the ELEACH and VBTR mechanisms, respectively.
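For illustration, the average hop-count of a given routing structure can be computed with a breadth-first search from the sink. This is a generic technique and not the greedy hop-count calculation used inside IRGA; the toy topology and all names below are hypothetical.

```python
from collections import deque


def hop_counts(adjacency, sink):
    """Breadth-first search from the sink; returns the hop-count of each reachable node."""
    hops = {sink: 0}
    queue = deque([sink])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency.get(node, ()):
            if neighbour not in hops:
                hops[neighbour] = hops[node] + 1
                queue.append(neighbour)
    return hops


def average_hop_count(adjacency, sink):
    """Mean hop-count over all nodes that can reach the sink (sink excluded)."""
    hops = hop_counts(adjacency, sink)
    values = [h for node, h in hops.items() if node != sink]
    return sum(values) / len(values) if values else 0.0


# Hypothetical topology: a and b reach the sink directly, c via a, d via c.
topology = {
    "sink": ["a", "b"],
    "a": ["sink", "c"],
    "b": ["sink"],
    "c": ["a", "d"],
    "d": ["c"],
}
print(average_hop_count(topology, "sink"))  # (1 + 1 + 2 + 3) / 4 = 1.75
```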

Conclusion
Data collection is one of the significant issues in WSNs. Data routing is the fundamental challenge in satisfying the requirements of energy-aware data collection in such networks. This paper proposes a novel mechanism to cluster the wireless sensor nodes and determine the efficient route from source nodes to the sink. We exploit the Genetic algorithm to solve the routing problem considering the hop-count of the cluster-heads to the sink, the number of members of each cluster, the residual energy of cluster-heads, and the number of cluster-heads connected to the sink as the fitness criteria. Our proposed mechanism uses a greedy approach to calculate the hop-count of each cluster-head to the sink for integrating the clustering and routing process on WSNs. Performance evaluations show that our proposed mechanism improves the energy consumption, the number of live nodes, and the lifetime of the network compared with other state-of-the-art data collection approaches on WSNs.
Declaration I wish to confirm that there are no known conflicts of interest associated with this publication and there has been no significant financial support for this work that could have influenced its outcome. I confirm that the manuscript has been read and approved by named author and that there are no other persons who satisfied the criteria for authorship but are not listed.
I confirm that I have given due consideration to the protection of intellectual property associated with this work and that there are no impediments to publication, including the timing of publication, with respect to intellectual property. In so doing I confirm that I have followed the regulations of our institutions concerning intellectual property. I also emphasize that the present publication has not been published anywhere and is not under review in another journal.