NaISEP: Neighborhood Aware Clustering Protocol for WSN Assisted IOT Network for Agricultural Application

Smart farming has become a pressing need as a way to boost productivity and protect crops. This can be done using sensors and internet enabled devices in the farm. The underlying wireless sensor network can sense various environmental parameters and pass the data to the internet enabled devices to support smart farming. The sensor nodes are, however, powered by small batteries and have a limited lifetime. Therefore, this paper presents a clustering protocol which aims at increasing the lifetime of the sensor nodes. The sensors are considered to be pressure sensors deployed in the network; whenever an animal encroaches on the farm, a signal can be passed to the internet enabled alarm system, which helps the farmer fend off the animals and protect the crops. The sensor network is considered to have three levels of energy heterogeneity among the nodes, and the cluster head is selected in such a way that the cluster it forms contains a larger number of high energy nodes. The proposed protocol has been simulated in the MATLAB environment and compared with Stable Election Protocol (SEP), Distributed Energy Efficient Clustering (DEEC) and Improved Stable Election Protocol (ISEP) based on network lifetime and throughput. The network lifetime of the proposed protocol was 6567 rounds, higher than the others, with the network going dead at 5472 rounds for ISEP, 2495 rounds for DEEC and 2316 rounds for SEP. The protocol has thus shown better performance than these existing protocols.


Introduction
The Internet of Things (IoT) is a maturing technology that has penetrated multiple spheres of life, with applications ranging from intelligent transportation systems, smart environments and smart homes to hospitals and agriculture [1,2]. IoT is a collection of a large number of smart devices such as sensors, smartphones, radio frequency identification (RFID) tags, internet enabled alarm systems and so on, which coordinate with each other over the internet [3,4]. These IoT devices, when coupled with a wireless sensor network (WSN), create a number of applications in which various information models coordinate mutually to provide a common service [5][6][7]. A WSN is a network which consists of numerous sensor nodes deployed randomly or deterministically in an environment to gather useful information [8,9]. The sensor nodes in a WSN sense data from the environment and pass it to the internet enabled devices, which can then take useful actions based on the received data. Such networks are termed WSN assisted IoT networks and they find use in multiple applications such as smart homes, agriculture etc. When it comes to agriculture, sensor nodes deployed in the farm can sense various kinds of data such as soil moisture content, temperature level, humidity level etc. The data sensed by the nodes can be passed on to an IoT enabled valve in order to maintain the soil moisture level, or a signal can be passed to an internet enabled alarm system when animal activity is sensed in the farm. Thus, smart farming can assist the farmer in producing high quality crops, protecting the farm from animals, reducing the wastage of water or fertilizer etc. [10].
However, the backbone of the IoT devices, i.e. the underlying sensor network, has a major issue relating to the sensor nodes. These nodes are powered by small batteries and thus have limited resources for sensing, processing and communicating data [11]. This leads to a shorter network lifetime, and if the energy of the nodes is not preserved the entire purpose of the network fails. In order to save the batteries of these sensors and to increase network lifetime, the clustering approach has been put forward by various researchers in the past [12][13][14][15][16].
In clustering, the nodes are organized into groups called clusters; each cluster has one cluster head and the other nodes serve as cluster members. The cluster members sense and gather data from the environment and transmit it to their respective head. The cluster head aggregates all the data and forwards it to the base station for final processing. The concept of clustering originated from the Low Energy Adaptive Clustering Hierarchy (LEACH) protocol, which selected the cluster heads randomly. Another protocol, the Stable Election Protocol (SEP), focuses on networks having nodes with different energy levels. It prioritizes the nodes having higher energy to become cluster head [17]. Since then, many clustering protocols have been developed which focus on optimal selection of cluster heads in the network with the aim of maximizing network lifetime, while some other protocols focus on optimizing the method by which data can be transferred from the cluster heads to the base station in an energy efficient way. These protocols have been developed considering either a homogeneous or a heterogeneous environment. This paper presents a clustering protocol designed for a heterogeneous environment (with nodes having different energy levels) in which nodes are deployed both deterministically and randomly. The cluster head selection procedure has been optimized by considering the local environment of the node. Most importantly, the protocol has been developed for an agricultural application using pressure sensors; the data they collect about animals encroaching on the farm will help the farmer or guard to protect the crops.
Section 2 of the paper highlights the related work in the field of clustering of WSN assisted IoT networks which aims at augmenting the network lifetime; the motivation of the current study is also discussed in that section. Section 3 presents the system model and describes the deployment scenario of the nodes. The protocol is detailed in Sect. 4 and the results are discussed in Sect. 5. Finally, Sect. 6 concludes the paper.

Related Work and Motivation
T.M. Behera et al. [18] proposed avoiding the energy consumed in cluster formation by introducing a threshold value for the SEP protocol. The authors proposed that if a node has remaining energy above the threshold value, it can be retained as cluster head in the successive rounds as well. T. Sood et al. [19] proposed the concept of lines of uniformity for optimal cluster head selection with the aim of improving the network coverage area and minimizing the number of isolated nodes in the network. Lines of uniformity are lines drawn diagonally in the network such that nodes which are closer to these lines have a higher probability of becoming cluster heads. V. Pandiyaraju et al. [20] proposed an energy efficient routing protocol for agriculture done over terrains. The entire area is divided into grids and, for energy efficient routing, the terrain head of every grid is chosen based on fuzzy rules; the heads transmit data to the base station using multi hop communication in which the intermediate nodes are again selected using the fuzzy rules. T.H. Feiroz Khan et al. [21] proposed a mobile sink based data gathering protocol for ambient monitoring of crops. Four different types of trajectories, used by the sink node for data gathering from sensors, are proposed in that work. Furthermore, optimal routes are selected based on the frontward communication area to reduce energy consumption and delay in the network. D. Mehta et al. [22] selected the optimal cluster head based on multiple parameters to achieve energy efficiency. Furthermore, instead of using single hop transmission from cluster head to base station, the sail fish optimizer is used to form the optimal path for data transmission. Another clustering protocol that transmits data over multiple paths has been proposed by C. Mohanadevi et al. [23]. In it, the clusters are formed using hybrid Particle Swarm Optimization (PSO) and cuckoo search optimization.
H. Singh et al. [24] proposed a clustering protocol for a three level hierarchical heterogeneous sensor network, where the network is divided into two regions that use different methods, based on LEACH and SEP, to select cluster heads. In addition, the concept of a mobile agent is explored to gather data from the cluster heads. Arikumar et al. [25] proposed the use of a cluster router node which acts as another hierarchy level with the aim of forwarding the data gathered by the cluster head to the base station. The cluster head as well as the cluster router are selected using a fitness value derived using PSO and fuzzy rules. Another three level hierarchy system has been proposed by Ahmed Elsmany et al. [26]. They use congregator nodes as an additional layer apart from the cluster head and cluster members used in traditional clustering protocols. This reduces the load on the cluster heads, as the congregator nodes are assigned the task of aggregating the data from cluster members and then passing it to the respective cluster head. Behera et al. [27] proposed the R-LEACH clustering protocol, which chooses a group of cluster heads within the same cluster instead of a single cluster head. The objective of this approach is to rotate the cluster head role within the elected group only, so that the energy consumed in cluster formation in every round can be curbed. Furthermore, the authors considered only the nodes having higher energy for selection into the cluster head group, for enhanced network lifetime.

Motivation
It has been observed that most of the clustering protocols developed so far do not consider any specific application, such as the use of an IoT enabled network in a particular agricultural or military application; they are more generalized protocols. Many clustering protocols have also been developed for heterogeneous networks, where the cluster heads are usually selected based on the energy of the nodes; this gives the advanced nodes a chance to become head. These protocols, however, focus less on the other parameters of a candidate node during head selection, such as its proximity to the base station or the kind of neighborhood it possesses (the neighborhood can be defined as the nodes in the communication range together with the features they possess, such as residual energy or location). Therefore, considering these gaps, the proposed work aims at developing a protocol for a specific agricultural application where pressure sensors are deployed in the farm in a deterministic pattern to raise the IoT enabled alarm system whenever any animal encroaches on the farm. The neighborhood of the node has also been considered in the proposed protocol for optimal selection of the cluster head in the network.

Contribution
The paper provides a heterogeneous WSN model for an agricultural application which primarily focuses on protecting the crops from animals that may encroach on the farm and damage them. The main contribution of the paper relates to the cluster head selection process, which incorporates an analysis of the neighboring nodes with respect to their energy levels.
The assessment of the quality of the neighborhood is rarely seen in the existing studies; therefore the concept of h-factor explained in the paper is a novel concept. It allows the formation of clusters which are very strong in terms of energy levels. Therefore, the network lifetime is expected to be on the higher side. Furthermore, the eligibility of the nodes is computed based on leadership tenure, which is another new concept proposed in the paper. This tenure expresses the lifetime of the node as cluster head in terms of number of rounds, which enables us to elect cluster heads that can serve for a longer duration of the network. Both these factors contribute to the novelty of this paper.

System Model
• Node type: The network consists of three types of nodes: advanced nodes, which have the highest energy; intermediate nodes, which have somewhat less energy than the advanced nodes; and normal nodes. Since the protocol targets an agricultural application in which farm guards are alerted when animals infiltrate the fields, the sensors used in the network are pressure sensors. The moment any animal steps on a sensor, the guard can be alerted. Since the nodes have different initial energies, we call this a heterogeneous environment. All resources of the nodes other than energy are the same.
• Deployment scenario: The network area is considered to be rectangular, with the advanced nodes deployed on the outer perimeter, shown in green in Fig. 1. We call this area layer 1. The advanced nodes are deployed in layer 1 of the farm on the assumption that whenever an animal breaches the fence and enters the field, there is a very high chance that it steps on one of these sensors. Therefore, the highest chance of reporting this event to the base station comes from this area of the network. Since these sensors will be transmitting the most data or reporting the most events, their energy needs to be the highest among all nodes. These nodes have $\alpha$ times more energy than normal nodes.
-In the next layer of the field, the intermediate nodes, shown in blue, are deployed. This part of the field has the second highest chance of reporting the animal infiltration event to the server (assuming that most of the animal infiltration will be detected by the front layer sensors). These nodes have $\beta$ times more energy than normal nodes such that $\beta = \alpha/2$.
-In the last layer of the network, the normal nodes are randomly deployed; they are shown in pink in the figure. Since their distance to the base station (deployed in the center of the network and shown as a yellow heptagon) is the least, their energy resources can be traded off.
• Energy model: Whenever a node forwards a packet to or receives a packet from another node, some amount of energy is dissipated, which is governed by the first order radio energy model [28]. When the data communication is within a cluster, the energy consumed by the members in sending data to the cluster head, or by the cluster head in passing any message to its members, is directly proportional to $d^2$. The energy consumed to transmit a packet over a longer distance, particularly from cluster head to base station, is directly proportional to $d^4$. The energy consumed in packet transmission can be given as:

$$E_{TX}(P_s, d) = \begin{cases} P_s \cdot E_{elec} + P_s \cdot E_{fs} \cdot d^2, & \text{within a cluster} \\ P_s \cdot E_{elec} + P_s \cdot E_{amp} \cdot d^4, & \text{cluster head to base station} \end{cases}$$

where $P_s$ is the packet size and $E_{elec}$, $E_{amp}$, $E_{fs}$ are the parameters related to electronics, amplifier and free space energy respectively. The energy consumed by the receiver is independent of the distance between the two nodes and is given by:

$$E_{RX}(P_s) = P_s \cdot E_{elec}$$
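As a rough illustration of the energy model described above, the following Python sketch evaluates the transmit and receive costs for a packet. The function names and the numeric constants are assumptions chosen for illustration only (they are commonly quoted radio parameters, not values taken from this paper's simulation table).

```python
# Sketch of the first order radio energy model described in the text.
# All numeric constants below are illustrative assumptions, not the paper's values.
E_ELEC = 50e-9       # J/bit, electronics energy (assumed)
E_FS = 10e-12        # J/bit/m^2, free space amplifier energy (assumed)
E_AMP = 0.0013e-12   # J/bit/m^4, long distance amplifier energy (assumed)


def tx_energy_intra(packet_bits, d):
    """Transmit energy inside a cluster: amplifier term grows with d^2."""
    return packet_bits * E_ELEC + packet_bits * E_FS * d ** 2


def tx_energy_to_bs(packet_bits, d):
    """Transmit energy from cluster head to base station: amplifier term grows with d^4."""
    return packet_bits * E_ELEC + packet_bits * E_AMP * d ** 4


def rx_energy(packet_bits):
    """Receive energy: independent of the distance between the two nodes."""
    return packet_bits * E_ELEC


if __name__ == "__main__":
    bits = 4000  # hypothetical packet size in bits
    print("intra-cluster TX at 10 m:", tx_energy_intra(bits, 10.0))
    print("head-to-BS TX at 50 m:  ", tx_energy_to_bs(bits, 50.0))
    print("RX:                     ", rx_energy(bits))
```

Keeping the two transmit cases as separate functions mirrors the text, which distinguishes them by the role of the link rather than by a distance cutoff.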

Proposed Protocol: NaISEP (Neighborhood Aware Improved Stable Election Protocol)
The proposed protocol is based on clustering of the nodes with the objective of increasing the network lifetime. The clustering protocol is based on SEP and works in two phases. In the first phase, cluster heads are selected among the deployed nodes and clusters are formed around the selected heads. In the second phase, each cluster head aggregates the data from the members inside its cluster and forwards it to the base station. These two phases are explained below in detail and the variables and notations used are listed in Table 1.

Cluster Head Selection and Cluster Formation Phase
Heterogeneity Factor (h-factor): In the proposed environment, the heterogeneity factor is defined as the ratio of the number of advanced plus intermediate nodes in the communication range of the node to its total number of neighbors. Since the nodes deployed in the outer layers of the farm (advanced and intermediate) are likely to report more animal intrusions, a node having a larger number of such nodes within its range will have a higher h-factor. The cluster formed with such nodes will send more messages to the base station. Mathematically, in a network having a total of $n$ nodes and $k$ optimal clusters, the neighbor set of sensor $s_i$ is the set of nodes within its communication range, and the h-factor of $s_i$ is the number of advanced and intermediate nodes in this set divided by the size of the set. A node having a higher value of h-factor has a higher probability of becoming cluster head in the current round.
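The h-factor definition above can be sketched in Python as follows. The node representation (an `(x, y, kind)` tuple), the function names, the use of Euclidean distance for the communication range, and the choice to return 0 for a node with no neighbors are all assumptions made for this illustration.

```python
import math


def neighbors(nodes, i, comm_range):
    """Indices of the nodes within comm_range of node i (excluding i itself)."""
    xi, yi, _ = nodes[i]
    return [
        j
        for j, (x, y, _) in enumerate(nodes)
        if j != i and math.hypot(x - xi, y - yi) <= comm_range
    ]


def h_factor(nodes, i, comm_range):
    """Share of advanced and intermediate nodes among the neighbors of node i."""
    nbrs = neighbors(nodes, i, comm_range)
    if not nbrs:
        return 0.0  # assumption: an isolated node gets the lowest possible score
    strong = sum(1 for j in nbrs if nodes[j][2] in ("advanced", "intermediate"))
    return strong / len(nbrs)


if __name__ == "__main__":
    # Hypothetical toy deployment: (x, y, node type)
    nodes = [
        (0, 0, "normal"),
        (1, 0, "advanced"),
        (0, 1, "intermediate"),
        (2, 2, "normal"),
        (50, 50, "advanced"),
    ]
    print(h_factor(nodes, 0, comm_range=3))
```

In this toy deployment node 0 has three neighbors, two of which are advanced or intermediate, so its h-factor is two thirds.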
Leadership Tenure: The cluster formation phase begins after the optimal cluster head has been elected. The elected cluster head has to broadcast an advertisement (ADV) packet in its communication range so that neighbors can know of its election as head and can form cluster with it. This broadcast of ADV packet comes at an expense of increased overhead as well as energy resources. Furthermore, at the end of every round, the cluster head needs to be rotated so that the load among the cluster head and the members can be balanced out. Therefore, with every new cluster head in every round, the broadcasting process for cluster formation becomes inevitable leading to wastage of resources. In order to lessen it, the proposed work introduces the concept of leadership tenure. It is defined as the number of successive rounds for which a node will be retained as cluster head without changing its role. Higher value of leadership tenure would mean that the cluster head will not be changed in the successive rounds, thus leading to reduction in the consumption of resources required in cluster formation process.
For computing the leadership tenure, a threshold value of energy needs to be computed; this threshold is the energy level that decides whether the present energy of the current cluster head is sufficient for the node to carry on in the same role in the next round as well. Suppose the cluster of a node elected as cluster head contains $n/k$ nodes, i.e. the head and $n/k - 1$ cluster members. The cluster head expends energy in the formation of the cluster and then in transferring data to the base station. The first of these two parts can be computed as:

$$E_{CF} = \left(CP_s \cdot E_{elec} + CP_s \cdot E_{fs} \cdot d^2\right) + \left(\left(\frac{n}{k} - 1\right) \cdot CP_s \cdot E_{elec} + E_{bf}\right)$$

where $E_{CF}$ represents the energy consumed by the head in the cluster formation process and $CP_s$ is the size of the control packets used in that process.
The first part of the equation, $CP_s \cdot E_{elec} + CP_s \cdot E_{fs} \cdot d^2$, represents the energy consumed when the head broadcasts ADV packets to the nodes informing them of its election as cluster head. The second part, $\left(\frac{n}{k} - 1\right) \cdot CP_s \cdot E_{elec} + E_{bf}$, represents the energy consumed when the head receives the JOIN messages from the members. These JOIN packets are sent by the nodes to their cluster head to inform the head that they are joining its cluster. The above equation can be simplified as:

$$E_{CF} = \frac{n}{k} \cdot CP_s \cdot E_{elec} + CP_s \cdot E_{fs} \cdot d^2 + E_{bf}$$

The second part of the energy consumption, incurred when the head aggregates the data and forwards it to the base station, can be computed as:

$$E_{DT} = \left(DP_s \cdot E_{elec} + DP_s \cdot E_{amp} \cdot d^4\right) + \left(\left(\frac{n}{k} - 1\right) \cdot DP_s \cdot E_{elec} + E_{bf}\right)$$

where $E_{DT}$ represents the energy consumed by the head in the data transmission process, $DP_s$ is the size of the data packets, and $DP_s > CP_s$. The first part of this equation, $DP_s \cdot E_{elec} + DP_s \cdot E_{amp} \cdot d^4$, is the energy consumed by the cluster head in forwarding the aggregated data to the base station; it is assumed that communication with the base station is over a longer distance. The second part, $\left(\frac{n}{k} - 1\right) \cdot DP_s \cdot E_{elec} + E_{bf}$, is the energy consumed by the cluster head in receiving and aggregating data from the $\frac{n}{k} - 1$ cluster members. The above equation can be simplified as:

$$E_{DT} = \frac{n}{k} \cdot DP_s \cdot E_{elec} + DP_s \cdot E_{amp} \cdot d^4 + E_{bf}$$

Therefore, the total energy $E_{T}^{round}$ consumed in cluster formation and data transmission in one round is:

$$E_{T}^{round} = E_{CF} + E_{DT}$$

The threshold level of the energy (T.M. Behera et al. 2020) for the $i$-th node having initial energy $E_{0}^{i}$ is computed in terms of $E_{TX/byte}$, the energy spent in transmitting a packet of size 1 byte, and $P_{STx}$ and $P_{SRx}$, the sizes of the transmission and reception packets respectively.
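The cluster formation and data transmission energies described above can be evaluated numerically with a short Python sketch. It reads the two parts of each expression literally as printed, treating $E_{bf}$ as a single additive term per phase; that reading, the function names, and the parameter names are assumptions, and the example values are arbitrary numbers chosen to make the arithmetic easy to follow.

```python
def cluster_formation_energy(n, k, cp_bits, d, e_elec, e_fs, e_bf):
    """Energy a head spends broadcasting ADV and receiving JOIN packets."""
    adv_broadcast = cp_bits * e_elec + cp_bits * e_fs * d ** 2
    join_receive = (n / k - 1) * cp_bits * e_elec + e_bf  # literal reading of the text
    return adv_broadcast + join_receive


def data_transmission_energy(n, k, dp_bits, d_bs, e_elec, e_amp, e_bf):
    """Energy a head spends receiving member data and forwarding it to the base station."""
    forward_to_bs = dp_bits * e_elec + dp_bits * e_amp * d_bs ** 4
    receive_members = (n / k - 1) * dp_bits * e_elec + e_bf  # literal reading of the text
    return forward_to_bs + receive_members


def round_energy(n, k, cp_bits, dp_bits, d, d_bs, e_elec, e_fs, e_amp, e_bf):
    """Total energy a head consumes in one round (formation plus transmission)."""
    return (
        cluster_formation_energy(n, k, cp_bits, d, e_elec, e_fs, e_bf)
        + data_transmission_energy(n, k, dp_bits, d_bs, e_elec, e_amp, e_bf)
    )


if __name__ == "__main__":
    # Arbitrary unit-free numbers, chosen only to keep the arithmetic readable.
    total = round_energy(n=100, k=5, cp_bits=10, dp_bits=20, d=2, d_bs=2,
                         e_elec=1, e_fs=1, e_amp=1, e_bf=3)
    print(total)
```

With these toy numbers the formation term also equals `(n / k) * cp_bits * e_elec + cp_bits * e_fs * d ** 2 + e_bf`, which is the merged form obtained by collecting the two electronics terms.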
The extra energy available with a node before its energy falls below or becomes equal to the threshold value is computed first. The number of rounds for which a node may sustain its role as cluster head, i.e. the Leadership Tenure, is then derived from it. The fitness value of the node to become cluster head is computed from the Heterogeneity Factor and the Leadership Tenure, weighted by constants $w_1$ and $w_2$ whose sum equals 1. A higher fitness value means that a node having a larger number of advanced and intermediate nodes in its communication range (i.e. a higher Heterogeneity Factor) and a larger number of rounds for which it can be sustained as cluster head without rotation (i.e. a higher Leadership Tenure) is the optimal candidate for selection as cluster head.
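One plausible way to turn the description above into code is sketched below in Python. The exact expressions are not reproduced in the text, so the forms used here are assumptions: extra energy is taken as residual energy minus the threshold, tenure as the whole number of rounds that extra energy can pay for, and fitness as a plain weighted sum. All names are hypothetical.

```python
import math


def leadership_tenure(residual_energy, threshold_energy, energy_per_round):
    """Whole rounds a head can serve before dropping to the threshold (assumed form)."""
    extra = residual_energy - threshold_energy
    if extra <= 0 or energy_per_round <= 0:
        return 0
    return math.floor(extra / energy_per_round)


def fitness(h_factor, tenure, w1=0.5, w2=0.5):
    """Weighted sum of h-factor and tenure (assumed form); w1 + w2 must equal 1."""
    if abs(w1 + w2 - 1.0) > 1e-9:
        raise ValueError("w1 and w2 must sum to 1")
    return w1 * h_factor + w2 * tenure


if __name__ == "__main__":
    tenure = leadership_tenure(residual_energy=1.0, threshold_energy=0.25,
                               energy_per_round=0.25)
    print(tenure, fitness(0.5, tenure))
```

Because the h-factor lies between 0 and 1 while the tenure is a round count, a practical implementation would probably rescale the tenure before weighting; the sketch leaves that choice open.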
Once the fitness value of the nodes is computed, each node generates a random number and compares it with a threshold value. If the generated random number is less than the threshold value, the node becomes cluster head. The threshold value for each node type (advanced, intermediate and normal) depends upon the probability of that type becoming cluster head. This marks the end of the cluster head selection process; in the next step the selected cluster heads begin cluster formation by broadcasting an ADV packet to their neighboring nodes. A neighboring node may receive the packet from a single cluster head or from a set of cluster heads. In the former case, it simply joins the cluster head from which the packet was received; in the latter case, it joins the cluster head at the least distance or with the maximum received signal strength.
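The election and joining steps described above can be sketched in Python as follows. Since the paper's own threshold expression is not reproduced in the text, the sketch substitutes the classic LEACH/SEP threshold form, $T = p / (1 - p \cdot (r \bmod 1/p))$, as a stand-in; how the fitness value enters the threshold is not shown here. The function names and the per-type probability `p` passed in by the caller are assumptions.

```python
import math
import random


def sep_threshold(p, r, eligible):
    """Classic LEACH/SEP style threshold for round r (used here as a stand-in)."""
    if not eligible or p <= 0:
        return 0.0
    epoch = round(1 / p)  # number of rounds in one rotation epoch
    return p / (1 - p * (r % epoch))


def is_elected(p, r, eligible, rng):
    """A node becomes head when its random draw falls below the threshold."""
    return rng.random() < sep_threshold(p, r, eligible)


def join_cluster(node_xy, head_positions):
    """Index of the nearest head that advertised, or None if no ADV was heard."""
    if not head_positions:
        return None
    return min(range(len(head_positions)),
               key=lambda h: math.dist(node_xy, head_positions[h]))


if __name__ == "__main__":
    rng = random.Random(7)  # fixed seed only to make the demo repeatable
    print(is_elected(p=0.1, r=0, eligible=True, rng=rng))
    print(join_cluster((0, 0), [(5, 5), (1, 1), (3, 0)]))
```

Choosing the nearest head is one of the two joining rules named in the text; a received signal strength rule could be swapped in by changing the `key` function.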

Data Transmission Phase
Once the clusters have been formed, the last step is the data transmission to the base station. In this step, the cluster heads first send each node a TDMA schedule telling it when it can forward data to the cluster head. The data in this proposed work comes from the pressure sensors, which report the infiltration of animals into the farm. Whenever an animal breaches the fence and steps on a sensor, the node informs the cluster head of the activity by sending a data packet. The cluster heads aggregate the packets from the sensor nodes. If no activity is noted by a sensor, it sends the cluster head a packet whose data payload contains only 1 bit, set to 1. This is done to confirm that the sensor is alive and in an activated state. The cluster heads then forward the data to the base station, where it is processed; after the data processing, a signal can be passed to activate the alarm system. This alarm system helps the farm guards fend off the animals from the farm.
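A minimal Python sketch of the data transmission phase described above is given below. The packet layout (a dictionary with `type` and `payload` fields), the function names, and the rule that any reported event triggers the alarm are illustrative assumptions, not details from the paper.

```python
def member_packet(event_detected, reading=None):
    """Packet a member sends in its TDMA slot: an event report or a 1-bit keepalive."""
    if event_detected:
        return {"type": "event", "payload": reading}
    return {"type": "keepalive", "payload": 1}  # payload bit set to 1: node is alive


def aggregate(packets):
    """Cluster head view: how many members reported in, and which events they saw."""
    events = [p["payload"] for p in packets if p["type"] == "event"]
    return {"alive_members": len(packets), "events": events}


def should_raise_alarm(aggregated):
    """Base station decision (assumed rule): any reported event activates the alarm."""
    return len(aggregated["events"]) > 0


if __name__ == "__main__":
    # Hypothetical round: one idle sensor and one sensor that was stepped on.
    packets = [member_packet(False), member_packet(True, reading=42)]
    summary = aggregate(packets)
    print(summary, should_raise_alarm(summary))
```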

Results and Discussion
The proposed protocol was simulated in the MATLAB environment. The nodes were deployed in a network area of $100 \times 100\ \text{m}^2$ with the base station positioned at the center of the network. The other simulation parameters are shown in Table 2. The performance of the proposed protocol was compared with ISEP, SEP and DEEC, since these three protocols are designed for heterogeneous networks as well. The performance was analyzed based on network lifetime and throughput of the network.
Figs. 2 and 3 show the network lifetime achieved by the different protocols in the simulation scenario where the network had 10% advanced nodes, with α = 1 and α = 2 respectively. Network lifetime is computed from the number of nodes alive in the network in a particular round. Initially, all the nodes are alive and the graph starts from the value 100, which is equal to the number of nodes deployed in the network. As the number of rounds increases, the nodes consume energy and, once their batteries are depleted, the number of dead nodes starts to increase. This is evident from the fall in the graphs. The round in which the last node dies denotes the network lifetime. It was found to be lowest for SEP and DEEC, where the nodes died out early as compared to ISEP and NaISEP: 2316 rounds for SEP, 2495 rounds for DEEC, 5472 rounds for ISEP and 6567 rounds for NaISEP. This is because both SEP and DEEC have two levels of heterogeneity whereas ISEP and NaISEP have three levels of heterogeneity. SEP and DEEC elect the cluster head based only on the energy of the nodes, without considering the parameters of the neighboring nodes. This may lead to the formation of clusters having an unbalanced distribution of energy. As far as ISEP and NaISEP are concerned, the latter performed better than the former because cluster head selection in NaISEP considers the local environment of the node. This allows clusters to be formed in which a larger number of nodes have a higher level of energy, i.e. a higher h-factor, which leads to increased cluster lifetime. Furthermore, the elected cluster head has a higher value of leadership tenure, which avoids the energy consumed in broadcasting ADV packets during the cluster formation phase. This concept therefore increases not only the network lifetime but also the network stability period. The stability period is the round in which the first node dies in the network. The stability period was 1822 rounds for SEP, 1981 rounds for DEEC, 3712 rounds for ISEP and 3907 rounds for NaISEP. The rounds at which the first node and the last node died are given for all the protocols in Table 3.
Figs. 4 and 5 show the throughput achieved by the different protocols under the different simulation scenarios. The throughput is defined as the number of packets sent to the server or the base station by the deployed nodes in the network. It was observed from the graph of network lifetime that the number of nodes alive was highest for the proposed protocol, followed by ISEP and then SEP and DEEC. This means that these alive nodes sent more packets to the base station, which in turn increases the throughput of the network. Therefore the graph for the throughput is in line with the graph of network lifetime. It was also observed that when the amount of energy given to the advanced and intermediate nodes was increased, they were able to sustain themselves in the network for a longer duration; they were able to send more packets to the base station, thereby resulting in increased throughput as well. As long as all the nodes in the network are alive, the throughput for every protocol shows a gradually increasing slope; as the nodes start dying out, the growth in throughput slows, and the throughput eventually becomes constant when the entire network goes dead.
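The two lifetime metrics used in this section, the stability period (first node dead) and the network lifetime (last node dead), can be read off a per-round count of alive nodes. The Python sketch below shows one way to do this; the function name and the convention that list index `r` holds the alive count at round `r` are assumptions, and the sample series is made up for illustration.

```python
def lifetime_metrics(alive_per_round):
    """Return (first_node_dead_round, last_node_dead_round) from an alive-count series.

    alive_per_round[r] is the number of alive nodes at round r (assumed convention).
    Either value is None if that event never occurs within the recorded rounds.
    """
    total = alive_per_round[0]
    first_dead = next(
        (r for r, alive in enumerate(alive_per_round) if alive < total), None
    )
    last_dead = next(
        (r for r, alive in enumerate(alive_per_round) if alive == 0), None
    )
    return first_dead, last_dead


if __name__ == "__main__":
    # Hypothetical three-node network observed over seven rounds.
    alive = [3, 3, 2, 2, 1, 0, 0]
    print(lifetime_metrics(alive))
```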

Conclusion
WSNs nowadays provide an immense sphere of applications when combined with IoT devices. The paper presents a clustering protocol which aims at increasing the lifetime of the sensor nodes. The protocol is designed for an agricultural application in which the sensors considered are pressure sensors; they are deployed both deterministically and randomly in the network such that if any animal steps on them (by encroaching on the farm), they can activate the internet enabled alarm system, which helps the farmer fend off the animals. This smart agriculture application will help the farmer save the crops from animals. The proposed protocol is an extension of the ISEP clustering protocol; the cluster heads are selected based on the local neighborhood of the nodes as well as the leadership tenure. The performance was compared, based on network lifetime as well as throughput, with other protocols: SEP, DEEC and ISEP. The proposed protocol has shown increased network lifetime as compared to the others, and this in turn augments the throughput of the network as well. The use of a mobile agent, which can play the role of data collector from the cluster heads, can be explored in future.
Author Contributions All authors contributed to the study conception and design. Material preparation and analysis were performed by Vatan Sehrawat. The draft of the manuscript was written by Vatan Sehrawat. All authors read and approved the final manuscript.
Funding The authors declare that no funds, grants, or other support were received during the preparation of this manuscript.

Data availability
No dataset is applicable to this work.