Binary Tree Based Data Gathering Routing Scheme for Wireless Sensor Networks


Wireless sensor networks (WSNs) have been an emerging area of networking since the start of the 21st century. Sensor nodes make a WSN affordable, scalable, economical and reliable. However, sensor nodes have a fixed and limited power supply, limited durability, and limited storage and computational facilities, which makes energy the major challenge when deploying them: nodes must be prevented from draining prematurely. This paper proposes a novel deployment scheme that connects the sensor nodes in the form of a 4-sided virtual full binary tree structure. In the proposed scheme, data is expected to reach the resource-rich Base Station (BS) in a number of hops equal to the height of the tree. The stability of the network increases by an average of around 82.78% (in the range 49-98%) relative to the network lifetime of the existing scheme across different scenarios. The proposed scheme gives excellent results with a variable number of nodes and with varying sizes of the WSN deployment area.


Introduction
A Wireless Sensor Network (WSN) is a network of sensor nodes. These sensor nodes can transmit the data gathered from the monitored field over wireless links toward a resourceful Base Station (BS). Depending on the application type, the BS can be located either within the field of interest or at a distance from the sensor field [1]. The BS and/or the sensor nodes can be static or dynamic, depending on the application.
In a WSN, clustering is one of the best methods of grouping the sensor nodes under a cluster head (CH). The nodes of one cluster operate simultaneously with, and in isolation from, the nodes of other clusters. Clustering supports network scalability and reduces the size of the routing tables maintained at each sensor node, which boosts the network lifetime [2][3].
Data collected by sensors is usually transmitted via multi-hop communication to the sink node. This operation is named converge-cast [4][5] and uses a many-to-one communication model, whereas a typical ad-hoc network uses a many-to-many communication model. Routing protocols designed for ad-hoc networks are therefore inefficient for WSNs, and many routing protocols prefer a tree structure because of its inherent characteristics [6][7]. Most WSN applications have one sink and multiple sender nodes.
Tree-based routing is used in many protocols, with the advantage that the routing structure need not be maintained on each sensor node. Every node is only required to aggregate data and transmit it to its parent node [8], so no route discovery mechanism is required. Trees are also the most appropriate structure for adopting TDMA scheduling in WSNs [6].
A shortest path tree can be constructed by connecting sensor nodes with the least cost so that data is gathered at the root node, which is either the BS or a node that transmits data to the BS. By virtue of this structure, nodes consume the least energy, thus improving the lifetime of the network [6][7][8].
In LTP [9], labels were assigned to nodes by building a tree through a depth-first search on the network. Considerable research has addressed factors such as balancing the tree to distribute the load among nodes [10] and using specific metrics (residual energy, link quality, proximity to the sink) to construct the routing paths [11], which in turn helps to improve the network lifetime. The efficiency of tree construction algorithms is also measured by the runtime and the number of messages exchanged among the nodes [12].

Literature Survey
The LEACH [13] protocol distributes load evenly among all nodes of the network by grouping them into clusters. In addition to CH rotation, received information is processed locally and aggregated, and is then transmitted to the BS for further processing. This protocol achieves a lifetime eight times better than existing techniques.
The TEEN [14] protocol is based on energy variation defined by Hard threshold (HT) and Soft threshold (ST) values. The nodes sense the environment continuously. A node transmits when the sensed conditions meet the HT criterion; after that transmission, any change in the environment with respect to ST is reported.
PEGASIS [15] focuses on improving the lifetime of the sensor network by forming a chain among the sensor nodes, so that each node transmits data to the next node on the least-cost path towards the BS, forming a flat routing protocol.
PEDAP [16] is a routing protocol based on minimum spanning tree generation, governed by the BS, for gathering network data. After a certain interval, the BS re-creates the routing information and discards dead nodes.
Liu and Jin [17] divided the sensor field into a tiered structure based on the distance of nodes from the BS. Nodes with higher tier IDs used multi-hop transmission to send data to the BS via nodes with lower tier IDs.
MSMTP [18] reduces energy usage in each round by constructing a Minimum Spanning Tree with tiering and by defining a threshold energy level. The focus was on equal load distribution among the nodes.
Hassanzadeh [19] is concerned with the mobility of the sensor nodes as well as of the sink in a particular direction for improving the packet delivery ratio.
A centralized, uniform cluster tree routing structure was proposed in [20] to reduce transmission distances, in which the energy of each sensor node is taken into account when selecting it as CH. In addition, it reduces the transmission distance through a multi-hop approach.
A scheme was also proposed in [21] to create a minimum spanning tree among sensor nodes that transmits data to the BS via a relay (higher-energy) node. It achieved a 50% better lifetime, but at the cost of increased delay in delivering data to the BS.
A minimum spanning tree scheme was proposed by Sudarmani and Vanithamani [22] for clustered Heterogeneous Sensor Networks with a Mobile Sink, which moves along a circular path around the network; data reaches the sink in 5 hops.
Zhu and Zhang proposed a data gathering scheme named BTDGS [23], based on the construction of a virtual binary-tree infrastructure. The mobile sink moves along a predefined circular trajectory while broadcasting its information. Nodes receiving the sink information inform their parent and the nodes in the same area, which in turn forward the message after calculating the direction of the receiver.
In [24], Bagga proposed the Cluster Tree based Data Dissemination protocol, which forms a virtual tree structure within the clusters. It tries to balance the load among all sensor nodes while ensuring timely delivery of the data. E2R2 [25] is a cluster-based hierarchical routing protocol constructed by forming a CH panel. Data transmission from the CH node to the BS is carried out either directly or in multi-hop fashion, with the option to choose alternate paths.
The Extended Shortcut Tree Routing protocol [26] was proposed by Wadhawa; it eliminates routing tables and focuses on improving PDR and end-to-end delay.
The Shortest Hop Routing Tree protocol (SHORT) [27] selects the node with the highest residual energy as the leader and uses this node to transmit aggregated data from the remote sensor node to sink.
The Extended Lifetime of Cluster Head (ELCH) [28] hierarchical routing protocol elects cluster heads based on the votes collected from the network nodes.
The Energy Efficient Cluster Formation Protocol [29] (EECFP) is based on the rotational policy among sensor nodes to become cluster head. A node with the highest energy is elected as a cluster head and is rotated after each round to maintain the balance of load among sensor nodes.
ECHERP [30] selects cluster heads by considering not only the current energy but also the future residual energy of the nodes. This protocol models the network and the energy spent by the nodes as a linear system and uses the Gaussian elimination algorithm to select the cluster heads of the network.

Network Model
This section discusses the radio model, the network characteristics, the routing protocol and the algorithms.

Energy Dissipation Radio Model
This radio model has become a standard for WSNs, is widely referred to in the literature [31][32][33], and is shown in Figure 1. It consists of the electronics energy (Eelec) for running the circuitry of the node and the transmission power. The transmission power grows with the distance between sender and receiver as well as with the packet size. The receiving power depends on the running circuitry and the packet size. The free space model is employed for short distances, while the multipath fading model is used for long-distance transmissions.
where ETx denotes the energy spent on a k-bit packet over the distance d. Eamp is the amplification energy required for the message and is given by:
d0 is the threshold distance and is calculated as:
The receiving circuitry consumption depends on message size only and is computed as follows:
$$E_{Da}(k) = 5 \times 10^{-9} \cdot k \qquad (5)$$
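The energy terms named above can be sketched in Python using the first-order radio model as it is commonly stated in the WSN literature. This is a hedged illustration, not the paper's own formulation: only the aggregation cost of 5 nJ/bit comes from Eq. (5), while the values of `E_ELEC`, `EPS_FS` and `EPS_MP`, the threshold formula `sqrt(EPS_FS / EPS_MP)`, and all function names are assumptions.

```python
import math

# Constants marked "assumed" are typical literature values,
# NOT values stated in this section.
E_ELEC = 50e-9       # J/bit, electronics energy (assumed)
EPS_FS = 10e-12      # J/bit/m^2, free-space amplifier coefficient (assumed)
EPS_MP = 0.0013e-12  # J/bit/m^4, multipath amplifier coefficient (assumed)
E_DA = 5e-9          # J/bit, aggregation cost, from Eq. (5) in the text

# Threshold distance between the free-space and multipath regimes (assumed form).
D0 = math.sqrt(EPS_FS / EPS_MP)


def e_amp(d):
    """Amplifier energy per bit: d^2 law below D0, d^4 law at or above it."""
    return EPS_FS * d ** 2 if d < D0 else EPS_MP * d ** 4


def e_tx(k, d):
    """Energy to transmit a k-bit packet over distance d (metres)."""
    return k * (E_ELEC + e_amp(d))


def e_rx(k):
    """Energy to receive a k-bit packet; depends on message size only."""
    return k * E_ELEC


def e_da(k):
    """Energy to aggregate a k-bit packet, Eq. (5)."""
    return E_DA * k
```

With these assumed coefficients the threshold distance is a little under 88 m, so every hop inside a 100*100 field with the BS at the center would fall in the free-space regime.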

Network characteristics
i. Nodes are deployed uniformly in a square-shaped area.
ii. Nodes are static, and every node knows its own location and that of its parent.
iii. The network is virtually divided into four areas by cutting the square region diagonally, as shown in Figures 2(a) and 2(b).
iv. At each level of the tree, all nodes are assumed to work in parallel and in isolation from other nodes.

Proposed (Virtual 4-way Full Binary Tree) structure
The proposed technique adopts a binary tree structure for the following reasons:
i. Binary trees are considered one of the most efficient data structures for organizing and accessing data. The proposed technique arranges the nodes in the form of full binary trees formed from four sides. In this structure, the BS is assumed to be at the center of the region and acts as a common root node for the trees from all four sides.
ii. This structure enhances the lifetime of the network, and data takes very limited time to reach the BS. Moreover, extending the network by one more level increases the data transmission hop count by only one, while adding 2^(n+1) nodes to the network.
iii. This approach is best suited to applications where sensor nodes are deployed manually with human intervention, e.g. installing sensor nodes in a chamber before starting a nuclear chain reaction.
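The node-count claim above can be checked with a short sketch. It assumes, as an interpretation rather than something the text states outright, that the BS is a shared root with two children on each of the four sides, and that n is the current height of each side's tree; the function names are hypothetical.

```python
def nodes_per_side(height):
    """Nodes in one side's tree below the shared root: 2 + 4 + ... + 2^height."""
    return 2 ** (height + 1) - 2


def total_nodes(height, sides=4):
    """All nodes in the network, counting the shared root (BS) once."""
    return sides * nodes_per_side(height) + 1


def added_by_next_level(height):
    """Nodes one side gains when its tree grows from `height` to `height + 1`."""
    return 2 ** (height + 1)
```

Under this reading, heights 4 and 5 give 121 and 249 nodes, which matches the node counts of the WSN#1 and WSN#4 scenarios used in the simulations, while the hop count to the BS grows only from 4 to 5.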
According to the area of each node and the structure of the full binary tree, sensor nodes are divided into the following categories:
i. Root node: The node in the center of the field, which acts as the BS.
ii. Parent node: Node at the topmost position in the hierarchy.
iii. Same generation: Nodes that are at the same level of the tree (equal hop count from the BS).
iv. Leaf nodes: Nodes at the last level of the structure, with no descendants.
v. Same area nodes: Nodes in the same virtual area of the network, which is represented in the form of a tree.
All sensor nodes are considered to be static and are provided with their location and area id at the time of deployment. Therefore, each node possesses information about the nodes it communicates with (its senders as well as its receiver). To classify nodes into the above categories, the following calculations are used:
i. Root node: The node with id = 1 acts as the parent for the trees of all 4 sides.
ii. Parent node: The node with id = ⌊node_id/2⌋, where node_id is the id of any node in the network.
iii. Same generation: Children of the same parent are siblings, and all nodes at the same height of the tree are of the same generation.
iv. Leaf nodes: Nodes for which id*2 is not present in the tree.
v. Same area nodes: Nodes that are assigned the same area id.
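The id-based rules above correspond to the usual heap-style numbering of a binary tree, which can be sketched as follows. The sketch covers a single side's tree only; how ids are shared between the four sides is not specified here, so the area id is left out, and all function names are hypothetical.

```python
def parent(node_id):
    """Parent id is floor(node_id / 2); the root (id 1) has no parent."""
    return node_id // 2 if node_id > 1 else None


def children(node_id):
    """Ids of the two children in a full binary tree."""
    return (2 * node_id, 2 * node_id + 1)


def is_leaf(node_id, ids):
    """A node is a leaf when id*2 is not present in the tree."""
    return 2 * node_id not in ids


def generation(node_id):
    """Hop count from the root: 0 for id 1, 1 for ids 2-3, 2 for ids 4-7, ..."""
    return node_id.bit_length() - 1


def path_to_root(node_id):
    """Ids visited when a packet is forwarded parent by parent to the root."""
    path = [node_id]
    while path[-1] > 1:
        path.append(path[-1] // 2)
    return path
```

For example, in a tree holding ids 1 to 31, node 13 forwards through nodes 6 and 3 before reaching the root, so its hop count equals its generation. No routing table is needed because each step is a single integer division.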

Setup of the network
The nodes are uniformly distributed over the square-shaped region. Node distribution starts from the root node (located in the center of the region) and proceeds outwards in each virtual region. Therefore, the nodes are approximately equidistant from each other and uniformly dispersed in the field of interest.
The proposed structure is illustrated in Figure 2, where every full binary tree is depicted in a different color. The node density is lower in the center and greater towards the extreme corners: the center of the network contains the BS, and more nodes are deployed farther from this reporting point to monitor the area effectively.
There is also some overlapping between two different trees towards the corner sides, which is retained to maintain generality for deploying the nodes in the region of interest.
Before data exchange starts, each node should be provided with information about its parent, which will receive the data that the node transmits. By virtue of this setup, a connection is established between sender and receiver, which makes the proposal a connection-oriented protocol. The setup phase is the period in which the nodes arrange themselves in the form of a full binary tree. This phase of the protocol follows the routines defined in the Algorithms section for setting up the network. It starts with the Network flow (total, area) process, where 'total' nodes are to be deployed in 'area'. Each node follows the full binary tree hierarchy and transmits data to its parent node only.

Working of the proposed scheme
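The proposed scheme operates in three phases: the setup phase described above, the active phase and the dying phase.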

Active Phase
This phase comes into action when network operation starts, i.e. from the moment the nodes start sensing data, and lasts until the last node stops working. This phase is further divided into two periods.
a. Stable period: The time from the start of the active phase for as long as all nodes are in the working state.
b. Unstable period: The time from the death of the first node until the last node stops operating. A few adjustments are required in this period. Whenever a node dies, the procedure TREE-DELETE(T, id) discards the node (with identity 'id') from the network. This process maintains the routing structure by connecting the dead node's sub-tree to the appropriate node.

Dying Phase
This phase begins when the results can no longer be assumed reliable. Reliability is highest when results are obtained from all nodes of the network and deteriorates as each node dies. When 90% of the nodes are dead, the network is assumed to be in the dying state.
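The phase definitions above can be summarised as a small classifier. This is a sketch under two assumptions: `total` counts sensor nodes only (the BS is excluded), and the dying state takes precedence once the 90% threshold is reached, even though the unstable period nominally lasts until the last node stops. The function name and return labels are hypothetical.

```python
def network_phase(dead, total):
    """Classify the network state from the number of dead sensor nodes."""
    if dead == 0:
        return "stable"
    # Integer form of dead / total >= 0.9, avoiding floating-point rounding.
    if 10 * dead >= 9 * total:
        return "dying"
    return "unstable"
```

For a 120-sensor network, the state stays "unstable" from the 1st through the 107th node death and becomes "dying" at the 108th.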

Algorithms
This section defines three procedures (starting with Network flow) that give the step-by-step working of the proposed technique.
Network flow (total, area)
'total' is the number of nodes deployed in a field with an area of X*Y m^2.
i. Call Network Setup (total, area)
ii. Set round=1, dead=0
iii. While (dead!=(total-1))
End if
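The outer loop of Network flow can be sketched as below. Only the initialisation (`round=1`, `dead=0`) and the stopping condition `dead != total - 1` come from the listing; the loop body shown here is an assumption, since the original steps are not available. In particular, `round_cost` is a hypothetical callback giving a node's energy spend per round, and the point where a node is counted dead stands in for TREE-DELETE. The sketch expects positive initial energies and positive costs.

```python
def network_flow(initial_energy, round_cost):
    """Run rounds until every sensor node is dead; return (FND, LND) rounds.

    initial_energy -- dict: node id -> starting energy in joules (BS excluded)
    round_cost     -- hypothetical callback: node id -> energy spent per round
    """
    energy = dict(initial_energy)
    total = len(energy) + 1          # +1 for the BS, which never dies
    rnd, dead = 1, 0                 # as in step ii of the listing
    fnd = lnd = None
    while dead != total - 1:         # as in step iii of the listing
        for node in [n for n, e in energy.items() if e > 0]:
            energy[node] -= round_cost(node)   # assumed per-round spend
            if energy[node] <= 0:
                dead += 1            # stands in for TREE-DELETE(T, id)
                if fnd is None:
                    fnd = rnd
                lnd = rnd
        rnd += 1
    return fnd, lnd
```

A call such as `network_flow({2: 1.0, 3: 0.5}, lambda n: 0.25)` reports the rounds in which the first and the last node died, which are the FND and LND figures used in the lifetime comparisons.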

Simulation Results
The objective of the simulation is to obtain the whole atlas of specialization chains (graphs). The simulations are performed using the parameters and algorithms given in Section 4.
The performance of the proposed protocol is evaluated in terms of the average energy remaining in the network, the stable network lifetime, the total network lifetime, the Packet Delivery Ratio (PDR) and the Energy Depletion Rate (EDR).
The performance of the proposed protocol is compared with existing protocols. Simulations are performed in MATLAB by varying the size of the field as well as the number of sensor nodes in it. The BS is located at the center of the field. The simulations are performed for the following six scenarios:
(1) WSN#1: 121 nodes deployed in a 100*100 area.
(4) WSN#4: 249 nodes deployed in a 100*100 area.
The network parameters used for the simulations are described in Table 1.
Energy required by the network for one round = 0.0193584 (WSN#1)

Network lifetime
The stable lifetime is the period during which all nodes of the network are in the working state and performing their task. While the network is stable, the results obtained are reliable. If any node dies, the results are no longer 100% reliable because the data of some nodes is missing. Furthermore, the energy exhaustion of each additional node further degrades the quality of the results. Figure 5 compares the proposed scheme across the six scenarios in terms of the round at which the first node died (FND), half of the nodes died (HND) and the last node died (LND).

Lifetime comparison under different scenarios
A large variation is observed for the proposed scheme across the deployment scenarios (WSN#1 to WSN#6). Figure 6(a) displays the lifetime of the network in terms of round number and number of dead nodes for WSN#1 to WSN#3; similarly, Figure 6(b) shows the network lifetime for WSN#4 to WSN#6. This variation is mainly due to the distance between transmitter and receiver: the transmission cost of a message is governed by the distance factor, the other factors being constant. Therefore, deploying X nodes in a smaller region gives the network a longer lifespan than deploying them in a larger area, as can be observed in Figure 6. Adding more nodes to a network of the same size boosts the stability and total lifetime of the network, because more nodes make the network denser and hence reduce the communication distance among them. As discussed in Section 4.1, the transmission energy is directly proportional to the square of the distance. A shorter distance therefore leads to lower power consumption, which in turn enhances the network lifetime. The difference between deploying 121 and 249 nodes in an area of 100*100 m^2 is depicted in Figure 7.
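The density argument can be illustrated with a rough calculation. It assumes that the typical spacing between neighbouring nodes scales as side / sqrt(n) for n nodes spread uniformly over a square (a common heuristic, not a result from this paper) and that every hop falls under the d^2 law; the function names are hypothetical.

```python
import math


def mean_spacing(n_nodes, side):
    """Heuristic spacing for n nodes spread uniformly over a side x side square."""
    return side / math.sqrt(n_nodes)


def relative_amp_energy(n_nodes, side):
    """Per-hop amplifier energy up to a constant factor, under the d^2 law."""
    return mean_spacing(n_nodes, side) ** 2


# Amplifier energy per hop with 121 nodes relative to 249 nodes, same field.
ratio = relative_amp_energy(121, 100) / relative_amp_energy(249, 100)
```

Under these assumptions the ratio is 249/121, about 2.06, so the 121-node deployment would spend roughly twice the amplifier energy per hop of the 249-node one. The electronics energy per bit is unaffected by distance, so the difference in total per-hop energy would be smaller than this figure suggests.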

Comparison with existing techniques
The comparison of the proposed protocol with existing protocols is illustrated in Figure 8, which shows the stable lifetime (First Node Died, FND) and the network lifetime (Last Node Died, LND). The results for LEACH, SEP, HCR and EAERP are taken from existing work [32], in which the BS is located in the center of the network. The proposed scheme achieves a better stable lifetime as well as a better network lifetime than the other techniques.

Comparison under different network structure
The proposed full binary tree structure is compared with the Minimum Spanning Tree (MST) and a cluster-based scheme. In each technique, nodes are placed in the same positions but connected via a different topology. Figure 9 illustrates two different network structures. MST is a flat routing technique in which data is collected and forwarded in a linear fashion from node to node with minimum cost. MST has the advantage of maximizing network lifetime at the cost of delay in delivering packets to the BS. It also suffers in reliability, because the failure of any node causes the loss of the data of all its descendant nodes.
On the other hand, cluster-based routing is one of the most popular techniques, in which a node is selected as the cluster head and is responsible for gathering the data of its zone and transmitting the aggregated data to the BS. The role of cluster head is rotated after a certain number of rounds.

Energy Depletion Rate (EDR)
The energy depletion rate (EDR) is defined as the reduction in network energy in each successive round; it shows how the network fades over time as rounds complete. Figure 11 illustrates the EDR for the six scenarios, where each point represents the energy of the network in that round.

Figure 11. Energy Depletion Rate (EDR)
The energy of the network is reduced by some amount after each round of operation. This amount is the operational cost of the network per round and depends primarily on the transmission power. Figure 11 illustrates the EDR for the different scenarios; WSN#6 shows the highest EDR, i.e. the highest energy consumed in each round of operation, because fewer nodes are deployed in a large area. The variation in EDR can be attributed to the differences in deployment area and number of nodes: the higher the node density, the lower the EDR, and vice versa.
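Given the total network energy recorded at the end of each round, the per-round depletion described above can be computed as a simple difference. This is a minimal sketch; the function name and the input format (a list of energies, one per round) are assumptions.

```python
def energy_depletion(energy_per_round):
    """Energy lost in each round: E[r-1] - E[r] for consecutive rounds."""
    return [prev - cur
            for prev, cur in zip(energy_per_round, energy_per_round[1:])]
```

For instance, a network whose energy reads 10.0, 9.5, 9.0 and 8.75 over four consecutive readings has depletion values of 0.5, 0.5 and 0.25.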

Packet Delivery Ratio (PDR)
Packet Delivery Ratio (PDR) measures the number of packets that are successfully delivered to the BS and thus represents the efficiency of the system. The PDR for the six scenarios is shown in Figure 12, where each point represents the number of packets lost in that round.

Figure 12. Packet Delivery Ratio
In Figure 12, the Packet Delivery Ratio is reported as the number of packets lost in each round. A node failure increments the dropped packet count by one in every subsequent round. The scenarios show a lot of variation in PDR. Increasing the number of nodes in the same area results in a better PDR, while increasing the network area and keeping the number of nodes fixed reduces the PDR. This is easily observed in Figure 12, where WSN#4 shows the best packet delivery and WSN#3 the worst among the scenarios. This variation is due to the difference in node density in the deployed area: the higher the density of nodes, the better the PDR of the protocol, and vice versa.
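The packet accounting described above can be sketched as follows, assuming each live sensor node contributes exactly one packet per round and that a dead node's sub-tree has already been reconnected, so only the dead node's own packet is lost. Both function names are hypothetical.

```python
def packets_lost(dead_nodes):
    """One dropped packet per dead node in the round, as described above."""
    return dead_nodes


def delivery_ratio(total_sensors, dead_nodes):
    """Fraction of the expected packets that reach the BS in one round."""
    return (total_sensors - dead_nodes) / total_sensors
```

With 120 sensors and 30 dead nodes, 30 packets are lost in the round and the delivery ratio is 0.75.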

Conclusion
The proposed scheme achieves faster transmission to the BS together with a good lifetime relative to the existing schemes. This is due to the uniform distribution of all nodes in the network, which are connected as a full binary tree. This gives the advantage that nodes are well dispersed and properly connected. Every node knows the position of its parent for data forwarding, so there is no overhead of maintaining routing tables. The most important advantage of the proposed scheme is that no node is used too frequently in routing paths. This prevents rapid energy drain, which leads to a better stable lifetime of the network.
The proposed protocol requires parallel operation of all sensor nodes that are at the same level of the tree. Thus, great care should be taken to avoid data collisions. As the proposed scheme follows a tree structure, any node failure will result in the failure of all its descendants.