A Lossless Distributed Data Compression and Aggregation Approach for Low Resources Wireless Sensors Networks

Abstract: Wireless Sensor Networks (WSNs) have proven useful as resource-constrained, distributed, event-based systems in many scenarios. Yet optimizing their limited resources (energy, computing memory, bandwidth and storage) during data collection and communication remains a major challenge. Most of the energy consumption (as much as 80%) in standard WSN applications lies in the radio module, which must send and receive packets to communicate between stations. This paper proposes an approach that optimizes sensor resources through data compression and aggregation while preserving the integrity of the raw data. Data aggregation discards certain sensing data packets, which lowers the communication data rate and the likelihood of packet collisions on the wireless medium. Data compression reduces the redundancy in the aggregated data, which saves storage and allows a single small data stream to be sent over the communication bandwidth. The performance of the proposed approach is evaluated by experimental simulation on OMNeT++/Castalia. The performance metrics are the Compression Ratio (CR), data Aggregation Rate (AR), Peak Signal-to-Noise Ratio (PSNR), Mean Square Error (MSE) and Energy Consumption (EC). The obtained results show a significant increase in network lifetime. Moreover, the integrity (quality) of the raw data is guaranteed.


Introduction
During the last few years, the application of Wireless Sensor Networks (WSNs) in unattended environments has attracted increasing interest [1]. A WSN is composed of hundreds to thousands of wireless nodes. Each node has some computational power and sensing capability, and operates in an unattended mode [2]. These devices can monitor a wide variety of ambient conditions, such as temperature, pressure, luminosity, humidity, soil composition, human or vehicular movement, supply chains, noise levels, the presence or absence of certain kinds of objects (like those of the medical imaging field), mechanical stress levels on attached objects and so on. Because hostile areas are inaccessible and the number of sensor nodes in the network area is large, the sensor nodes cannot always be plugged into an electrical outlet or have their batteries changed frequently. Therefore, it is crucial to optimize the energy expended by the sensor nodes, since the consumed energy determines the lifetime of a sensor network. Among all node activities, wireless communication consumes the most energy.
The communication radius is generally greater than the range of a single node. Hence, the farther a sensor has to transmit data, the more energy it requires and the shorter its lifetime becomes. To tackle this issue, resource optimization becomes crucial: an efficient compression and aggregation approach must be designed that simultaneously minimizes packet loss, collisions, congestion, power consumption and the amount of communication required by the sensor nodes.
The autonomous sensors are randomly deployed; thus, several sensor nodes often observe a common phenomenon, which creates redundancy in the data communicated from the source nodes to a particular Cluster-Head (CH) or sink. It is known that leveraging the correlation between different samples of the observed data leads to better utilization of the sensors' resource reserves. Moreover, a large number of sensors periodically collect data and send them to a border node in the network. Resource savings can be achieved if different sensor readings are combined into a single super packet through compression and aggregation, which eliminates redundancy, minimizes the number of transmissions and thus saves sensor resources.
Nevertheless, aggregation should only be performed if the energy needed to aggregate and transfer the data is less than the energy needed to transfer the data without aggregation. This work also examines the complexity of optimal data aggregation, which is an NP-hard (Nondeterministic Polynomial time) problem in general. Most of the data compression and aggregation methods in the literature focus on lossy compression and address-centric aggregation routing protocols. The proposed approach focuses on lossless, data-centric compression and aggregation, and obtains an approximate solution in polynomial time.
The rest of this article is organized as follows: Section 2 presents related work and the issues addressed in data compression and aggregation. Section 3 proposes a distributed lossless data compression and aggregation approach for WSNs. Section 4 presents the implementations and discusses the experimental simulation results. Section 5 concludes the paper.

Related work and focused problems
This section reviews the literature on data compression and aggregation in WSNs, for which different authors have implemented various approaches, although some studies rely on only one of the two methods.
Most of the aggregation schemes presented in the literature aim to save sensor energy by considering unconstrained data traffic [3][4][5]. In aggregation, the intermediate nodes can remove the redundancy in data received from multiple sources in order to transmit compressed data. Compression approaches can be grouped into two main categories: lossless and lossy data compression. Lossless compression generates a statistical model of the data and maps the data to bit strings based on the generated model. Conversely, lossy compression transforms the data into a new space using appropriate basis functions. In the new space, the data information is usually concentrated in a few coefficients. Hence, compression can be achieved after quantization and entropy coding [6-9]. The best-known methods in the literature are introduced in the following subsections.

Data funneling
In the data funneling approach, local nodes transmit their readings to a border node, which aggregates the data before sending it on to the controller node. The nodes in the area select a parent node, which aggregates the data before sending it on to the base station, as the authors present in [10].

Pipeline in-network compression scheme
In the pipeline approach, the data collected from the sensors is buffered in the network aggregation node for a certain lapse of time. Then, the data packets are combined into one data packet by suppressing the redundancies through a pipelined compression scheme, as the authors present in [11].

Hardware-Assisted data compression
The hardware-assisted approach proposes an adaptive compression architecture, based on statistical data analysis, for on-the-fly data compression and decompression operating on the cache-to-sensor-memory path, as the authors present in [12].

Clustering methods
In WSNs, clustering methods enable the aggregation of sensor data and improve the scalability of multi-hop wireless networks. This approach divides the network into partitions of nodes, called clusters. Each partition has one node serving as its Cluster-Head (CH). After the formation of the clusters, the nodes transmit their sensing data to the CH for data aggregation. Various clustering protocols have been proposed in the literature [13,14]. Most of them do not consider data correlation and instead assume ideal data aggregation, where data are perfectly correlated, such that an arbitrary number of packets within a cluster can be compressed into a single packet.

Routing models
The routing schemes in the literature that use data aggregation are the data-centric routing protocol and the address-centric routing protocol. In both cases, the sink sends out a query/interest for certain data, and the sensor nodes that hold the appropriate data respond with it. The two methods differ in the way data are sent from the sources to the sink [15]. In the address-centric protocol, each source independently sends data along the shortest path to the sink, while in the data-centric protocol, the sources send data to the sink, but routing nodes can access the content of the data packets and perform aggregation on multiple input packets [16]. Because of these advantages, the data-centric protocol is used in this work.

Proposed method: distributed lossless data compression and aggregation approach
The proposed network architecture focuses on a single network graph, assumed to represent a single cluster that gathers data from a certain number of data sources within it [17,18]. Let us consider n source nodes (N1, …, Nn) and a sink node (K). Let the network graph be G = (N, E, d), consisting of the set of all nodes N, the set E of edges between all nodes that can communicate with each other directly, and a distance function d that maps E into the set of non-negative numbers.
Let us assume that the number of transmissions from any node to the data aggregation node is exactly one. Each sensor Ni sends its collected sensing data to the aggregator (CH), and the aggregated data is then sent to the sink (K). The aggregation rate is thus the ratio between the number of discarded packets and the total number of packets received at the aggregation node. The problem to be solved is to perform compression and aggregation of the sensing data at a single point of aggregation, before transmitting the compressed data to the sink. The flow chart of the proposed approach is shown in Figure 1.

Aggregation scheme
The aggregation strategy is to use the order in which the data packets (pkt) are sent to convey additional information to the sink.
When the aggregation node receives the sensing data from its neighbors, it explicitly discards some of the data packets, and the ordering of the remaining data packets is then used to transmit the information contained in the discarded packets. The question is how many packets can be discarded.
Let k be the range of possible values generated by each sensor, p the number of packets present at the aggregation node, r the number of discarded packets and n the range of sensor node identifications; each node has a unique Identification (Id).
The strategy is to discard r packets and use the order of the remaining (p-r) packets to indicate which values (payloads) were contained in the r discarded packets; the number of available permutations is (p-r)!. Each of the discarded packets contains a payload that can take one of the k possible values and an Id that can take any of the (n-p+r) valid Ids, that is, all Ids except those belonging to the packets included in the super packet.
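The idea of conveying the discarded packets through the order of the remaining ones can be illustrated with a minimal Python sketch. It is an assumption-laden illustration, not the paper's Algorithm 1: it swaps in the Lehmer code (factorial number system) to map an integer to a permutation, enumerates all candidate sets of discarded packets in lexicographic order, and assumes the sink knows r, the full Id list and the value range. The function names are hypothetical.

```python
from itertools import combinations, product
from math import factorial


def index_to_order(items, index):
    """Return the permutation of sorted(items) whose Lehmer rank is `index`."""
    pool = sorted(items)
    if not 0 <= index < factorial(len(pool)):
        raise ValueError("not enough remaining packets to encode this index")
    order = []
    for i in range(len(pool), 0, -1):
        pos, index = divmod(index, factorial(i - 1))
        order.append(pool.pop(pos))
    return order


def order_to_index(order):
    """Inverse of index_to_order: recover the rank from the observed order."""
    pool = sorted(order)
    index = 0
    for item in order:
        pos = pool.index(item)
        index += pos * factorial(len(pool) - 1)
        pool.pop(pos)
    return index


def candidate_messages(kept, all_ids, r, values):
    """Enumerate every possible set of r discarded (Id, value) packets."""
    kept_ids = {pid for pid, _ in kept}
    free_ids = sorted(pid for pid in all_ids if pid not in kept_ids)
    return [tuple(zip(ids, vals))
            for ids in combinations(free_ids, r)
            for vals in product(values, repeat=r)]


def encode(kept, discarded, all_ids, values):
    """Order the kept packets so that the order identifies the discarded ones."""
    table = candidate_messages(kept, all_ids, len(discarded), values)
    return index_to_order(kept, table.index(tuple(sorted(discarded))))


def decode(super_packet, r, all_ids, values):
    """Recover the discarded packets from the order of the super packet."""
    table = candidate_messages(super_packet, all_ids, r, values)
    return list(table[order_to_index(super_packet)])


# Hypothetical instance: n = 8 Ids, p = 6 packets received, r = 2 discarded.
all_ids = range(1, 9)
values = ("white", "black")
kept = [(1, "white"), (2, "black"), (4, "white"), (7, "black")]
discarded = [(5, "black"), (3, "white")]
super_packet = encode(kept, discarded, all_ids, values)
recovered = decode(super_packet, r=2, all_ids=all_ids, values=values)
```

In this instance the four kept packets offer 4! = 24 orderings, which is just enough to distinguish the 24 possible pairs of discarded packets.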
The values (payload and Id) contained within the discarded packets can be considered as symbols of an $(n-p+r) \cdot k$-ary alphabet, resulting in $(n-p+r)^r \cdot k^r$ possible values for the discarded packets. Since each packet has to be identified by a unique Id, and the packets are discarded simultaneously and randomly, $(n-p+r)^r$ is better expressed by the binomial coefficient $\binom{n-p+r}{r}$. Thus, to obtain the aggregation rate, the following relation must be satisfied [10]:

$$(p-r)! \geq \binom{n-p+r}{r} \, k^r \quad (1)$$

However, for large values of p and n, evaluating this relation becomes an NP-hard problem whose computation largely exceeds the numerical precision of a computer. The problem is critical in WSN applications, which require low complexity [19,20]. Moreover, the inequality in equation (1) cannot be transformed so that r is expressed as a function of n, p and k. Therefore, to alleviate this task, numerical approximation methods and Stirling's approximation ($m! \cong (m/e)^m \sqrt{2\pi m}$) are used to calculate the optimal value of r satisfying the inequality of relation (2), as follows [21,22]:

$$\ln(2\pi) + r + (r + 1/2)\ln(r) + (p - r + 1/2)\ln(p - r) + (n - p + 1/2)\ln(n - p) - r\ln(k) - (n - p + r + 1/2)\ln(n - p + r) - p \geq 0 \quad (2)$$

where ln(.) represents the natural logarithm.
Let us consider a running example with n = 8 nodes with Ids N1, N2, N3, N4, N5, N6, N7 and N8. The number of messages that arrive at the aggregation node is p = 6. Considering the black or white color of an image, each sensor generates an independent reading from the set {white, black}, so k = 2. Using the above formula (equation 2), the scheme allows the encoder to reject r = 2 packets. To illustrate the scenario, Table 1 shows the mapping obtained from the shift cursor permutation of Algorithm 1, as shown in Figure 2 [23]. For a WSN with $n = 2^8$ sensors and $k = 2^4$, the number of discarded packets (r) as a function of the p packets in the aggregation node, as defined in equation 2, is shown in Figure 3. Figure 4 illustrates the family of functions of discarded packets.
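As a check on the running example, the largest admissible r can be found by direct search. The sketch below is illustrative only: it assumes the counting argument above leads to the condition (p - r)! >= C(n - p + r, r) * k^r, and it evaluates that condition with Python's exact integer arithmetic rather than with the Stirling form of equation 2, so boundary cases where both sides are equal are decided exactly. The function names are hypothetical.

```python
from math import comb, factorial


def can_discard(n, p, k, r):
    """True if the (p - r)! orderings can distinguish all discarded sets."""
    return factorial(p - r) >= comb(n - p + r, r) * k ** r


def max_discardable(n, p, k):
    """Largest r in 0..p for which the ordering condition holds."""
    if not 0 <= p <= n:
        raise ValueError("p must satisfy 0 <= p <= n")
    best = 0
    for r in range(p + 1):
        if can_discard(n, p, k, r):
            best = r
    return best


def aggregation_rate(r, p):
    """Ratio of discarded packets to packets received at the aggregator."""
    return r / p


# Running example: n = 8 nodes, p = 6 packets, k = 2 values.
r_example = max_discardable(8, 6, 2)
rate_example = aggregation_rate(r_example, 6)
```

For n = 8, p = 6 and k = 2, the search returns r = 2: with r = 2 both sides of the condition equal 24, while r = 3 would require 80 orderings from only 3! = 6.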

Compression scheme
The strategy of the method is to reduce the aggregated data packet to obtain the optimal compression size in WSNs. The approach is designed around four components, as illustrated in Figure 5. For now, the four integrated components are based on entropy coding: Arithmetic coding, Run Length Encoding, Move To Front encoding and Burrows Wheeler Transform encoding [2,6]. These codings were chosen because of their performance in resource-constrained applications. Their implementation involves simple instructions: additions and integer shifts.
Components 3 and 4 in Figure 5 identify redundancy in the raw data to facilitate compression by component 2. Once the data has no redundancy, component 1 compresses it to yield an unintelligible file of smaller size. Algorithm 2 and Algorithm 3 describe the implementations of the components.
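The interplay of the redundancy-exposing stages and the run-length stage can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not Algorithm 2 or Algorithm 3: it assumes the stages are chained as Burrows Wheeler Transform, then Move To Front, then Run Length Encoding, it uses a naive rotation sort for the transform, it stores runs as (byte, length) pairs, and it omits the final arithmetic coding stage. All function names are hypothetical.

```python
def bwt(data: bytes):
    """Naive BWT: return (last column, row index of the original string)."""
    n = len(data)
    if n == 0:
        return b"", 0
    rows = sorted(range(n), key=lambda i: data[i:] + data[:i])
    return bytes(data[(i - 1) % n] for i in rows), rows.index(0)


def inverse_bwt(last: bytes, primary: int) -> bytes:
    """Invert the BWT by following the stable sort of the last column."""
    order = sorted(range(len(last)), key=lambda i: last[i])
    out, row = bytearray(), primary
    for _ in range(len(last)):
        row = order[row]
        out.append(last[row])
    return bytes(out)


def mtf_encode(data: bytes) -> bytes:
    """Replace each byte by its position in a move-to-front table."""
    table, out = list(range(256)), bytearray()
    for byte in data:
        pos = table.index(byte)
        out.append(pos)
        table.insert(0, table.pop(pos))
    return bytes(out)


def mtf_decode(codes: bytes) -> bytes:
    table, out = list(range(256)), bytearray()
    for pos in codes:
        byte = table.pop(pos)
        out.append(byte)
        table.insert(0, byte)
    return bytes(out)


def rle_encode(data: bytes) -> bytes:
    """Encode runs as (byte, run length) pairs, run length capped at 255."""
    out, i = bytearray(), 0
    while i < len(data):
        run = 1
        while i + run < len(data) and data[i + run] == data[i] and run < 255:
            run += 1
        out += bytes((data[i], run))
        i += run
    return bytes(out)


def rle_decode(pairs: bytes) -> bytes:
    out = bytearray()
    for i in range(0, len(pairs), 2):
        out += bytes([pairs[i]]) * pairs[i + 1]
    return bytes(out)


def compress(data: bytes):
    last, primary = bwt(data)
    return rle_encode(mtf_encode(last)), primary


def decompress(blob: bytes, primary: int) -> bytes:
    return inverse_bwt(mtf_decode(rle_decode(blob)), primary)


# Hypothetical payload: a row of black pixels followed by white pixels.
pixels = bytes([0] * 200 + [255] * 200)
blob, primary = compress(pixels)
```

Because every stage is invertible, `decompress(blob, primary)` returns the original bytes, which is the lossless property the approach relies on. The naive rotation sort is quadratic in memory and is only suitable for short blocks.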

Implementations and simulation results
The Castalia simulator is used to extend the functionality of the OMNeT++ simulator, particularly the wireless channel transmission module and the energy management module [24-27]. In the simulations under the integrated simulators (OMNeT++/Castalia), four scenarios are envisaged, as illustrated in Figure 1.
The first scenario is to compress data at the source node before transmission to the CH, where it will be aggregated. The second scenario is to aggregate the data collected at the CH before performing the compression. The third scenario is to send the sensing data without compression and aggregation. The last scenario is to send the sensing data after either compression or aggregation only.

Compression and aggregation
In the aggregation process, the aggregation node first collects data from its cluster and applies the aggregation. In the next step, it runs the four compression components. The processes are illustrated in Figure 6 and Algorithm 4.
The compression process, as illustrated in Figure 7 and Algorithm 5, consists of adding a table of four component entries to the super packet, each initialized to zero (component not active). Each node can activate at most one of the four compression components, depending on the data received.

Performance metrics and simulation parameters
The performance of the proposed approach is measured in terms of the following metrics:
- Aggregation Rate (AR) as defined in equation 3.

$$AR = \frac{r}{p} \quad (3)$$

where r represents the number of discarded packets and p is the number of packets in the aggregation node.
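- Peak Signal-to-Noise Ratio (PSNR) as defined in equation 5.

$$PSNR = 10 \log_{10}\left(\frac{Pic^2}{MSE}\right) \quad (5)$$

where the pixel values of the image are integers that range from 0 (black) to 255 (white); thus Pic represents the maximum pixel value, 255, and MSE is the Mean Square Error.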
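- Mean Square Error (MSE) as defined in equation 6.

$$MSE = \frac{1}{NM} \sum_{i=1}^{N} \sum_{j=1}^{M} \left( I(i,j) - \hat{I}(i,j) \right)^2 \quad (6)$$

where NM is the size of the image, that is, N x M pixels, N is the number of rows, M is the number of columns, and $I$ and $\hat{I}$ denote the original and reconstructed images.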
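The two image quality metrics can be computed with a few lines of Python. The sketch below assumes the standard definitions: the MSE is the mean of the squared pixel differences over the N x M image, and PSNR = 10 log10(Pic^2 / MSE) with Pic = 255. Images are represented as lists of rows of integer pixel values, and the function names are hypothetical. For a lossless scheme the reconstructed image equals the original, so the MSE is zero and the PSNR is unbounded, which the sketch reports as infinity.

```python
import math


def mse(original, reconstructed):
    """Mean of squared pixel differences over an N x M image."""
    rows, cols = len(original), len(original[0])
    total = sum((original[i][j] - reconstructed[i][j]) ** 2
                for i in range(rows) for j in range(cols))
    return total / (rows * cols)


def psnr(original, reconstructed, peak=255):
    """PSNR in decibels; infinite when the two images are identical."""
    error = mse(original, reconstructed)
    if error == 0:
        return math.inf
    return 10 * math.log10(peak ** 2 / error)


# Hypothetical 2 x 2 gray-level images.
image = [[0, 128], [64, 255]]
lossless_copy = [row[:] for row in image]
lossy_copy = [[2, 128], [64, 255]]  # one pixel off by 2
```

Here `mse(image, lossy_copy)` evaluates to 1.0, since a single squared error of 4 is spread over four pixels.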
- Energy Cost (EC) as defined in the energy consumption model of equation (7) [29,30]:

$$EC = E_{cap} + E_{comp} + E_{agg} + E_{t} + E_{r} \quad (7)$$

where $E_{cap}$, $E_{comp}$, $E_{agg}$, $E_{t}$ and $E_{r}$ respectively represent the energy consumed to capture the image, the energy consumed to compress s bytes, the energy consumed to aggregate s bytes (equation 8), the energy consumed to send s bytes over a distance d, and the energy consumed to receive s bytes (equation 9).
To evaluate the different performance metrics, the simulation setup parameters are defined in Table 2.
Table 2. Simulation setup parameters: image size (Figure 8 (a)): 320x320 pixels, 325 kilobytes; image size (Figure 8 (b)): …
The test images from the Kaggle dataset [28] are shown in Figure 8. Within the 10 nodes of the network, the source nodes are node 1, node 2, node 3, node 4 (CH), and node 5, as shown in the simulation environment in Figure 9. The simulation time limit to receive the complete image is 1860 seconds.
Figure 8. Overview of the test images (a) and (b)

Results and Analysis
The images of Figure 8 were used for the experiments; each camera sensor captures an image whose pixel values are integers from 0 (black) to 255 (white). Thus, $k = 2^8$ is the number of possible gray intensity values. Figure 10 presents the data Aggregation Rate (AR) of the proposed low-complexity approach. It can be observed that the gain increases exponentially as the number of packets to be aggregated grows. This induces a logical reduction of collisions, congestion and communication data rate, and produces various trade-offs among network-related performance metrics such as compression rate, energy, latency, accuracy, fault tolerance and security. The proposed compression with aggregation approach was compared with the 2D-DCT (two-Dimensional Discrete Cosine Transform) presented in [31]. Figure 11 shows the comparative CR: 2D-DCT presents a better CR than the proposed approach. This can be explained by the fact that 2D-DCT is a lossy compression approach, whereas the proposed approach is lossless. In contrast, the proposed lossless approach presents the best PSNR in Figure 12 and the best MSE in Figure 13 compared to 2D-DCT.
The overall energy consumption of the proposed approach, covering the capture, communication, compression and aggregation processes of each sensor node, is shown in Figure 14 and Figure 15. The results reveal that after 1860 seconds of simulation, the first image is received by the sink. Figure 16 and Figure 17 compare the network lifetime, in terms of remaining energy, of the proposed method and 2D-DCT. Overall, the proposed approach of compression with aggregation retains the most remaining energy.
In light of these encouraging results, the performance characteristics of the proposed approach are satisfactory.

Conclusion
This paper has described an approach for lossless data aggregation and distributed compression on low-resource network platforms in WSNs. Clustering, aggregation and compression were used to provide an architectural framework for exploiting data correlation. The results of the proposed approach were evaluated qualitatively and quantitatively, using performance metrics such as the Compression Ratio (CR), data Aggregation Rate (AR), Peak Signal-to-Noise Ratio (PSNR), Mean Square Error (MSE) and Energy Cost (EC). The simulation results show that the proposed approach outperforms 2D-DCT in terms of PSNR, MSE and remaining energy, while 2D-DCT, being lossy, achieves a better CR. These advantages can help when the number of source nodes increases and when the source nodes are located relatively close to each other and far from the sink. The simulation results also suggest, though, that the compression and aggregation latency could be non-negligible and should be taken into consideration during the design process.