Computing Time Series Data During Index Based De-Duplication of Industrial IoT Data in Cloud Environment

As a result of developments in Industry 4.0, the data generated within industries is increasing rapidly every day in the pursuit of an innovative industrial environment through maximal asset utilization. Meanwhile, the redundancy rate in the server is also increasing, which has an impact on storage as well as on the analysis of data. Most existing de-duplication techniques partition the data with respect to memory. However, if the time period is considered for partitioning, time-series analysis can be achieved during the de-duplication process. To address this issue, the proposed work presents an Index Based De-duplication technique with a Categorized Region Method for computing time-series data. A Merkle Tree is combined with a super feature called reckoning of occurrence in the proposed system to rapidly identify the existence of similar data in the distributed system with an accurate existence count, which significantly helps in predicting future drifts of the industrial environment. Finally, the proposed system also determines the optimal transportation cost to reach the storage nodes in the cloud using the MODI method. The experimental results reveal that the proposed model is efficient, since it requires less memory and incurs less computation overhead. The proposed technique achieves a space reduction of 98%, reduces the computation overhead during analysis by 55%, and increases the efficacy of cloud storage by 60%.


Introduction
Industrial IoT (IIoT) is a framework that outlines the various physical components, operational groups and procedures, network alignment and data layouts to be used. The Internet of Things (IoT) framework is, in nature, an arrangement of various components, which include receptors, regulators, initiators, virtual assistance, and zones. IIoT has developed into a significant arena for research by raising IoT data associated problems to be resolved, especially in cloud storage environments. Numerous conflicts arise due to IoT based data accumulation in cloud storage: non-centralized execution with combined administration of framework reserves, multi-dweller repositories with solo functionality, and expandability with flexibility. Many issues arise due to huge and unstructured data dispensation at various stages like data depiction, data accumulation and analysis [1]. IIoT applications like robotics, supply chain, monitoring, and industries related to automotive, manufacturing, retail and aerospace are currently evolving IIoT with advanced features. According to the Meticulous Research organization, the IIoT market is expected to rise at a Compound Annual Growth Rate (CAGR) of 16.7% during 2022-2027 and is estimated to attain $263.4 billion by 2027 [2]. The IoT sensors embedded in various industrial environments generate monitored data sequentially upon the prompt of an extraneous action. On the other hand, the data produced by several sensor nodes must be grouped, accumulated, examined and visualized; to attain this, the IoT sensor network model moves towards emergent technologies like edge, fog and cloud computing, which undergo high complexity in data dispensation, data fusion and sensor data analytics [3]. One of the most challenging problems in the IIoT environment is configuring the fog system in an optimal way, as industrial strategies vary by means of provision and demand [4].
In manufacturing industries, a massive amount of redundant information is delivered to the information appraisal zone. Allocating abundant space for this redundant data is not efficient: it inflates the required repository capacity and reduces the efficacy of the manufacturing system. So, a data de-duplication scheme is needed in which unused or identical information is identified and removed. Besides, intelligent decision-making models contribute to handling things such as assistants by means of a pattern-based decision-making system, which is realized by providing and incorporating the whole firm's data [5]. Industries need time-series data to predict the scenarios held up in the industrial environment. During the de-duplication process, redundant data will be removed; however, if the amount of redundant data is measured with respect to its timestamp prior to elimination, time-series data can be computed. Accumulating the collected facts locally in IIoT equipment is not advisable when the limited power of the equipment is concerned; it also involves strict restrictions on the storage area. Moreover, the equipment cannot be relied upon and is prone to a large number of dangers as a result of framework provisioning in distant and neglected areas. Hence, the observed industrial data is stored in a cloud for high scalability and flexibility. To store a massive amount of IIoT data in the cloud, a de-duplication technique with an optimal path is required.
To overcome the limitations of existing de-duplication schemes for IIoT, the proposed system presents a new approach that achieves de-duplication over periodically monitored data. The contributions of the research work are as follows:

- The partitioning of data is carried out based on time intervals to produce time-series data, which reduces computation overhead during the decision-making process.
- A secure index-based de-duplication system with reckoning of occurrence is proposed. It reduces the space occupied by redundant data by storing a single unique value of the monitored sensor data together with its existence count.
- An optimal transportation cost to store industrial monitored data is determined using the Modified Distribution (MODI) method to improve the efficacy of cloud storage.

The paper is organized as follows. Section 1 introduces the latest developments in IIoT. Section 2 outlines the related research work carried out on IIoT and existing de-duplication schemes. Section 3 presents the framework and the proposed architecture. The experimental setup and performance evaluation of the proposed system are given in Sections 4 and 5. Finally, Section 6 concludes the paper.

Related Work

Reducing the storage space allocated for industrial data is one of the main challenges of this era. If any intelligent action can be performed concurrently with the elimination of redundant data, it will be all the more advantageous to the industrial environment. Inspired to move in this direction, this research work proposes an index-based de-duplication system with a reckoning-of-occurrence feature to produce time-series data from which the industrial environment can profit. This section discusses the evolution of IIoT from WSN to fog along with the importance of de-duplication, and summarizes the various approaches involved in de-duplication systems.

The IoT has produced a great impact on the modern world due to the high contribution of Wireless Sensor Networks (WSN). Fog computing replaces the computation carried out by WSN, as energy usage at the sensor nodes is confined. Fog brings storage resources and computation nearer to users. The evolution of Industry 4.0 has been facilitated by the adoption of Cyber-Physical Systems (CPS), IoT, cloud and Artificial Intelligence. Furthermore, many industrial applications require a decision-making model [6], [7]. The number of Internet users has risen dramatically in recent years due to the provisioning of the Internet all over the world. Many people engage online through numerous applications, and social media has become part of their daily life. Besides, due to the COVID pandemic, many jobs and educational systems leveraged technology to complete their work online even during these tough times. The tremendous growth in the usage of mobile and web applications leads to an exponential increase in data across the globe. Hence, the storage space essential for storing the data has become the most important concern [8]. Provisioning storage optimization techniques became a vital constraint for huge storage capacities like cloud storage. De-duplication is a storage optimization technique which avoids accumulating identical replicas of data [9]. The main task to be performed in de-duplication is partitioning of data [11], [18], [22]. As de-duplication is carried out on protected data, various security measures are considered [10], [12][13][14][15], [19]. While decrypting data for the de-duplication process, attackers may intrude to steal the data, which is addressed with various countermeasures in [16], [17]. The secure data storage scheme for IIoT (SDSSIIoT) and the de-duplication carried out with 2FBO2 for IIoT data (FaCIIoT) are stated in [21], [20]. Many existing de-duplication works consider memory and security as the prime factors, as depicted in Table 1. However, with this, the decision-making process cannot be achieved rapidly.

Table 1. Summary of Related work

Authors | Proposed Scheme | Method | Limitations
Zheng et al. [10] | Certificateless proxy re-encryption based data de-duplication scheme | Improves the detection mechanism at the decryption end to exactly identify redundant files | Decrease in network lifetime
Xia et al. [11] | Pipelined and parallelized data de-duplication scheme | During the de-duplication practice the chunks are created through a pipelined process | Less time consuming, but elongated processes are carried out to chunk the data
Fu et al. [12] | Scalable inline distributed de-duplication scheme | Facilitates intra-node de-duplication with a two-tiered routing decision by developing application awareness, data similarity and range | —
Xia et al. [13] | Duplicate-adjacency based resemblance detection | The enhanced resemblance detection reduces the redundancy rate with less overhead | Major problems arise during the segmentation of data
Yan et al. [14] | De-duplication scheme on encrypted data | Proxy re-encryption is used to carry out the de-duplication and data deletion process | High security for big data, but with high computation cost
Yu et al. [15] | Privacy-aware de-duplication protocols | The zero-knowledge de-duplication framework prevents attackers from learning the existence status of information | The two de-duplication protocols achieve two-sided privacy with an increase in communication overhead
Ni et al. [16] | Fog-assisted mobile crowdsensing framework | Empowers fog nodes in the IoT environment to sense and remove duplicate data through an embedded de-duplication system | Provides resistance against brute-force and duplicate-replay attacks, but with a lower de-duplication ratio
Tian et al. [17] | Randomized de-duplication scheme | The client-side de-duplication system guarantees the confidentiality of outsourced data | Although it prevents collusive authentication attacks and offline attacks raised by external hackers, it fails to achieve optimality in the data storage arena
Jiang et al. [18] | Secure data de-duplication scheme | Provisions both file-level and block-level data de-duplication | Achieves consistency and provides mutual agreement with dynamic ownership management
Gao et al. [19] | De-duplication scheme based on threshold dynamic adjustment | Determines the sensitivity of various data with a privacy score while uploading data into the storage system | Supports de-duplication only at lower privacy levels, not at higher ones
Sharma et al. [20] | Secure de-duplication scheme for fog-assisted nodes | An ECC-based hybrid multiplier and a Multi-Objective Whale Optimization algorithm ensure clustering of fog nodes | Achieves de-duplication for IIoT data with less computation overhead, but decision making with de-duplicated data is quite complex
Fu et al. [21] | Secure data storage and retrieval scheme | RF tree and AVL tree considered for efficient storage and indexing | Optimizes the storage space and provides an efficient searching mechanism, but time complexity is higher in decision making
Ellapan et al. [22] | Dynamic chunking algorithm | The window size can be adjusted with a dynamic prime algorithm for the de-duplication process | Memory is the only factor considered while chunking the data
To address these challenges, the proposed system implements index based de-duplication with a super feature called reckoning of occurrence, computed for each time interval.

Index Based De-duplication Using Merkle Tree with Reckoning of Occurrence
The architecture diagram of the proposed model for the Industrial IoT cloud environment is shown in Fig. 1 and discussed in detail in this section. The proposed scheme for Industry 4.0 consists of three components: Partitioning, De-duplication, and Optimal path determination to store the data in a cloud environment. In partitioning, the proposed system employs the Categorized Region Method, which forms regions in the extracted sensor values prior to the execution of de-duplication. In de-duplication, the redundant data present in the IIoT data is reduced using index-based de-duplication with a Merkle Tree. This section gives a brief description of the three components and their design.

Categorizing Region Method
Due to the emergence of Industry 4.0, various processes are enhanced, enabling data analysts to address problems such as data analysis and predictive maintenance. Performing predictive analytics on the non-redundant sensor data is essential, as the data is collected and organized periodically. For example, if a gas leakage occurs in a gas industry, the industrial environment may get affected, but none can predict accurately how severely it affects humans as well as the industrial environment in that particular period. Hence, the Categorized Region Method (CRM) fragments the collected sensor values into several parts using the time interval and a boundary value together with the chi-square test. Firstly, the collected data is partitioned by time period into regions, and in each region sub-regions are formed with the boundary value and the chi-square test, as presented in Algorithm 1. Using the chi-square test, chunks are made in the sensor values, and the values in each area are sorted prior to the formation of the region. The variation between the initial value and the last value of the i-th region depends upon the presence of redundant values in the region. When the existence of redundant values is high in the collected data, the values in the regions are related by transitive closure. When the records are sorted using the attributes containing the redundant values, the duplicate records tend to be located among the neighboring records. When the last value of the i-th region also exists in the (i+1)-th region, region i is enhanced with the inclusion of the similar values that exist in the (i+1)-th region; the boundary criterion between region i and region i+1 is then satisfied. In the categorized region phase, the time attribute is used for generating the regions.
When the size of the data collected in a particular time period is large, the regions are subdivided using the boundary value. The boundary value is calculated from the observed values of each region, and the regions are categorized into sub-regions with its help. In each region, the critical value is calculated; as a result, irrelevant and noisy data is removed from the observed values to ensure accurate results. The chi-square test determines the region in the proposed model, as χ² provides a measure of the deviation between observed and expected values. The number of observed elements O_i in the categorized region for i = 1, 2, 3, …, y is counted. Based on the null hypothesis, the expected number of elements E_i to be found in the categorized region for i = 1, 2, 3, …, y is calculated. The null hypothesis is a default hypothesis that measures the quantity to be zero or null. Where the quantity differs in the two situations, the chi-square statistic is given as

χ² = Σ_i (O_i − E_i)² / E_i   (2)

The expected value E_i required for the chi-square calculation is determined using the minimum and maximum values that lie in the observed data; E_i lies between the minimum and the maximum. This chi-square test yields excellent results for a large database of size n, and hence it is selected for the IIoT data. The CRM uses a succession of regions to close in aggressively on the location of the border values. At the same time, the approach uses different-size ambient procedures in two stages: (1) an enhancement stage to procure a number of capable replicas with a minimum number of inexpensive span computations, and (2) a retrenchment stage to determine the border values inside the conclusive size from the initial stage. Each region is again divided into k sub_regions using the boundary value.

Algorithm 1 Partitioning of regions using CRM
1. Partition the collected sensor values into regions based on the time interval
2. Compute the boundary value for each region
3. Divide each region into k sub_regions using the boundary value
4. Perform chi-square test under each sub_region
5. Compute the distance between the first and last records in each sub_region
6. If the last value of sub_region n and the initial value of sub_region n+1 are similar, then
   a. Enhance sub_region n with the inclusion of similar values from sub_region n+1
   b. Retrench sub_region n+1 by the number of values passed to sub_region n
   else
   a. Repeat steps 4 to 7 until all the regions are sorted
7. End
The algorithm partitions the data region-wise using the CRM method. The resultant regions from the CRM are passed to the index-based de-duplication to realize the main goal of the proposed work.
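As an illustration, the first steps of the CRM can be sketched in Python (the paper's implementation language). The function names, the timestamp granularity, and the even split into sub_regions are assumptions of this sketch, not the authors' exact code:

```python
from collections import defaultdict

def chi_square(observed, expected):
    """Chi-square statistic: sum((O_i - E_i)^2 / E_i) over the categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)

def partition_regions(samples, interval):
    """Step 1 of CRM: group (timestamp, value) pairs into time-interval
    regions and sort the values inside each region."""
    regions = defaultdict(list)
    for ts, value in samples:
        regions[ts // interval].append(value)
    return [sorted(vals) for _, vals in sorted(regions.items())]

def split_subregions(region, k):
    """Steps 2-3: divide a sorted region into k sub_regions."""
    size = max(1, -(-len(region) // k))   # ceiling division
    return [region[i:i + size] for i in range(0, len(region), size)]
```

Because each region is sorted before splitting, duplicate readings end up adjacent, which is what lets the later enhancement/retrenchment steps merge similar boundary values.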

Index based data De-duplication for IIoT
This is a methodology to eliminate redundant copies of data existing within the database. It is also termed single-instance data storage, which improves the efficient utilization of memory and CPU. It also reduces the transmitted bytes during the transfer of data over the network. In most cases, the sensor nodes generate a large number of redundant values, and such redundant data needs to be eliminated. Traditional de-duplication systems provide various techniques to identify and remove redundant data, but none has proposed a scheme to count the duplicate existences through which time-series data can be produced. With the count of duplicate existences in each time period, the activities in industries can be predicted, similar to time-series analysis. Considering the above issue, the proposed system provides time-series analytical data for the industrial environment while de-duplication is carried out on the observed sensor values in cloud storage. To perform the proposed work, the system uses a Merkle Tree with the reckoning-of-occurrence feature in it. Once the regions and sub-regions are partitioned based on the time period and boundary value, the observed sensor values in the region undergo index-based de-duplication. The index-based de-duplication is carried out by a Merkle Tree, which is used to store unique values with dynamic indexing on multiple levels. It generates a hash value for each data item that resides in a leaf node and stores it in the non-leaf node called the internal node. Along with the hash value in the internal node, the proposed work employs the feature called reckoning of occurrence to measure the multiple existences of each observed value in the industrial environment. Due to this additional feature, the proposed de-duplication stores unique values with a count of the number of times the data hits the tree structure for a leaf-node appearance, which leads to effective indexing.
A Merkle Hash Tree is a tree data structure containing hash-based values; it also simplifies the hash inventory. It is a non-linear (tree) data structure in which every leaf node holds data and every non-leaf node contains the hashes of its children, i.e., non-leaf nodes store hashes of data. The left subtree contains the left child nodes and the right subtree contains the right child nodes. The non-leaf nodes in the tree are allocated to store the hash value of each data item. The leaf nodes of both the left and right subtrees range from 1 to n−1. For instance, if the Merkle root consists of n leaves, then its key varies from 0 to n−1. The Merkle Tree can be represented as MT = RT(h(T_l) * h(T_r)), where T_l and T_r denote the left and right subtrees and h(·) the hash function. The proposed de-duplication technique is constructed as follows.

1) Generation of Internal node and leaf node
The bottom-most layer of the Merkle tree contains the leaf nodes holding data. The numerous data collected from the industrial environment are accumulated in the leaf nodes of the Merkle tree. When a leaf node is identified, a hash value is generated and stored in the internal node, which acts as a layer above the leaf node. SHA3-512 is used for hash value generation; the leaf node containing data is secured with the 512-bit mode during hash generation. Every observed value is allotted a 512-bit hash value, and the recurrence of data is verified with this hash value. Correspondingly, the hashing advances at each stage, attaining elevated stages until the origin node, called the 'Merkle root', is reached. As a result of the connection of hashes resembling a tree, the system comprises the hashes of all the operations found in the nodes. It produces a single top-level hash parameter, which facilitates verifying all the parameters existing in the nodes.
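A minimal sketch of the leaf and internal hashing step, using Python's standard `hashlib` SHA3-512 as the paper specifies; the pairing rule for odd-sized levels (duplicating the last hash) is an assumption of this sketch:

```python
import hashlib

def leaf_hash(value):
    """SHA3-512 digest (512-bit) of an observed sensor value at a leaf."""
    return hashlib.sha3_512(str(value).encode()).hexdigest()

def internal_hash(left_hex, right_hex):
    """An internal node stores the hash of its children's hashes."""
    return hashlib.sha3_512((left_hex + right_hex).encode()).hexdigest()

def merkle_root(values):
    """Hash the leaves pairwise upward until a single Merkle root remains."""
    level = [leaf_hash(v) for v in values]
    while len(level) > 1:
        if len(level) % 2:                 # odd level: duplicate the last hash
            level.append(level[-1])
        level = [internal_hash(level[i], level[i + 1])
                 for i in range(0, len(level), 2)]
    return level[0]
```

Since SHA3-512 is deterministic, a recurring sensor value always maps to the same leaf hash, which is what makes duplicate detection a simple hash lookup.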

2) Reckoning of Occurrence
To predict the future inclinations of the industrial environment, similar to time-series analysis, the occurrence of each value needs to be counted over periodic intervals. With this maxim, the reckoning feature is employed along with the Merkle Tree as part of the proposed work, as presented in Fig. 2. During the generation of the hash value for a leaf node, the existence factor for the leaf node is initialized to one for the first incidence. When the same data hits the tree for space, the generated hash value matches the existing leaf node, resulting in an increment of the existence count before the de-duplication process is carried out. When several happenings of the same value persist in a particular time period, the Merkle tree considers only one instance, and the incremented reckoned occurrence value is stored in the internal node along with the hash value of the leaf node.

3) Updation of Path between leaf node and root
Subsequent to the insertion of a value at a leaf node, the path and the number of incidences of each observed value need to be updated periodically, as explained in Algorithm 2. Thus, the proposed system consistently maintains the path from the leaf nodes to the root node along with the occurrence factor. Since the construction of the Merkle tree begins from the bottom-most layer of leaf nodes, the duplicate existence is also updated, together with the hash value of the data, as a separate field in the internal nodes. When indexing begins from the root node, along with the path to reach the data node location, the number of existences of that data during the time period is also provided. Since the tree is hit several times, the path established to find the nodes is accessed frequently, which helps in improving the search mechanism.

Algorithm 2 Index-based de-duplication with reckoning of occurrence
1. Fill the leaf nodes of the Merkle tree with the observed IIoT values
2. Generate the hash value of each leaf node and store it in the internal node
3. If the hash value of new data matches an existing leaf node, then increment its existence count and discard the duplicate
4. Compute the hashes of the internal nodes at the next level
5. Repeat step 4 until the Merkle root is found
6. End
The leaf nodes of the Merkle tree are filled with the observed values of the IIoT data. A hash value is computed for each leaf node and stored in the internal node. If the hash value of new data matches that of existing data, then the existence count of that value is incremented by one. Meanwhile, the redundant data is restricted from entering the Merkle tree, and the hashes of the internal nodes are computed up to the single top Merkle root. The indexing starts from the Merkle root down to the leaf nodes, along with the count of existences of each value for the time period.
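The de-duplication with occurrence counting described above can be sketched as follows; the class and method names are hypothetical, and a flat hash index stands in for the full Merkle structure for brevity:

```python
import hashlib
from collections import Counter

class DedupIndex:
    """Index-based de-duplication sketch: keep a single instance per value
    plus a reckoning-of-occurrence count for the current time period."""
    def __init__(self):
        self.unique = {}             # hash -> value (single-instance store)
        self.occurrence = Counter()  # hash -> existence count

    def insert(self, value):
        h = hashlib.sha3_512(str(value).encode()).hexdigest()
        if h not in self.unique:
            self.unique[h] = value   # first incidence: store the leaf value
        self.occurrence[h] += 1      # duplicates only increment the count
        return h

    def time_series(self):
        """(value, count) pairs usable for time-series analysis."""
        return [(v, self.occurrence[h]) for h, v in self.unique.items()]
```

Only the unique values (plus one counter each) reach storage, while the counts preserve the information that de-duplication would otherwise discard.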

Determination of Optimal path in Cloud Environment
Typically in cloud storage, the clustered storage nodes lie in the same zone. Each cluster has one or more nodes that provide compute resources but may not meet the industrial requirement with respect to the availability of storage space. To address this issue, the proposed system uses a Linear Programming Problem (LPP) model through which the optimal cost to reach a storage node with available space is calculated. Each node that exists in the cluster is configured with its available storage space, which is then compared with the industry demand. Using Vogel's approximation and the modified distribution method, the optimal transportation cost is determined, which leads to storing a huge amount of IIoT data at optimal cost in the cloud environment. Initially, the resource availability should be checked to classify the problem as balanced or unbalanced. If x_ij (≥ 0) represents the amount of storage space assigned from the i-th source, with availability a_i, to the j-th destination, where the industry demands storage space b_j at cost c_ij, the equivalent Linear Programming Problem (LPP) is to minimize Z = Σ_i Σ_j c_ij x_ij subject to Σ_j x_ij = a_i, Σ_i x_ij = b_j and x_ij ≥ 0. If the storage space meets the demands of the industry, it is considered a balanced transportation problem, represented as Σ_i a_i = Σ_j b_j (5). If the availability of storage space does not meet the industry demand, the result is an unbalanced transportation problem, Σ_i a_i ≠ Σ_j b_j. In such cases, the availability of storage space should be increased to provide resources according to the demands of the industry. For a feasible solution to the problem, the total capacity must equal the total requirement. If the total availability of space equals the total demand, then it is a balanced case. As the cloud environment provides abundant storage space, the proposed system assumes that the space available in the storage nodes is always greater than or equal to the demand of the customer or industry.
With Vogel's approximation method, the industry demand is identified, and a storage node having excess or equal space is allocated. Similarly, several iterations are carried out to fulfill all the demands of the industry by providing appropriate storage space. Finally, with the modified distribution method, the optimal transportation cost is determined, which results in the storage of a huge volume of industrial data at optimal transportation cost. The attributes used for the estimation of transportation cost are listed in Table 3. Through this linear programming problem, the minimal cost of distributing the space available in the storage nodes to the IIoT environment is calculated. If Σ_i a_i = Σ_j b_j, a balanced state exists between the available storage nodes and the industrial demand. Then, for each row and column, the difference between the minimum and next-minimum cost, diff(min(c_ij), next_min(c_ij)), is computed and assigned as a penalty. Similarly, the transportation cost for each cell is evaluated. Finally, degeneracy is checked in the obtained solution: if the number of allocations in the solution is less than m + n − 1, degeneracy exists, and the degenerate feasible solution is repaired using the modified distribution method by assigning ε (≈ 0) to a suitable independent position. For each unoccupied cell, the opportunity cost d_ij = c_ij − (u_i + v_j) is then examined, where u_i and v_j satisfy u_i + v_j = c_ij for the occupied cells. From the above estimation, it is concluded that if d_ij > 0, the determined cost is optimal and unique; if d_ij = 0, the cost is optimal and an alternative optimal solution exists; if d_ij < 0, the solution is not optimal, and hence the calculation process is repeated to attain the optimal state. With the above calculation, the proposed system determines the optimal transportation cost to store the non-redundant data in the cloud environment.
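The Vogel's approximation step can be sketched as follows; the function name and data layout are assumptions of this sketch, and the MODI optimality check on the resulting allocation is omitted for brevity:

```python
def vogel_initial(supply, demand, cost):
    """Vogel's approximation: build an initial allocation of storage-node
    space (supply) to industrial demand, repeatedly choosing the row or
    column with the largest penalty (difference of its two smallest costs)."""
    supply, demand = supply[:], demand[:]
    alloc = [[0] * len(demand) for _ in supply]
    rows, cols = set(range(len(supply))), set(range(len(demand)))

    def penalty(costs):
        s = sorted(costs)
        return s[1] - s[0] if len(s) > 1 else s[0]

    while rows and cols:
        rp = {i: penalty([cost[i][j] for j in cols]) for i in rows}
        cp = {j: penalty([cost[i][j] for i in rows]) for j in cols}
        ri, rmax = max(rp.items(), key=lambda kv: kv[1])
        cj, cmax = max(cp.items(), key=lambda kv: kv[1])
        if rmax >= cmax:                       # allocate in the penalized row
            i = ri
            j = min(cols, key=lambda j: cost[i][j])
        else:                                  # allocate in the penalized column
            j = cj
            i = min(rows, key=lambda i: cost[i][j])
        q = min(supply[i], demand[j])          # ship as much as possible
        alloc[i][j] = q
        supply[i] -= q
        demand[j] -= q
        if supply[i] == 0:
            rows.discard(i)
        if demand[j] == 0:
            cols.discard(j)
    total = sum(alloc[i][j] * cost[i][j]
                for i in range(len(alloc)) for j in range(len(alloc[0])))
    return alloc, total
```

In practice the VAM allocation is already near-optimal, so the MODI pass that follows usually needs few (often zero) improvement iterations.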

Security Analysis of the Proposed System
Each node in the factory is assigned a unique identity and a shared secret key using the AES algorithm in the pre-deployment phase. The nodes in the industrial environment generate shared session keys to guarantee secure data transmission with the edge server and other nodes. When the edge server receives data packages, it decrypts each package before it gets processed. While outsourcing further information to the cloud, it is again encrypted with the AES algorithm to prevent data leakage. The cloud accumulates a huge volume of data produced in the industrial environment. On the other hand, the data is accumulated in the form of ciphertext, which cannot be deciphered without the data users' undisclosed inputs [21]. Additionally, the Merkle tree contains hashes of the hashes of the data stored in the leaf nodes, through which corruption of the data can be detected. Only if the chain is followed continuously can the real data be collected at the end; a break in the chain reveals tampered data.
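The hashes-of-hashes property can be illustrated with a small proof-path check; the proof format here (a list of sibling hashes with a side flag) is an assumption of this sketch:

```python
import hashlib

def h(data):
    """SHA3-512 hex digest, matching the tree's hash function."""
    return hashlib.sha3_512(data.encode()).hexdigest()

def verify_chain(leaf, proof, root):
    """Follow the hash chain from a leaf up to the stored Merkle root;
    any tampered value breaks the chain and is rejected."""
    node = h(leaf)
    for sibling, sibling_is_left in proof:
        node = h(sibling + node) if sibling_is_left else h(node + sibling)
    return node == root

# Tiny two-leaf tree: root = h(h("a") + h("b"))
root = h(h("a") + h("b"))
assert verify_chain("a", [(h("b"), False)], root)          # authentic leaf
assert not verify_chain("tampered", [(h("b"), False)], root)
```

A verifier therefore only needs the root hash and a logarithmic number of sibling hashes, not the whole data set, to confirm that a value was not altered.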

Performance Evaluation
To assess the performance of the proposed scheme in a cloud environment, the experimental setup has been arranged as described below.

Experimental Setup
In this section, the performance of the proposed system is evaluated based on real experiments. The proposed system builds a prototype to measure the distance to the liquid level in containers as well as the temperature of the factory environment. In this experiment, sensors producing the same kind of output at various stages are considered. The proposed system is examined with various factors like space reduction percentage, efficient data retrieval, network lifetime, time-series analysis, average latency, data transmission and storage space. In the experiment, five ultrasonic sensor nodes and five temperature sensor nodes are deployed in the factory to build the IIoT environment. The sensor nodes are connected via a wireless mode of communication to observe the environment, and they communicate with each other through the Zigbee protocol. Zigbee (IEEE 802.15.4) is a low-power, low-data-rate wireless network that enables smart objects to work securely on any network. A laptop with the following configuration (AMD Ryzen 7 4700U Processor, 2.6 GHz, Windows 10 OS and 8 GB of RAM) is employed to communicate with the sensor nodes. The data transfer between the edge server and the cloud server is carried out over the Internet. The edge server (Machine I) receives the data and then transmits it to the cloud; Machine II is employed as the cloud server in the proposed prototype. The distance and temperature values measured by the sensor nodes are collected through an Arduino module, which functions at the edge server. Finally, the recorded values are outsourced to the cloud server for intelligent actions. The readings are recorded every ten seconds, and the fusion of the recorded values is passed to the cloud every hour. Since ten sensor nodes are deployed over the factory, ten readings are recorded every ten seconds. To compute time-series data from the collected IIoT data, index based de-duplication using a Merkle tree is proposed in this research work.
To carry out the remaining proposed work, the cloud environment is built with Eucalyptus. Eucalyptus is a private cloud that contains a Cloud Controller to execute administrative processes and Walrus, the storage controller, to accumulate the enormous amount of IIoT data produced by the IoT devices embedded in the industrial environment. The partitioning of regions in the collected sensor values and the de-duplication using the Merkle tree are implemented in Python. Finally, the optimal transportation cost to store data in the cloud is determined with the node information available at the storage controller.

Space Reduction due to de-duplication
Improving the efficiency of cloud storage through de-duplication is one of the main intentions of the proposed system. The de-duplication process carried out with the partition of regions and sub-regions is depicted in Table 5. The results attained for the de-duplication system and the space reduction percentage, presented in Fig. 3 and Fig. 4, prove that 10% of the data under each node's sensor values is unique; the remaining values are identified as redundant data, and the occurrence count is measured to provide time-series data.

Efficient Data Retrieval
The proposed work provides a simple approach to de-duplicating redundant IIoT data, which can then be retrieved from the cloud server with minimal time. The performance of the proposed system is compared with FaCIIoT [20] and SDSSIIoT [21] to establish the average data retrieval time on the cloud server. FaCIIoT requires more scanning time to obtain query results, and this time increases linearly with the number of feature vectors. Although SDSSIIoT gives good results in data retrieval, much irrelevant data is produced along with the explored data. In the proposed system, the Merkle hash tree gives the best retrieval performance with exact data, as it maintains a single instance for each sensor node's values. The proposed scheme maintains a unique index structure with the additional reckoning-of-occurrence feature to count the existence of each value. Because redundant data makes several hits while attempting to acquire space in cloud storage, the paths established between the root and the data become recurrent. Thus, the search proportion gradually decreases, which yields good results, as revealed in Fig. 5.

Network Lifetime
Improving the network lifetime with the proposed system is a demanding responsibility. In this framework, the network lifetime is evaluated against FaCIIoT [20] and SDSSIIoT [21]. It is evident from the plots that the proposed system attains a higher network lifetime than the previous works; the average network lifetime obtained in the preceding research is lower than the proposed system's value of 60 s. In SDSSIIoT the network lifetime is not as good, because the authors focus only on a secure storage scheme for IIoT data. Even though many de-duplication systems exist for IoT data, de-duplication carried out with an indexing-based technique is the newest approach. FaCIIoT achieves a good network lifetime value but fails to persist over longer durations. Attaining the maximum network lifetime requires balancing the traffic overhead between the sensor nodes and cloud storage. The proposed system maintains a good network lifetime for a longer duration, as shown in Fig. 6. This is due to the high availability of storage space at the cloud server and the reduction of network traffic overhead by preventing redundant data from reaching storage; the cloud persists in receiving data from the proxy servers owing to the availability of resources.

Computation time for decision making
Computing the time-series analysis concurrently with the de-duplication process in cloud storage is one of the challenging tasks addressed by the proposed work. Its performance is compared with existing works, FaCIIoT [20] and SDSSIIoT [21]. Only ten node readings are considered when evaluating this parameter. The average time taken by the proposed system to execute the decision-making task is 47 ms, and its resilience in consuming less decision-making time is plotted in Fig. 7, whereas SDSSIIoT takes 88 ms and FaCIIoT takes 107 ms; the proposed system thus consumes less than 50% of the time of the previous works for the decision-making process. The proposed work uses the categorized regional method and index-based de-duplication to reduce redundant data despite quick user access, which achieves an excellent latency time. Latency is the duration needed to react to a customer's request. The mean latency is characterized as the sum of the durations needed to fulfil every demand put forward by the user, divided by the number of demands; it lies between a lower bound of 0 and an upper bound equal to the interval needed to satisfy a single user demand. The average latency obtained with the previous works, FaCIIoT [20] and SDSSIIoT [21], is used to determine the effectiveness of the proposed system. As presented in Fig. 8, the mean latency for 5000 sensor values using the proposed system is 1.46 ms, which is lower than that of FaCIIoT and SDSSIIoT.
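The mean-latency figure described above is simply the per-request durations averaged over the number of requests, and it necessarily lies between 0 and the longest single-request time. A minimal sketch with illustrative request times (not the measured data):

```python
def mean_latency(durations_ms):
    """Average time to fulfil each user demand, in milliseconds."""
    return sum(durations_ms) / len(durations_ms)

# Illustrative per-request response times for retrieving sensor values.
requests = [1.2, 1.5, 1.4, 1.7, 1.5]
avg = mean_latency(requests)
print(round(avg, 2))   # 1.46; the mean always lies between 0 and max(requests)
```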

Data Transmission Rate in the Cloud Server
The sensor nodes connected in the factory measure the distance and temperature values every 10 s and send the observed values to the edge server; the values collected over each hour are passed to the cloud server. The proposed work focuses on determining the optimal path to transmit the data to the cloud server. As shown in Fig. 9, the proposed system greatly reduces the transmission rate. This can be explained by the fact that the raw data observed from the sensor nodes is preprocessed before being outsourced to cloud storage, and the optimal path in cloud storage is identified using the storage node information available at Walrus. Therefore, with the proposed framework, the cloud needs to store only one reading instead of five. As the proposed system computes the time series during the de-duplication of IIoT data, a smaller amount of data is transmitted when compared to previous works such as FaCIIoT [20] and SDSSIIoT [21].

Conclusion
The data procured by industries has grown enormously due to the development of industrial automation, and the extensive comparable data present in the repository server needs to be avoided. Despite the problems that exist with various existing de-duplication schemes, the proposed system is designed with an additional feature called reckoning of occurrence in the multilevel-indexing Merkle tree. To compute time-series data, the proposed system implements the categorized regional method to partition the data based on time intervals, which creates much impact during the decision-making process. Finally, the optimal transportation cost to reach the storage node in the cloud is also addressed. The experimental results reveal that the proposed system performs in an enhanced manner compared to the previous works in terms of space reduction percentage, search time, network lifetime, decision-making, average latency and data transmission rate. There are still many challenges in the implementation of this proposed system.
Open research on security needs to be carried out to improve the efficiency of this framework. In future work, we plan to concentrate on other applications, such as IoT healthcare services in edge environments.