BTHAAT: A Blockchain-based Traceable Hybrid-Anonymity Authentication Technique for Data Middle Platform in Industry 4.0

doi:10.21203/rs.3.rs-3098732/v1

Research Article

This work is licensed under a CC BY 4.0 License

There is currently a growing concern about the preservation of individual privacy, particularly in the context of the increasing usage of Big Data technologies, such as the Data Middle Platform. As such, the question of how to enhance privacy and security has become a primary concern in today's world of Industry 4.0. In this article, we review related literature on the security and privacy challenges of the Data Middle Platform, and explore the use of Blockchain and zero-knowledge proof theories for authentication. We propose an algorithm that combines k-anonymity, homomorphic encryption, blockchain, and zero-knowledge proof for traceable authentication technology, and demonstrate its feasibility in the laboratory. Our experimental results verify the efficacy of the algorithm. As the algorithm is intended for industrial use, we compare and discuss it with existing authentication techniques in the penultimate section. Finally, we conclude our work and highlight directions for future research.

Blockchain

Privacy Protection

Authentication

Audit

Zero Knowledge Proof

The Fourth Industrial Revolution (IR 4.0) is changing business models, production methods, and our lifestyle. The aim of Industry 4.0 is to facilitate self-governing decision-making protocols, monitor assets and processes in real time, and establish interconnected value-generating networks via early stakeholder involvement and vertical and horizontal integration. This objective will be achieved through the use of advanced information technologies, including the Internet of Things, high-speed mobile communication technology, and Big Data [1]. These new technologies have become the medium that carries data from one device or system to another. They enhance the communication and interaction between different devices and systems, but they also create a trust crisis [2]. That is why authentication is becoming increasingly crucial today.

1.1 Authentication

Authentication is the procedure of validating the identity of an individual or object by confirming if they are who or what they assert to be. Authentication is not a new concept and is defined as a mechanism that grants users access control to systems by comparing their credentials with those stored in a database of authorized users or on a data authentication server. Access is granted to users only if their credentials match those on record. This process of authentication helps ensure the security of the system and prevents any potential breach of privacy [3].

In industry, three kinds of authentication are commonly used, as shown in Fig. 1. Single-factor authentication (SFA) refers to the practice of requesting only a user ID and a password from a user [4]. Each individual is typically assigned a unique user ID, and authentication occurs when the user presents a credential, such as a password, that corresponds to that ID. While this practice is widely used on the World Wide Web, the predominant problem with password-based single-factor authentication is that users may either lack the knowledge to create strong and memorable passwords or underestimate the significance of security. As password policies grow more complex, the user experience gets worse, so balancing user experience against security becomes a management challenge. If this tension persists, the IT management department may relax its password standards, resulting in passwords with lower complexity and shorter lengths, for example a reduction from twelve characters to six. Such a password offers little more protection than no password at all, or than a password written on a sticky note that is left in view or carelessly discarded, since cracking it can take just minutes. With that in mind, passwords need to be less predictable to computers. A password entropy test estimates how difficult a particular password would be to crack using guessing, brute force, dictionary attacks, or other standard techniques. In recent times, enterprises have strengthened their authentication measures by introducing additional factors. These factors may comprise a one-time code sent to the user's mobile device at sign-on, or a biometric signature, such as a facial or thumbprint scan of the user.
Two-factor authentication (2FA) means that, besides the factor used in single-factor authentication, one more factor is involved in the authentication [5]. When three or more identity verification factors are used during the authentication process, it is referred to as multi-factor authentication (MFA). An example of MFA is a combination of a user ID and password, a biometric signature, and a personal question that the user must answer [4, 6].
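
The password entropy test mentioned above can be approximated with a simple upper bound: the password length multiplied by the base-2 logarithm of the alphabet size. The following is a minimal sketch under the assumption that every character is chosen uniformly at random, which real user-chosen passwords rarely satisfy; the function names are illustrative and do not come from the cited works.

```python
import math
import string


def charset_size(password: str) -> int:
    """Estimate the size of the alphabet the password draws from.

    Only ASCII letters, digits, and punctuation are counted; any other
    character (spaces, non-ASCII) is ignored by this simple estimate.
    """
    size = 0
    if any(c in string.ascii_lowercase for c in password):
        size += 26
    if any(c in string.ascii_uppercase for c in password):
        size += 26
    if any(c in string.digits for c in password):
        size += 10
    if any(c in string.punctuation for c in password):
        size += len(string.punctuation)  # 32 ASCII symbols
    return size


def entropy_bits(password: str) -> float:
    """Upper-bound entropy in bits, assuming uniformly random characters."""
    size = charset_size(password)
    return len(password) * math.log2(size) if size else 0.0
```

Under this estimate, halving the length of a lowercase-only password from twelve characters to six halves its entropy, which shrinks the brute-force search space by a factor of 26 to the power of 6. Dictionary words and predictable patterns have far less entropy than the bound suggests.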

1.2 Blockchain

Blockchain was introduced by the pseudonymous Satoshi Nakamoto [7]. It is a decentralized system that stores messages in blocks [8]. A blockchain can also be understood as a special kind of decentralized database with several features, such as tamper resistance, traceability, anonymity, and open-source implementation.
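
To illustrate why data stored in linked blocks is hard to alter unnoticed, the sketch below builds a tiny hash chain in which each block records the hash of its predecessor. It is a simplified illustration only: it assumes SHA-256 and a JSON encoding chosen for this example, and it omits consensus, signatures, and networking, which a real blockchain needs.

```python
import hashlib
import json

GENESIS_PREV_HASH = "0" * 64  # placeholder predecessor for the first block


def block_hash(index, data, prev_hash):
    """Hash the block contents together with the previous block's hash."""
    payload = json.dumps(
        {"index": index, "data": data, "prev_hash": prev_hash}, sort_keys=True
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_block(chain, data):
    """Append a new block that points at the current tip of the chain."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS_PREV_HASH
    index = len(chain)
    chain.append(
        {
            "index": index,
            "data": data,
            "prev_hash": prev_hash,
            "hash": block_hash(index, data, prev_hash),
        }
    )


def is_valid(chain):
    """Recompute every hash and check that each block links to the previous one."""
    prev_hash = GENESIS_PREV_HASH
    for i, block in enumerate(chain):
        if block["index"] != i or block["prev_hash"] != prev_hash:
            return False
        if block["hash"] != block_hash(block["index"], block["data"], block["prev_hash"]):
            return False
        prev_hash = block["hash"]
    return True
```

Changing the data of any earlier block changes its hash, so validation fails unless every later block is recomputed as well. In a decentralized network, honest nodes holding the original chain would reject such a rewrite.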

There are three types of blockchain. The public blockchain was the first to arise; Nakamoto used it to create Bitcoin in 2009 [9]. Today, various cryptocurrencies and Non-Fungible Tokens rely on public blockchains as their underlying technology [10, 11]. The private blockchain is the second type. It is used in circumstances where the number of devices that join the blockchain network is restricted and only those devices are permitted to participate. Several studies consider the private blockchain adequate for IoT device connectivity [12–15]. The third type is the consortium blockchain, which hybridizes the characteristics of the public and private blockchains [16]. Its list of authorized nodes and its high transaction speed are the features that most distinguish it from public blockchains. Unlike a private blockchain, however, a consortium blockchain is governed by a consortium committee rather than by a single individual or organization.

1.3 Data Middle Platform

The Data Middle Platform (DMP) has been a hot topic in China ever since it was first introduced in 2016 [17, 18]. This has had far-reaching effects for online business and the Internet as a whole. Alibaba, the concept's originator and an early adopter, has spent the past 12 years studying the scalability and data-use potential of the Data Middle Platform. Alibaba's dedication to updating and rebuilding has resulted in a reconstructed middle platform that facilitates more exploration, scattered data analysis, integrated capabilities for the middle platform, and global data intelligence.

One of the main goals of the Data Middle Platform is to facilitate the re-use of existing data-related skills [19]. However, in the specific practice process, different organizations will build different data centers according to the actual situation [20–26]. In general, as shown in Fig. 2, the Data Middle Platform is described as an enterprise-level platform with full data capabilities, including data acquisition, data storage, data retrieval, data analysis, data modelling, data governance, data service provisioning, and data application development. The Data Middle Platform is a bottom-up, enterprise-wide answer to the data island problem that may fundamentally remove the technological barriers between the various stages of data creation, storage, analysis, service, and circulation.

1.4 Privacy Protection

1.4.1 Zero Knowledge Proof

Zero-knowledge proof is a cryptographic method that enables a prover to establish the validity of certain information to a verifier without disclosing any additional information [27, 28]. A zero-knowledge proof must satisfy three properties.

Completeness: if the statement is true, an honest prover will be able to convince an honest verifier, that is, a verifier who correctly follows the protocol, that the statement is true.
Soundness: if the statement is false, no cheating prover can convince an honest verifier that it is true, except with some small probability.
Zero-knowledge: if the statement is true, the prover's purpose is to make the verifier believe that the prover knows or holds a certain message. During the proving process, the prover must not leak any content of that message to the verifier, so the verifier learns nothing beyond the fact that the statement is true.

A zero-knowledge proof is not a proof in the strict mathematical sense, because there is a small probability, the soundness error, that a cheating prover fools the verifier with a false claim. In a nutshell, zero-knowledge proofs are probabilistic rather than deterministic. However, there are methods that reduce this error to a negligible value.
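
As a concrete illustration of these properties, the sketch below implements one round of Schnorr's identification protocol, a standard textbook example in which the prover shows knowledge of a secret exponent without revealing it. It is not necessarily the construction used later in this paper, and the group parameters `P`, `Q`, and `G` are toy values chosen for readability; they are far too small to be secure.

```python
import secrets

# Toy group (assumption for illustration): G = 2 has prime order Q = 11 modulo P = 23.
P, Q, G = 23, 11, 2


def keygen():
    """Return (secret x, public y = G^x mod P)."""
    x = secrets.randbelow(Q - 1) + 1  # secret exponent in [1, Q-1]
    return x, pow(G, x, P)


def commit():
    """Prover picks a random nonce r and sends the commitment t = G^r mod P."""
    r = secrets.randbelow(Q)
    return r, pow(G, r, P)


def respond(x, r, c):
    """Prover answers the verifier's challenge c with s = r + c*x mod Q."""
    return (r + c * x) % Q


def verify(y, t, c, s):
    """Verifier accepts if G^s == t * y^c (mod P)."""
    return pow(G, s, P) == (t * pow(y, c, P)) % P


def run_round(x, y):
    """One honest round: commit, random challenge, response, verification."""
    r, t = commit()
    c = secrets.randbelow(Q)
    return verify(y, t, c, respond(x, r, c))
```

An honest prover always passes, which corresponds to completeness. A prover who follows the same steps with a wrong exponent passes only when the challenge happens to be 0, that is, with probability 1/Q per round, so repeating the round n times reduces the chance of accepting that prover to (1/Q)^n; in general the soundness error of this protocol is 1/Q per round. This is the sense in which the error can be made negligible.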

1.4.2 K-anonymity

K-anonymity is a well-known anonymity protection model used to prevent de-identified private data from being re-identified by linking it with public information [29]. A data set satisfies k-anonymity when every record is indistinguishable from at least k-1 other records with respect to its quasi-identifying attributes. Using techniques such as generalization, suppression, anatomization, and permutation, it is particularly effective in protecting limited data and concealing sensitive data. De-identification of this kind, however, ignores the diversity of the sensitive values: when all records in a group share the same sensitive value, the data set is vulnerable to homogeneity attacks. X. Zhang et al. [30] developed a top-down, two-phase technique for anonymizing massive data in the cloud. Large volumes of data are broken down into smaller pieces, which are first anonymized using MapReduce processing and then re-anonymized using k-anonymity. Because a large amount of data is anonymized twice, the anonymization may be slowed down. Mehta and Rao [31] proposed a MapReduce-based k-anonymization algorithm. Its primary purpose is to eliminate the need for custom mapper and reducer programs while also making it easier to publish large amounts of data. Data is divided into smaller pieces than in earlier approaches, and Hadoop's sorting and shuffling are used to distribute and merge the data. Consequently, compared with earlier algorithms, it maintains the same level of privacy protection with fewer iterations and a shorter total run time.
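
The k-anonymity condition can be checked mechanically by grouping records on their quasi-identifiers and taking the size of the smallest group. The sketch below does this for a small hypothetical table and shows how generalizing one attribute raises k. The records, field names, and the ten-year age bands are assumptions made up for this example, not data from the cited studies.

```python
from collections import Counter


def k_of(records, quasi_identifiers):
    """Return the largest k for which `records` is k-anonymous.

    This is the size of the smallest group of records that share the
    same values for all quasi-identifiers (0 for an empty table).
    """
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values()) if groups else 0


def generalize_age(record, width=10):
    """Replace an exact age with a range such as '30-39' (generalization)."""
    low = (record["age"] // width) * width
    return {**record, "age": f"{low}-{low + width - 1}"}


# Hypothetical records: age and zip are quasi-identifiers, diagnosis is sensitive.
records = [
    {"age": 31, "zip": "47677", "diagnosis": "flu"},
    {"age": 35, "zip": "47677", "diagnosis": "asthma"},
    {"age": 42, "zip": "47602", "diagnosis": "flu"},
    {"age": 47, "zip": "47602", "diagnosis": "diabetes"},
]

generalized = [generalize_age(r) for r in records]
```

Before generalization every record is unique on (age, zip), so the table is only 1-anonymous; after generalization each (age range, zip) group contains two records. If both records in a group had the same diagnosis, the table would still be 2-anonymous yet reveal that diagnosis, which is the homogeneity attack described above.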

1.4.3 Homomorphic Encryption

Homomorphic encryption differs from conventional encryption techniques. It allows the ciphertext to be processed with a specific algebraic operation to obtain an encrypted result, and decrypting that result gives the same value as applying the same operation to the original data. In other words, processing the ciphertext directly and then decrypting yields the same result as processing the original data.

Without needing to decrypt the data beforehand, this technique can perform operations such as retrieval and comparison on encrypted data sets. In principle, this resolves the problem of secrecy when handing private data to a third party. Privacy-preserving outsourced storage and computing relies heavily on homomorphic encryption, because it makes it possible to send encrypted data to commercial cloud environments for processing. For firms that deal with large numbers of customers, homomorphic encryption may help ease the burden of protecting their customers' privacy. For example, predictions based on data analytics in education may be difficult to outsource to a third-party service provider because of data protection concerns; if the prediction can be made directly on encrypted data, these difficulties are removed.
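
The homomorphic property can be seen in a few lines with the Paillier cryptosystem, a well-known additively homomorphic scheme: multiplying two ciphertexts produces a ciphertext of the sum of the plaintexts. The sketch below is a toy illustration, not the scheme or parameters used by the authors; the primes are tiny and insecure, the generator is fixed to N + 1, and Python 3.8 or later is assumed for the modular inverse via `pow`.

```python
import math
import secrets

# Toy key material (assumption for illustration; real keys use large primes).
P, Q = 61, 53
N = P * Q
N2 = N * N
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # lcm(P-1, Q-1)
MU = pow(LAM, -1, N)  # valid because the generator is N + 1


def encrypt(m):
    """Encrypt an integer 0 <= m < N with fresh randomness r."""
    while True:
        r = secrets.randbelow(N - 1) + 1
        if math.gcd(r, N) == 1:
            break
    return (pow(N + 1, m, N2) * pow(r, N, N2)) % N2


def decrypt(c):
    """Recover the plaintext from a ciphertext."""
    u = pow(c, LAM, N2)
    return ((u - 1) // N * MU) % N


def add_encrypted(c1, c2):
    """Homomorphic addition: the product of ciphertexts encrypts the sum."""
    return (c1 * c2) % N2
```

A third party holding only `encrypt(20)` and `encrypt(22)` can call `add_encrypted` on them without learning either value, and the key holder then decrypts the result to the sum. Sums are taken modulo N, so plaintexts must stay within that range.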

1.5 Layout of the Paper

The remaining sections of the paper are structured as follows. The related work section comprises three main components. First, we summarize the security and privacy challenges faced by the Data Middle Platform, organized around the data life cycle. Second, drawing on literature published over the past three years, we present an overview of the use of blockchain in authentication technology. Finally, the section concludes with a discussion of the application of zero-knowledge proof in blockchain. The methodology section gives a hybrid-anonymity authentication algorithm and, based on it, presents a blockchain-based traceable hybrid-anonymity authentication technique for the Data Middle Platform. This method uses k-anonymity, homomorphic encryption, and other technologies to achieve zero-knowledge proof, and uses blockchain technology to solve the storage and permanent auditing of authentication records, providing a new technical direction for authenticating users of the Data Middle Platform. We then develop a prototype based on the methodology to determine whether the system is viable, followed by analysis and discussion. Finally, we summarize our findings and outline directions for future research.

Three themes run through the related work section. The first subsection discusses the Data Middle Platform's security and privacy problems. The second summarizes the contribution of blockchain to authentication. The last subsection covers work that combines blockchain technology with zero-knowledge proof.

2.1 Security and Privacy Challenges of Data Middle Platform

Because the Data Middle Platform emerged only recently, there has been little discussion of its security and privacy preservation. Researchers have only been able to summarize the challenges from a few academic papers and technical blogs written by Data Middle Platform implementers. Based on the data life cycle, the challenges can be classified into five areas, which are shown in Fig. 3 [32–34].

2.1.1 Data Collection

In the data collection phase of the Data Middle Platform, three security and privacy issues need to be addressed. The first is that data protection measures do not match the sensitivity level. The second is the weakness of the unified access control mechanism. The third is the risk of operation and maintenance personnel dragging and bumping the database.

Data protection measures do not match sensitivity level

Data classification is not only the basis of data governance but also the premise of data security. China's relevant laws and regulations also require different security measures for data of different security levels. For example, if the data in a government big data platform is not classified, security protection will be applied according to the default sensitivity level of government network data, resulting in the following problems: first, more human, material, and financial resources need to be invested in security protection; second, data whose sensitivity level is lower than the default level will be overprotected; in addition, the data protected at the default level will also contain a large amount of data more sensitive than the default level, resulting in the risk of insufficient protection of highly sensitive data. Enterprises face the same problems.

Weakness of the unified access control mechanism

The big data resource pool includes relational databases, massively parallel data warehouses, and distributed big data stores. Each kind of database has its own account and permission system. To apply a global security policy, each kind of database must be configured separately, which inevitably multiplies the workload. In addition, the databases differ in the granularity and standard of access control they support. Suppose that field-level access control is required, while some components can only enforce table-level access control. This difference in access control capability inevitably makes it difficult to formulate a global security policy. Increased workload and the difficulty of global policy making are therefore the weak points of the unified access control mechanism.
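
The granularity mismatch described above can be made concrete with a small sketch. It assumes a single global rule expressed at field level and a hypothetical capability map, `GRANULARITY`, that records the finest level each kind of store can enforce; the backend names, the `Rule` type, and the fail-closed choice are all illustrative assumptions rather than part of any specific platform.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """A global rule: `role` may read the listed `fields` of `table`."""
    role: str
    table: str
    fields: frozenset


# Hypothetical capability map: finest granularity each store can enforce.
GRANULARITY = {
    "relational_db": "field",
    "mpp_warehouse": "field",
    "distributed_store": "table",
}


def compile_rule(rule, backend, all_fields):
    """Translate a global rule into what `backend` can actually enforce.

    Field-level backends enforce the rule exactly. Table-level backends can
    only grant the whole table, so the grant is made only when the rule
    already covers every field; otherwise access is denied (fail closed).
    """
    if GRANULARITY[backend] == "field":
        return {"table": rule.table, "fields": sorted(rule.fields), "exact": True}
    if rule.fields >= set(all_fields):
        return {"table": rule.table, "fields": sorted(all_fields), "exact": True}
    return {"table": rule.table, "fields": [], "exact": False}
```

The same rule therefore has to be compiled once per backend, which reflects the multiplied workload, and on a table-level backend it cannot be enforced as written, which reflects the difficulty of defining one global policy.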

Risk of operation and maintenance personnel dragging and bumping the database

Operation and maintenance personnel use database accounts for operation and maintenance management, and the privileges of these accounts may exceed what the actual tasks require. If access behavior is not controlled, operation and maintenance personnel are technically able to carry out operations unrelated to their tasks. Driven by curiosity or personal interest, they can drag and bump the database, that is, export data in bulk or try credentials leaked elsewhere, to obtain the data they are interested in, resulting in large-scale data leakage.

2.1.2 Data Storage

There is only one significant challenge in the data storage stage: authority out of control.

Risk of authority out of control

It is well known that every database in the big data resource pool has a super administrator account. The super user is the default user created with the database and can be regarded as the "creator" of the database. Like the root user of a UNIX system or the administrator user of Windows, the super user has supreme power. For the sake of security, the password of the super user is generally held by only a few people, but this excessive concentration of power brings problems of its own. Because the super user has the highest privileges, he can do anything he wants without leaving any trace. The big data resource pool therefore carries the risk of authority out of control. In addition, the data storage stage shares the weakness of the unified access control mechanism and the risk of operation and maintenance personnel dragging and bumping the database, which will not be repeated here.

2.1.3 Data Processing

Data processing is a critical stage. The data needs to be pulled out and processed, which raises security and privacy issues. Specifically, there are four weaknesses: malicious operations in processing, the lack of an automatic command approval mechanism, management processes out of control, and the risk of privacy leakage.

Malicious operation in processing

Data governance behavior is implemented by executing governance scripts, which contain various data source operation commands. If the operation commands are not effectively controlled, malicious commands may appear. Once the malicious command is executed, the data source will be damaged and lead to other unpredictable security problems.

Lack of an automatic command approval mechanism

Although an approval step can control the risk of a governance script, the approval process should judge not only the risk of the script itself but also whether the database tables affected by the script are consistent with the application information, so as to ensure that the script is correct from a business perspective. Governance scripts therefore need different approval strategies for different target objects. With manual approval, if a script cannot be prevented from running before its approval is complete, there is a risk of unauthorized execution. In addition, manual approval depends heavily on personal experience, which may lead to misjudgment of scripts that contain a large number of operation commands, so there is still a risk that high-risk scripts will be approved. Finally, manual approval cannot guarantee timeliness, which is a weak point for the overall efficiency of the data center.
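
A rough idea of what an automatic approval check might look like is sketched below. It applies the two judgments named above to a governance script written in SQL: whether any statement uses a high-risk command, and whether every table the script touches appears in the approved application. The `HIGH_RISK` list, the regular expression, and the function name are assumptions for illustration; a production system would use a real SQL parser instead of pattern matching.

```python
import re

# Hypothetical policy: statement keywords that always require rejection.
HIGH_RISK = ("DROP", "TRUNCATE", "GRANT", "ALTER")

# Naive table extraction: the identifier that follows common SQL keywords.
TABLE_PATTERN = re.compile(
    r"\b(?:FROM|INTO|UPDATE|TABLE|JOIN)\s+([A-Za-z_][\w.]*)", re.IGNORECASE
)


def review_script(script, approved_tables):
    """Return (approved, reasons) for a semicolon-separated SQL script.

    Splitting on ';' is a simplification and breaks on semicolons that
    appear inside string literals.
    """
    reasons = []
    statements = [s.strip() for s in script.split(";") if s.strip()]
    for stmt in statements:
        keyword = stmt.split()[0].upper()
        if keyword in HIGH_RISK:
            reasons.append(f"high-risk command: {keyword}")
        for table in TABLE_PATTERN.findall(stmt):
            if table.lower() not in approved_tables:
                reasons.append(f"table not in application: {table}")
    return (not reasons), reasons
```

Because the check runs before execution and returns its reasons, a platform could block unapproved scripts immediately and route only the rejected ones to a human reviewer.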

Management process out of control

Data governance is accomplished by executing governance scripts. When a governance script is executed through the governance platform, the execution process and results are controlled by the governance platform. However, a governance script can also be executed manually. The process and results of manual execution cannot be synchronized to the governance platform, so the governance platform cannot control them, and the data governance process gets out of control. Executing a governance script operates directly on the original database. If an uncontrolled governance script is executed, the data may be destroyed.

Risk of privacy leakage

Abnormal data governance behavior (such as illegal execution of data query scripts) can lead to privacy leakage. Without analysis and audit tools, no timely alarm can be raised when abnormal behavior occurs, and the behavior cannot be traced or evidence collected afterwards.

2.1.4 Data Exchange and Sharing

The data exchange and sharing stage of the Data Middle Platform encounters security and privacy challenges similar to those of the processing stage. In particular, the vulnerabilities include the risk of data leakage, unauthorized access by application servers, untraceable data leakage, and illicit use of data.

Data leakage risk

The data demander will request batch data from the data center for analysis. When the data center provides batch data, because it does not understand the business analysis indicators, it will provide excessive data. For example, an app needs to analyze the monthly number of marriages in this year. The app will apply for all the married population information in this year, including name, address, contact information, etc. In fact, the statistics of monthly quantity only need accurate time information, and there is no need to provide sensitive information such as name, address and contact information. When the data center does not understand the analysis content, it will provide the complete population information of this year's marriage, which leads to the leakage of sensitive information.
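
The marriage statistics example suggests a simple mitigation: compute the aggregate inside the platform and release only the counts. The sketch below illustrates this under assumed field names (`name`, `address`, `married_on`) and made-up records; it is not an interface of any real data platform.

```python
from collections import Counter
from datetime import date


def monthly_marriage_counts(records, year):
    """Return {month: count} for `year`, exposing no personal fields."""
    counts = Counter(
        r["married_on"].month for r in records if r["married_on"].year == year
    )
    return {month: counts.get(month, 0) for month in range(1, 13)}


# Hypothetical population records held by the platform.
records = [
    {"name": "A. Example", "address": "1 Sample Road", "married_on": date(2023, 1, 14)},
    {"name": "B. Example", "address": "2 Sample Road", "married_on": date(2023, 1, 28)},
    {"name": "C. Example", "address": "3 Sample Road", "married_on": date(2023, 6, 3)},
    {"name": "D. Example", "address": "4 Sample Road", "married_on": date(2022, 6, 3)},
]

counts = monthly_marriage_counts(records, 2023)
```

The data demander receives twelve integers instead of the full records, so names, addresses, and contact details never leave the platform. This requires the platform to know the analysis indicator in advance, which is exactly the knowledge the passage says is often missing.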

Risk of illegal application server access

The data resource management platform provides data services to applications through interfaces. If an interface lacks a trusted authentication mechanism for application servers, the legitimacy of an application server cannot be determined, and there is a risk of illegal application server access. Once an illegal application server is connected, it can directly access all the interfaces provided by the data resource platform, which poses a great security risk.
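
One common way to give an interface a trusted authentication mechanism for application servers is to have each registered server sign its requests with a shared secret using an HMAC. The sketch below shows the idea with Python's standard `hmac` module; the registry `REGISTERED_KEYS`, the message layout, and the five-minute clock-skew window are assumptions for illustration, not a description of the platform discussed here.

```python
import hashlib
import hmac
import time

# Hypothetical registry of application servers allowed to call the interface.
REGISTERED_KEYS = {"app-01": b"demo-shared-secret"}


def sign_request(app_id, key, path, timestamp):
    """Compute the signature an application server attaches to a request."""
    message = f"{app_id}\n{path}\n{timestamp}".encode("utf-8")
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def is_trusted(app_id, path, timestamp, signature, now=None, max_skew=300):
    """Accept only fresh requests signed by a registered application server."""
    key = REGISTERED_KEYS.get(app_id)
    if key is None:
        return False  # unknown application server
    now = time.time() if now is None else now
    if abs(now - timestamp) > max_skew:
        return False  # stale or replayed request
    expected = sign_request(app_id, key, path, timestamp)
    return hmac.compare_digest(expected, signature)
```

A server that does not hold a registered key cannot produce a valid signature, so it is rejected before reaching any data interface. Comparing with `hmac.compare_digest` avoids leaking information through timing differences.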

Data leakage is not traceable

Sometimes the data service platform needs to provide batch data to a data demander, and the same data may be provided to multiple data demanders at the same time. Suppose that a data leakage event occurs and the leaked data had been provided to multiple demanders by the data service platform. The leakage channel may then be one or more of the demanders, or the Data Middle Platform itself, so it is difficult to determine the leakage path accurately. There is a risk that the data leakage cannot be traced, which makes it difficult to assign responsibility.

Risk of illegal use of data

Data publishing outside the normal process is abnormal behavior. For example, a data demander requesting data beyond its actual needs is abnormal behavior, and so is the unauthorized use of published data. Such abnormal use of published data leads to the risk of illegal use of data. Without a means of monitoring abnormal behavior, no timely alarm can be raised when it occurs, and it cannot be traced or evidence collected afterwards.

2.1.5 General Security and Privacy risk

Apart from the four stages above, the Data Middle Platform faces three further challenges.

Lack of data access analysis means

The optimization of security policy depends on the mastery of the current data security state. The more we know about the current data security status, the more support we can provide for the subsequent security policy optimization. Data generally goes through the process of collection, processing, storage, exchange and sharing, use and destruction. Without the linkage analysis of the process, it is difficult to form the overall situation of data flow, master the current data security status as a whole, and provide effective support for the subsequent data security decision-making.

Lack of security event exception warning mechanism

When a security incident occurs, timely alarm feedback is needed to enter the emergency response process. If there is no security event exception alarm mechanism, it is impossible to take timely measures when security events occur. The later a security incident is discovered, the greater the loss will be.

Risk of sensitive information leakage

After the script development of the collection platform and governance platform is completed, it is necessary to test the effectiveness and stability of the script. Developers and testers must have database operation permission to conduct script testing, which will involve sensitive information, so there is a risk of sensitive information leakage in this link. Other business platforms will also carry out customized development in the future. The process is similar to the collection platform and governance platform. It is possible that sensitive data may be leaked throughout the course of developing and testing. To summarize, the Data Middle Platform is vulnerable to a variety of security threats because of the data's life cycle.

2.2 Related Work on Blockchain in Authentication

Abbas's research group [35] examines how blockchain technology might be applied to the Internet of Vehicles and Vehicle-to-Everything networks (IoV-VANETs), with particular emphasis on the authentication subsystem. Yuan et al. [36] present a blockchain-based approach to privacy protection for a credit inquiry system. With this method, users of credit-investigation systems, credit-investigation agencies, and cloud service providers can all benefit from the secure exchange of credit-investigation data. Although numerous conditional authentication techniques that preserve users' anonymity have been proposed to secure communication, they still have flaws, such as frequent interactions or an inability to be linked together. To overcome these problems comprehensively, the study in [37] presents a hierarchical authentication method assisted by blockchain technology. Because IoT devices have limited resources, traditional security and privacy architectures cannot be applied to the Internet of Things (IoT). To solve this issue, a blockchain-based security mechanism [38] has been implemented that permits secure, authorized access to smart city resources. In blockchain application scenarios, the identity authentication approach in [39] resolves the apparent contradiction between the need for anonymity and the need for traceability. The main purpose of Lee's work [40] is to design a vaccination passport (VP) validation system that is based on a wide blockchain architecture and intended for use in a simulated international scenario. The authors of [41] propose a secure user authentication approach that exploits the advantages and characteristics of blockchain technology and smart contracts.
Several attacks, such as linkability attacks, DoS attacks, and DDoS attacks, have been found to be possible against the 5G-AKA protocol, so a safer authentication method is needed. To address these security challenges and improve the overall dependability of the 5G network, Chow et al. [42] present the Secure Blockchain-based Authentication and Key Agreement for 5G Networks (5GSBA). Bathalapalli et al. [43] have developed a hardware-assisted blockchain system that addresses both device and data security concerns in smart healthcare applications. This blockchain framework is specifically designed for smart healthcare and has demonstrated better security and functionality than other available solutions. A detailed comparison study has shown that the framework has lower transaction and compute overhead, making it a promising solution for secure and efficient smart healthcare applications [44].

2.3 Related Work on Zero-Knowledge Proof in Blockchain

In 2018, Baza et al. [45] proposed a way for the subsystems of an autonomous vehicle to update their firmware using blockchain and smart contracts. In a trustless network, zero-knowledge proof technology is a sensible way to distribute update packages. Gunasinghe et al. [46] present a decentralized protocol that lets users' identity information be exchanged in a way that protects their privacy, without relying on a central party to handle the transactions. Yang and Li [47] examine the limitations of centralized digital identity management systems and propose using zero-knowledge proof (ZKP) methods and smart contracts to improve blockchain technology's existing claim identification paradigm. This makes it possible to separate an identity from its attributes and to prevent the exposure of attribute ownership. Table 1 lists other important work besides these studies.

Table 1

Contributions in the blockchain and ZKP area
Authors	Year	Contribution
Ou et al. [48]	2019	Super nodes are used to guarantee the authenticity of data, while Zerocash and zero-knowledge Succinct Non-interactive Arguments of Knowledge (zk-SNARKs) ensure data anonymity.
JhansiRani et al. [49]	2020	ZKP and blockchain are used in a financial wallet system.
Li et al. [50]	2020	A blockchain-based traffic management system with a decentralized, location-aware architecture that deals with issues of data integrity and privacy.
Capraz et al. [51]	2021	A use case for using blockchain and ZKP to protect personal data.
Cao et al. [52]	2021	A decentralized key distribution system that can work with the existing DDRM structure.
Li et al. [53]	2021	Benchmark tests performed on the Hyperledger platform.
Song et al. [54]	2021	An architecture that implements zero-knowledge proof and blockchain-based smart contracts to improve the safety of Internet of Things access control.
Barros et al. [55]	2022	Self-sovereign identity, blockchain, and zero-knowledge proofs are applied to confirming vaccination status.
Zhang [56]	2022	A zero-knowledge proof (ZKP) system for robust identification, which allows one party to convince the other of the veracity of its claims without revealing any additional information.

The authentication model has two phases: the user registration phase and the user authentication phase. Table 2 defines the abbreviations used in both phases.

Table 2

Abbreviations used in the authentication model
Abbreviation	Explanation
OD	The original data, such as a username and password, a fingerprint, or other unique information.
CPD	Customer proof data: proof data customized to identify the customer, set by the customer or assigned by the system
${Hash}_{OD,CPD}$	Hash value calculated from the original data and the customer proof data
${f}_{encryption}$	Encryption function
CT	Ciphertext
${f}_{decryption}$	Decryption function
${Hash}_{UD,UCPD}$	Hash value calculated from the user data and the user's customer proof data
UD	The data entered by the user for authentication, such as a username, password, fingerprint, or other unique information.
UCPD	The user's customer proof data
${f}_{Hash}$	Hash function
Hash	Hash value returned by the hash function
R(x)	Function that generates random parameters
${CT}^{\prime }$	The authenticated transfer ciphertext, which is transferred to the blockchain platform
${f}_{decryption}\left({CT}^{\prime }\right)$	Decryption of the authenticated transfer ciphertext
${f}_{k}$	k-anonymity function

3.1 Registration Phase:

The goal of the registration phase, illustrated in Fig. 4, is to register the user. The user fills in the information that the system requires, such as a username and password. In the model, we abstract this as the original data (abbr. OD), which is the information that the user enters for authentication purposes according to the system requirements. In addition to the original data, the user is given customer proof data (abbr. CPD). This data can be assigned by the system or filled in by the customer, and it can be dynamic or static. However it is provided, the CPD is used in subsequent validations.

The OD and the CPD are processed by the hash function to form a hash value (abbr. ${Hash}_{OD,CPD}$). The cryptographic function (abbr. ${f}_{encryption}$) encrypts this value into a ciphertext (abbr. CT), which is transferred from the device to the blockchain platform. The blockchain platform decrypts the CT with the decryption function (abbr. ${f}_{decryption}$) to recover the hash value (abbr. ${Hash}_{T}$). If everything is correct, ${Hash}_{T}$ is equal to ${Hash}_{OD,CPD}$. The ${Hash}_{T}$ and the blockchain address are then registered in the Distributed Hash Table (DHT). In mathematical terms, the generation of ${Hash}_{T}$ is given by the set of equations in Eq. 1:

$${f}_{Hash}\left(OD,CPD\right)={Hash}_{OD,CPD}$$

$${f}_{encryption}\left({Hash}_{OD,CPD}\right) = CT$$

$${f}_{decryption}\left(CT\right) = {Hash}_{T}$$
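The three steps of Eq. 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: SHA-256 is an assumed choice for ${f}_{Hash}$, the XOR keystream is only a toy stand-in for ${f}_{encryption}$ and ${f}_{decryption}$ (it is not secure), and the `dht` dictionary, the `register` helper, and the address `0xHYPOTHETICAL` are hypothetical names introduced for the example.

```python
import hashlib
import secrets


def f_hash(od: str, cpd: str) -> str:
    """f_Hash(OD, CPD); SHA-256 is an assumed choice of hash function."""
    # Length-prefix OD so that ("ab", "c") and ("a", "bc") hash differently.
    data = f"{len(od)}:{od}{cpd}".encode("utf-8")
    return hashlib.sha256(data).hexdigest()


def _keystream(key: bytes, length: int) -> bytes:
    """Toy keystream (SHA-256 in counter mode). Illustration only, not secure."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]


def f_encryption(plaintext: str, key: bytes) -> bytes:
    """Placeholder for f_encryption: XOR with the toy keystream."""
    raw = plaintext.encode("utf-8")
    return bytes(a ^ b for a, b in zip(raw, _keystream(key, len(raw))))


def f_decryption(ciphertext: bytes, key: bytes) -> str:
    """Placeholder for f_decryption: XOR is its own inverse."""
    stream = _keystream(key, len(ciphertext))
    return bytes(a ^ b for a, b in zip(ciphertext, stream)).decode("utf-8")


def register(od: str, cpd: str, address: str, key: bytes, dht: dict) -> str:
    """Run Eq. 1 end to end and record Hash_T under the blockchain address."""
    hash_od_cpd = f_hash(od, cpd)        # computed on the user's device
    ct = f_encryption(hash_od_cpd, key)  # CT sent to the blockchain platform
    hash_t = f_decryption(ct, key)       # recovered on the platform side
    dht[address] = hash_t                # dict used as a stand-in for the DHT
    return hash_t


key = secrets.token_bytes(32)
dht = {}
hash_t = register("alice:password", "my-proof", "0xHYPOTHETICAL", key, dht)
```

In this sketch `hash_t` equals `f_hash("alice:password", "my-proof")`, which mirrors the condition that ${Hash}_{T}$ equals ${Hash}_{OD,CPD}$ when the transfer succeeds.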

3.2 Authentication Phase:

The authentication phase verifies whether the user is who they claim to be. Zero-knowledge proof and k-anonymity techniques are used in this phase to ensure the accuracy of the authentication. Fig. 5 illustrates the process.

Assume that a user is ready to authenticate, and that it is not known whether this is the legitimate user or a malicious one. As Eq. 2 shows, the user generates a hash (abbr. ${Hash}_{UD,UCPD}$) by applying the hash function (abbr. ${f}_{Hash}$) to the user data (abbr. UD) and the user's customer proof data (abbr. UCPD).

$${f}_{Hash}\left(UD,UCPD\right)={Hash}_{UD,UCPD}$$

Then the system generates a set of random numbers with the random function R(x) and, by applying the k-anonymous function to the hash and these random numbers, produces a set of hash fragments, as in Eq. 3.

$${f}_{k}\left({Hash}_{UD,UCPD},R\left({x}_{1},{x}_{2}\cdots {x}_{n}\right)\right) = \left[{Hash}_{1}^{\prime },{Hash}_{2}^{\prime }\cdots {Hash}_{n}^{\prime }\right]$$

Eq. 4 states that $\left[{Hash}_{1}^{\prime },{Hash}_{2}^{\prime }\cdots {Hash}_{n}^{\prime }\right]$ and $R({x}_{1},{x}_{2}\cdots {x}_{n})$ are encrypted with ${f}_{encryption}$ into the authenticated transfer ciphertext (abbr. ${CT}^{\prime }$), which is transferred to the blockchain platform.

$${f}_{encryption}\left(\left[{Hash}_{1}^{\prime },{Hash}_{2}^{\prime }\cdots {Hash}_{n}^{\prime }\right],R\left({x}_{1},{x}_{2}\cdots {x}_{n}\right)\right) = {CT}^{\prime }$$

As in the registration phase, Eq. 5 shows that when ${CT}^{\prime }$ is transmitted to the blockchain platform, the decryption function decrypts the ciphertext and restores the hash fragments and the random parameters.

$${f}_{decryption}\left({CT}^{\prime }\right) = \left[{Hash}_{1}^{\prime },{Hash}_{2}^{\prime }\cdots {Hash}_{n}^{\prime }\right],R\left({x}_{1},{x}_{2}\cdots {x}_{n}\right)$$

Next, the set of equations in Eq. 6 describes the authentication decision. If ${Hash}_{m}^{\prime }$ is equal to the value calculated as ${f}_{k}\left({Hash}_{T},R({x}_{m})\right)$, the user is authenticated and the contract is executed. If the fragments do not match, the system determines that the authentication has failed, and the smart contract is still executed to log the failed authentication.

$$\text{if } {Hash}_{m}^{\prime } = {f}_{k}\left({Hash}_{T},R\left({x}_{m}\right)\right) \rightarrow \text{Authenticated} \quad (6)$$

$$\text{else } \left[{Hash}_{1}^{\prime },{Hash}_{2}^{\prime }\cdots {Hash}_{n}^{\prime }\right]\ne {f}_{k}\left({Hash}_{T},R\left({x}_{1},{x}_{2}\cdots {x}_{n}\right)\right) \rightarrow \text{Unauthenticated}$$
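The fragment generation of Eq. 3 and the comparison of Eq. 6 can be sketched as below. This is a hedged illustration, not the authors' code. It assumes that ${f}_{k}$ slices the hash at random start and end positions (the reading suggested by the slicing experiment in Section 4.2), that SHA-256 serves as ${f}_{Hash}$, and that every fragment must match. The helper names `random_params` and `verify` are hypothetical, and the encrypted transport of Eq. 4 and Eq. 5 is left out for brevity.

```python
import hashlib
import hmac
import secrets


def f_hash(data: str, proof: str) -> str:
    # SHA-256 is an assumption; the paper does not name the hash function.
    return hashlib.sha256(f"{len(data)}:{data}{proof}".encode("utf-8")).hexdigest()


def random_params(n: int, hash_len: int) -> list[tuple[int, int]]:
    """R(x_1..x_n): n random (start, end) pairs with start < end <= hash_len."""
    params = []
    for _ in range(n):
        start = secrets.randbelow(hash_len - 1)
        end = start + 1 + secrets.randbelow(hash_len - start)
        params.append((start, end))
    return params


def f_k(hash_value: str, params: list[tuple[int, int]]) -> list[str]:
    """f_k read as slicing: one hash fragment per (start, end) pair."""
    return [hash_value[start:end] for start, end in params]


def verify(fragments: list[str], params: list[tuple[int, int]], hash_t: str) -> bool:
    """Eq. 6: every fragment must equal the same slice of the stored Hash_T."""
    expected = f_k(hash_t, params)
    if len(fragments) != len(expected):
        return False
    # compare_digest avoids leaking the mismatch position through timing.
    return all(hmac.compare_digest(a, b) for a, b in zip(fragments, expected))


hash_t = f_hash("alice:password", "my-proof")      # stored at registration
params = random_params(3, len(hash_t))              # three random slices
fragments = f_k(f_hash("alice:password", "my-proof"), params)
authenticated = verify(fragments, params, hash_t)
```

A user who enters the registered UD and UCPD reproduces the stored hash, so every slice matches and `authenticated` is true. A user with different input produces a different hash, and the slices are then expected to differ.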

An experiment was devised on the basis of the algorithm described in the Methodology to determine whether the algorithm is feasible. It is split into two parts: the user registration phase and the user authentication phase.

4.1 User Registration

We assume that a blockchain-based smart contract for evidence depositing has been launched before a user completes the registration process. As shown in Fig. 6, we deploy an empty deposition contract using the AuthCenter account already established in the previous experiment.

We then create an account named Chuqiao_Chen. The account is created automatically, and its private key address is generated by calling the blockchain's API when the user registers through the front end of the application. Fig. 7 shows that the user's private key address has been created. Next, the newly deployed smart contract is updated to include the private key.

On the front-end registration page, the user enters the original data, such as the username, together with the customer proof data and other data. After hashing and encryption, the application calls the blockchain API to store the hash value in the deposition smart contract, as shown in Fig. 8.

4.2 User Authentication

During the user verification phase, the program generates sliced data by encrypting and slicing the user's original data and the user's customer proof data. The slices are compared with the corresponding slices of the value stored at registration; if the comparison is correct, the verification passes.

A single verification does not fully prove that the user is legitimate. Based on the idea of repeated verification in zero-knowledge proofs, we define the start and end positions of each slice as random values and set the number of tests to three. That is, a user is considered legitimate only when three random slices of different lengths are all verified as correct. Fig. 9 illustrates the results of our verification of the Chuqiao_Chen test account.
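As a rough, illustrative estimate (not a result from the experiment), suppose that the hash is written in hexadecimal, that the three slices have lengths $\ell_1$, $\ell_2$ and $\ell_3$ and do not overlap, and that an attacker with no knowledge of the hash guesses every character uniformly at random. Under these assumptions, the probability of passing all three tests is

$$P_{pass} = \prod_{i=1}^{3} {16}^{-\ell_i} = {16}^{-(\ell_1+\ell_2+\ell_3)}$$

For example, slices of lengths 4, 6 and 8 give $P_{pass} = {16}^{-18} = {2}^{-72} \approx 2.1\times {10}^{-22}$. Because the slice positions are random, very short slices can occur, so the bound for a particular run depends on the lengths actually drawn.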

In current industry practice, authentication is usually implemented by encoding or hashing the user's password and transmitting it to the authentication server for comparison. The protection applied is also relatively simple, usually Base64 encoding or MD5 hashing. The problem with this approach is that a malicious user who captures and stores the packets can log in as a regular user without knowing the password. In BTHAAT, however, the encrypted data is cut into several slices that are compared separately, so even if a malicious user intercepts the relevant data, he or she cannot guess the entire encrypted data, which protects privacy. This approach uses both k-anonymity and the theory of zero-knowledge proofs. In addition, the Secure Hash Algorithm (SHA) family can be used to hash the data before transmission.

Moreover, traditional authentication technology stores passwords in an SQL database. If a malicious user compromises the authentication server and changes a password, it is difficult to ensure that the stored password is the one set by the user. Blockchain technology makes such tampering detectable, which protects the stored credentials. In addition, uploading the authentication status to the chain ensures the integrity and accuracy of the records in future audits.
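The audit property mentioned above rests on the fact that each record on a chain commits to the one before it. The sketch below illustrates that idea with a plain hash-chained list in Python. It is a generic illustration under stated assumptions, not the smart contract used in the experiment; the record fields, the helper names `append_record` and `verify_log`, and the address `0xHYPOTHETICAL` are hypothetical.

```python
import hashlib
import json


def _digest(body: dict) -> str:
    # Canonical JSON (sorted keys) so the same body always hashes the same way.
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


def append_record(log: list[dict], address: str, status: str) -> dict:
    """Append an authentication result that commits to the previous record."""
    prev_hash = log[-1]["record_hash"] if log else "0" * 64
    body = {"index": len(log), "address": address,
            "status": status, "prev_hash": prev_hash}
    record = {**body, "record_hash": _digest(body)}
    log.append(record)
    return record


def verify_log(log: list[dict]) -> bool:
    """Recompute every hash and link; any edited record breaks the chain."""
    prev_hash = "0" * 64
    for i, record in enumerate(log):
        body = {k: v for k, v in record.items() if k != "record_hash"}
        if (record["index"] != i
                or record["prev_hash"] != prev_hash
                or record["record_hash"] != _digest(body)):
            return False
        prev_hash = record["record_hash"]
    return True


log = []
append_record(log, "0xHYPOTHETICAL", "authenticated")
append_record(log, "0xHYPOTHETICAL", "failed")
intact = verify_log(log)
```

Changing the `status` of an earlier record without recomputing every later hash makes `verify_log` return false, which is the behavior an auditor relies on.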

Starting from the characteristics of the blockchain, we have designed a new authentication algorithm that combines anonymity methods with homomorphic encryption and the idea of zero-knowledge proofs. The algorithm consists of two components: user registration and user verification. To register, the user generates their own address on the blockchain platform and loads it into a smart contract deployed by a central node. The key certificate is then encrypted and deposited on the blockchain for secure storage. When the user is verified, the relevant key is located by the user's address, and private information is protected by verifying multiple slices.

However, the approach has so far been validated only on very small batches of data. Whether it can serve large numbers of users and be applied in real industrial settings remains to be seen and will require further demonstration.

Funding

The authors declare that no funds, grants, or other support were received during the preparation of this manuscript.

Competing Interests

The authors have no relevant financial or non-financial interests to disclose.

Author Contributions

All authors contributed to the study conception and design. Material preparation and data collection were performed by Chuqiao Chen and S. B. Goyal, and analysis was performed by Chuqiao Chen, P Senthil and Anand Singh Rajawat. The first draft of the manuscript was written by Chuqiao Chen, and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.

Availability of data and materials

Data will be made available by the corresponding author on request.

Code availability

The corresponding author will share the code upon appropriate request.

Jamwal, A., et al., Industry 4.0 Technologies for Manufacturing Sustainability: A Systematic Review and Future Research Directions. Applied Sciences, 2021. 11(12): p. 5725.
MANNER, I.I.A.S., INDUSTRY 4.0 CYBERSECURITY: CHALLENGES & RECOMMENDATIONS. 2019.
Wikipedia, Authentication, in Wikipedia. 2022.
Ibrokhimov, S., et al., Multi-Factor Authentication in Cyber Physical System: A State of Art Survey. 2019. p. 279-284.
AlQahtani, A.A.S., H. Alamleh, and J. Gourd, CI2FA: Continuous Indoor Two-factor Authentication Based On Trilateration System, in 2021 International Conference on COMmunication Systems & NETworkS (COMSNETS). 2021, IEEE: Bangalore, India. p. 1-5.
Grimes, R.A., Types of Authentication. 1 ed. 2021. 59-99.
Nakamoto, S., Bitcoin: A Peer-to-Peer Electronic Cash System. Bitcoin, 2008.
Khan, Y., S.B. Goyal, and P. Bedi. Security Challenges of Blockchain. in International Conference on Innovations in Bio-Inspired Computing and Applications (IBICA 2020). 2020. Springer International Publishing.
Barde, S., Blockchain and Cryptocurrencies. Emerging Computing Paradigms: Principles, Advances and Applications, 2022: p. 30.
Navamani, T.M., A Review on Cryptocurrencies Security. Journal of Applied Security Research, 2021: p. 1-21.
Aggarwal, S. and N. Kumar, Chapter Twelve - Cryptocurrencies, in Advances in Computers, S. Aggarwal, N. Kumar, and P. Raj, Editors. 2021, Elsevier. p. 227-266.
Assaqty, M.I.S., et al., Private-Blockchain-Based Industrial IoT for Material and Product Tracking in Smart Manufacturing. IEEE Network, 2020. 34(5): p. 91-97.
Bera, B., A.K. Das, and A.K. Sutrala, Private blockchain-based access control mechanism for unauthorized UAV detection and mitigation in Internet of Drones environment. Computer Communications, 2021. 166: p. 91-109.
Chen, X., et al., Decentralizing Private Blockchain-IoT Network with OLSR. Future Internet, 2021. 13(7): p. 168.
Bera, B., et al., Private blockchain-envisioned drones-assisted authentication scheme in IoT-enabled agricultural environment. Computer Standards & Interfaces, 2022. 80: p. 103567.
Dib, O., et al., Consortium blockchains: Overview, applications and challenges. International Journal On Advances in Telecommunications, 2018. 11(1&2): p. 51-64.
Tao, G. The Intelligent Evolution of the Data Middle Platform – 12 Years of Development from Alibaba's Data Platform. 2021 [cited 16 August 2022]; Available from: https://www.alibabacloud.com/blog/the-intelligent-evolution-of-the-data-middle-platform-12-years-of-development-from-alibabas-data-platform_598097.
Chen, C. and S.B. Goyal, Data Security and Privacy-Preserving Framework Using Machine Learning and Blockchain in Big-Data to Data Middle Platform in the Era of IR 4.0, in Recent Trends in Intensive Computing. 2021, IOS Press. p. 145-152.
Zhang, C. and L. Hou, Data middle platform construction: The strategy and practice of National Bureau of Statistics of China. Statistical Journal of the IAOS, 2020. 36(4): p. 979-986.
Haotian, Z., L. Tao, and Y. Song, Design and Implementation of Data Middle Platform, in 2021 2nd International Conference on Artificial Intelligence and Information Systems. 2021, Association for Computing Machinery: Chongqing, China. p. Article 194.
Mao, Z., et al., Government data governance framework based on a data middle platform. Aslib Journal of Information Management, 2022. 74(2): p. 289-310.
Wu, P., M. Xu, and L. Cheng, An Improved CNN-Based Completion Method for Power Grid Middle Platform Data. Journal of Physics: Conference Series, 2021. 1815(1): p. 012034.
Ding, Z., J. Wang, and Y. Cheng, A Long Short-Term Memory Network-Based Intrusion Detection Method for Power Grid Middle Platform. Journal of Physics: Conference Series, 2021. 1815(1): p. 012007.
Qian, H., et al., Research on Construction and Key Technology of Water Conservancy Data Middle Platform. IOP Conference Series: Earth and Environmental Science, 2021. 768(1): p. 012112.
Wang, J., M. Xu, and S. Zhou, A Smoothness Regularized Low-Rank Completion Method for Power Grid Middle Platform Data. Journal of Physics: Conference Series, 2021. 1815(1): p. 012036.
Chen, C., S.B. Goyal, and K. Ramaswamy, BSPPF: Blockchain-Based Security and Privacy Preventing Framework for Data Middle Platform in the Era of IR 4.0. Journal of Nanomaterials, 2022. 2022: p. 2219006.
Goldreich, O. and Y. Oren, Definitions and properties of zero-knowledge proof systems. Journal of Cryptology, 1994. 7(1): p. 1-32.
William, A. and H. Johan, Statistical zero-knowledge languages can be recognized in two rounds. Journal of Computer and System Sciences, 1991. 42(3): p. 327-345.
SWEENEY, L., ACHIEVING k-ANONYMITY PRIVACY PROTECTION USING GENERALIZATION AND SUPPRESSION. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 2002. 10(05): p. 571-588.
Zhang, X., et al., A Scalable Two-Phase Top-Down Specialization Approach for Data Anonymization Using MapReduce on Cloud. IEEE Transactions on Parallel and Distributed Systems, 2014. 25(2): p. 363-373.
Mehta, B.B. and U.P. Rao, Privacy preserving big data publishing: a scalable k-anonymization approach using MapReduce. IET Software, 2017. 11(5): p. 271-276.
Wang, C., Talking about the research and suggestions of the data middle platform architecture and security on the Internet enterprise. China CIO News, 2019(08): p. 65-66.
Xu, P. Strengthening the Security System of Data Cente. in The 36th China (Tianjin) 2022' IT, Network, Information Technology, Electronics, Instrumentation Innovation Academic Conference. 2022. Tianjin, China.
Zhang, J., J. Xu, and W. Xiao, Discussion on Construction of Data Middle Platform. Designing Techniques of Posts and Telecommunications, 2021(08): p. 74-79.
Abbas, S., et al., Blockchain-Based Authentication in Internet of Vehicles: A Survey. Sensors, 2021. 21(23): p. 7927.
Yuan, K., et al., Privacy-Protection Scheme of a Credit-Investigation System Based on Blockchain. Entropy, 2021. 23(12): p. 1657.
He, X., et al., A Hierarchical Blockchain-Assisted Conditional Privacy-Preserving Authentication Scheme for Vehicular Ad Hoc Networks. Sensors, 2022. 22(6): p. 2299.
Asif, M., et al., Blockchain-Based Authentication and Trust Management Mechanism for Smart Cities. Sensors, 2022. 22(7): p. 2604.
Wang, L., Internet of Things Device Identification Algorithm considering User Privacy. Computational Intelligence and Neuroscience, 2022. 2022: p. 1-8.
Lee, H.A., et al., Design of a Vaccine Passport Validation System Using Blockchain-based Architecture: Development Study. JMIR Public Health and Surveillance, 2022. 8(4): p. e32411.
Umoren, O., et al., Securing Fog Computing with a Decentralised User Authentication Approach Based on Blockchain. Sensors, 2022. 22(10): p. 3956.
Chow, M.C. and M. Ma, A Secure Blockchain-Based Authentication and Key Agreement Scheme for 3GPP 5G Networks. Sensors, 2022. 22(12): p. 4525.
Bathalapalli, V.K.V.V., et al., PUFchain 2.0: Hardware-Assisted Robust Blockchain for Sustainable Simultaneous Device and Data Security in Smart Healthcare. SN Computer Science, 2022. 3(5).
Gupta, M., et al., Game Theory-Based Authentication Framework to Secure Internet of Vehicles with Blockchain. Sensors, 2022. 22(14): p. 5119.
Baza, M., et al., Blockchain-based Firmware Update Scheme Tailored for Autonomous Vehicles. 2019. p. 1-7.
Gunasinghe, H., et al., PrivIdEx: Privacy Preserving and Secure Exchange of Digital Identity Assets, in The World Wide Web Conference. 2019, Association for Computing Machinery: San Francisco, CA, USA. p. 594–604.
Xiaohui, Y. and L. Wenjie, A zero-knowledge-proof-based digital identity management scheme in blockchain. Computers & Security, 2020. 99: p. 102050.
Ou, W., M. Deng, and E. Luo. A Decentralized and Anonymous Data Transaction Scheme Based on Blockchain and Zero-Knowledge Proof in Vehicle Networking (Workshop Paper). 2019. Cham: Springer International Publishing.
Rani P, J. and M. J, Authentication of Financial Wallet System and Data Protection using BlockChain. 2020. p. 1-5.
Li, W., et al., Privacy-Preserving Traffic Management: A Blockchain and Zero-Knowledge Proof Inspired Approach. IEEE Access, 2020. 8: p. 181733-181743.
Capraz, S. and A. Ozsoy, Personal Data Protection in Blockchain with Zero-Knowledge Proof, in Blockchain Technology and Innovations in Business Processes, S. Patnaik, et al., Editors. 2021, Springer Singapore: Singapore. p. 109-124.
Cao, Z. and L. Zhao, A Design of Key Distribution Mechanism in Decentralized Digital Rights Management Based on Blockchain and Zero-Knowledge Proof, in 2021 The 3rd International Conference on Blockchain Technology. 2021, Association for Computing Machinery: Shanghai, China. p. 53–59.
Li, W., et al., Location-aware Verification for Autonomous Truck Platooning Based on Blockchain and Zero-knowledge Proof. 2021. p. 1-5.
Song, L., et al., An access control model for the Internet of Things based on zero-knowledge token and blockchain. EURASIP Journal on Wireless Communications and Networking, 2021. 2021(1): p. 105.
de Vasconcelos Barros, M., F. Schardong, and R. Felipe Custódio, Leveraging Self-Sovereign Identity, Blockchain, and Zero-Knowledge Proof to Build a Privacy-Preserving Vaccination Pass. 2022.
Zhang, Y., Increasing Cyber Defense in the Music Education Sector Using Blockchain Zero-Knowledge Proof Identification. Computational Intelligence and Neuroscience, 2022. 2022: p. 9922167.


Editorial decision: Reject after review
08 Jun, 2024
Reviewers agreed at journal
31 Jul, 2023
Reviewers invited by journal
26 Jul, 2023
Editor assigned by journal
28 Jun, 2023
First submitted to journal
27 Jun, 2023
